[Python] Get the character code of the file

  1. Module to get the character code when reading a file

- Checking and setting the character code every time a file is read is tedious, so I wrote a function that detects it automatically.
- It is especially useful when importing CSV files containing Japanese that were created in Excel.
- It also supports files on the web (paths that start with http).
- Passing the return value as the encoding argument of open() has worked without problems so far.

def check_encoding(file_path):
    '''Get the character code of the file'''
    from chardet.universaldetector import UniversalDetector
    import requests

    detector = UniversalDetector()

    if file_path[:4] == 'http':
        r = requests.get(file_path)
        for binary in r:
            detector.feed(binary)
            if detector.done:
                break
        detector.close()

    else:
        with open(file_path, mode='rb') as f:
            for binary in f:
                detector.feed(binary)
                if detector.done:
                    break
        detector.close()

    print("  ", detector.result, end=' => ')
    print(detector.result['encoding'], end='\n')

    return detector.result['encoding']
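The detected name can then be passed straight to open(). The following is a minimal sketch, not part of the original module: `read_csv_rows` is a hypothetical helper, and the temporary cp932 file only stands in for a CSV exported from Excel. In real use, the `encoding` argument would be the return value of `check_encoding`.

```python
import csv
import os
import tempfile


def read_csv_rows(file_path, encoding):
    '''Read a CSV file with an explicit encoding (hypothetical helper).'''
    # newline='' is what the csv module documentation recommends for file objects
    with open(file_path, encoding=encoding, newline='') as f:
        return list(csv.reader(f))


# Build a small cp932 CSV to stand in for an Excel export (assumed sample data).
tmp_dir = tempfile.mkdtemp()
sample_path = os.path.join(tmp_dir, 'sample.csv')
with open(sample_path, mode='w', encoding='cp932', newline='') as f:
    csv.writer(f).writerows([['名前', '数量'], ['りんご', '3']])

# In real use: encoding = check_encoding(sample_path)
rows = read_csv_rows(sample_path, encoding='cp932')
```

Opening the same file without an encoding argument would fall back to the platform default, which is where the garbled text or UnicodeDecodeError usually comes from.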

- CSV files containing Japanese are often detected as Shift_JIS, so it seems better to convert that result to the more permissive cp932 with the next function.
- Passing the return value of the first function as the argument returns the encoding name best suited for opening the file.

def change_encoding(encoding):
    '''Convert Shift_JIS-family encoding names to cp932'''
    if encoding in ['Shift_JIS', 'SHIFT_JIS', 'shift_jis', 'sjis', 's_jis']:
        encoding = 'cp932'

    return encoding
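The list in `change_encoding` only matches the spellings written out. As an alternative sketch (my assumption, not the original author's method), the standard library's `codecs.lookup` resolves aliases to a canonical codec name, so any spelling Python recognizes as Shift_JIS can be mapped to cp932. `normalize_sjis` is a hypothetical name. The last lines illustrate why cp932 is the safer choice: '①' is one of the Windows extension characters that Python's strict `shift_jis` codec rejects.

```python
import codecs


def normalize_sjis(encoding):
    '''Map any alias of Shift_JIS to cp932 (hypothetical variant of change_encoding).'''
    if encoding is None:
        # chardet reports None when it cannot decide; pass that through
        return None
    try:
        canonical = codecs.lookup(encoding).name
    except LookupError:
        # Name unknown to Python: leave it untouched
        return encoding
    return 'cp932' if canonical == 'shift_jis' else encoding


# '①' is a Windows (cp932) extension that is not in strict Shift_JIS.
data = '①'.encode('cp932')
try:
    data.decode('shift_jis')
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False
```

Excel on Japanese Windows writes cp932, so a file that chardet labels Shift_JIS may still contain such extension characters. That is the case the conversion guards against.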

That's all. Thank you.
