Encoding detection in Python

Classic library

Overview of chardet

chardet reads a byte string and infers the encoding it was written in from the byte patterns it contains.

There are two basic ways to use chardet:

  1. Pass the whole byte string to the `detect` function.
  2. If the byte string is too large to read at once, create a `UniversalDetector` object and load the data little by little with its `feed` method.

Method 1

import chardet
from urllib.request import urlopen

with urlopen('http://qiita.com/') as response:
    html = response.read()
    print(chardet.detect(html))  # e.g. {'confidence': 0.99, 'encoding': 'utf-8'}
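
Once the encoding has been guessed, the usual next step is to decode the bytes with it. The following is a minimal sketch of that step, not part of chardet itself: `decode_with_guess` and `detect_and_decode` are hypothetical helper names, and the UTF-8 fallback is an assumption made for illustration. It relies only on the dictionary shape shown above, where `'encoding'` can be `None` when no guess could be made.

```python
def decode_with_guess(data: bytes, guess: dict, fallback: str = "utf-8") -> str:
    """Decode data using a result dict shaped like the one chardet.detect returns."""
    # 'encoding' is None when no guess could be made, so use the fallback then.
    encoding = guess.get("encoding") or fallback
    try:
        return data.decode(encoding)
    except (LookupError, UnicodeDecodeError):
        # Unknown codec name or a wrong guess: decode with the fallback and
        # replace undecodable bytes instead of raising.
        return data.decode(fallback, errors="replace")


def detect_and_decode(data: bytes) -> str:
    """Hypothetical convenience wrapper: detect with chardet, then decode."""
    import chardet  # third-party; imported here so the helper above has no dependency

    return decode_with_guess(data, chardet.detect(data))
```

For the example above, `detect_and_decode(html)` would return the page as a `str`. Because detection is a statistical guess, the fallback path matters most for short or mixed inputs.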

Method 2

The main interface of `UniversalDetector`:

from chardet.universaldetector import UniversalDetector
from urllib.request import urlopen

detector = UniversalDetector()

with urlopen('http://qiita.com/') as response:
    for line in response:
        detector.feed(line)
        if detector.done:
            break
detector.close()
print(detector.result)  # e.g. {'confidence': 0.99, 'encoding': 'utf-8'}

The flow is simple: feed the response to the detector line by line with `detector.feed`, check `detector.done` after each line to see whether the detection is already complete, call `detector.close()` once the input ends, and finally read the result from `detector.result`.
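
The same flow also works for local files, reading fixed-size chunks instead of lines. The sketch below is one way to write it under a few assumptions: `feed_in_chunks` and `detect_file` are hypothetical helper names, and the 4096-byte chunk size is an arbitrary choice. It uses only `reset`, `feed`, `done`, `close`, and `result` from `UniversalDetector`; calling `reset` first lets one detector object be reused for several files.

```python
from typing import BinaryIO


def feed_in_chunks(detector, stream: BinaryIO, chunk_size: int = 4096) -> dict:
    """Feed a binary stream to the detector chunk by chunk and return its result."""
    detector.reset()  # clear any state left over from a previous input
    while not detector.done:
        chunk = stream.read(chunk_size)
        if not chunk:
            break  # end of stream reached before the detector was confident
        detector.feed(chunk)
    detector.close()  # finalize the guess from whatever was fed
    return detector.result


def detect_file(path: str) -> dict:
    """Hypothetical wrapper: detect the encoding of a file on disk."""
    from chardet.universaldetector import UniversalDetector  # third-party

    with open(path, "rb") as f:
        return feed_in_chunks(UniversalDetector(), f)
```

Stopping as soon as `done` becomes true means a large file is rarely read to the end.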

To study further

  • "How it works" in the chardet documentation (https://chardet.readthedocs.org/en/latest/how-it-works.html)
  • The first half of Chapter 15 of Dive Into Python 3 (http://diveintopython3-ja.rdy.jp/case-study-porting-chardet-to-python-3.html)
