- Checking and setting the character code by hand every time a file is read is tedious, so I wrote a module that detects it automatically.
- It is especially useful when importing CSV files containing Japanese that were created in Excel.
- It also supports files fetched over the network.
- Passing the return value as the `encoding` argument when opening the file has worked without problems so far.
```python
'''Get the character code of the file'''
from chardet.universaldetector import UniversalDetector
import requests


def get_encoding(file_path):
    detector = UniversalDetector()
    if file_path[:4] == 'http':
        # Remote file: feed the response body to the detector chunk by chunk
        r = requests.get(file_path)
        for binary in r:  # iterating a Response yields the body in chunks
            detector.feed(binary)
            if detector.done:
                break
        detector.close()
    else:
        # Local file: read in binary mode and feed line by line
        with open(file_path, mode='rb') as f:
            for binary in f:
                detector.feed(binary)
                if detector.done:
                    break
            detector.close()
    print(" ", detector.result, end=' => ')
    print(detector.result['encoding'], end='\n')
    return detector.result['encoding']
```
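As a usage sketch: assuming the detector logic above is wrapped in a function (here hypothetically named `detect_encoding`, local files only), a Shift_JIS CSV can be written and then reopened with whatever encoding name the detector reports:

```python
# A minimal sketch: detect a local file's encoding with chardet, then open the
# file with the result. The wrapper name detect_encoding is an assumption.
from chardet.universaldetector import UniversalDetector


def detect_encoding(path):
    detector = UniversalDetector()
    with open(path, mode='rb') as f:
        for line in f:
            detector.feed(line)
            if detector.done:
                break
    detector.close()
    return detector.result['encoding']


# Create a small Excel-style CSV encoded as cp932 (a Shift_JIS superset).
sample = 'コード,名称,数量\n001,りんご,10\n002,みかん,20\n003,バナナ,30\n'
with open('sample.csv', 'w', encoding='cp932') as f:
    f.write(sample)

# Reopen the file using the detected encoding name.
enc = detect_encoding('sample.csv')
with open('sample.csv', encoding=enc) as f:
    print(f.read() == sample)
```

Whatever sjis-family name chardet reports (e.g. `SHIFT_JIS`) is accepted by Python's codec lookup, so the round-trip succeeds.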
- CSV files containing Japanese are often detected as Shift_JIS; since cp932 (Microsoft's superset of Shift_JIS, which is what Excel actually writes) covers more characters, it is safer to convert the detected name to cp932 with the next function.
- Passing the return value of the first function as the argument yields the best encoding name as the return value.
```python
'''Convert encoding sjis relation to cp932'''


def convert_encoding(encoding):
    # cp932 is a superset of Shift_JIS, so it is safe for any sjis-family name
    if encoding in ['Shift_JIS', 'SHIFT_JIS', 'shift_jis', 'sjis', 's_jis']:
        encoding = 'cp932'
    return encoding
```
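To illustrate why mapping Shift_JIS names to cp932 is worthwhile (this example is mine, not from the original post): Excel emits Windows-specific characters such as ① and ㈱ that Python's strict `shift_jis` codec cannot decode, while its superset `cp932` handles them fine.

```python
# Bytes as Excel would write them: cp932-encoded CSV containing
# Windows-only characters (circled digit ①, squared ㈱).
data = '番号,会社\n①,㈱サンプル\n'.encode('cp932')

# Strict shift_jis rejects the Windows extensions...
try:
    data.decode('shift_jis')
except UnicodeDecodeError as e:
    print('shift_jis failed:', e.reason)

# ...while cp932 decodes the same bytes without error.
print(data.decode('cp932'))
```

This is why decoding with `cp932` is the safer default whenever the detector reports any Shift_JIS variant.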
Thank you for your feedback and guidance.