Reading a CSV file with pandas is very convenient, because a single call to `read_csv` is all you need.
import pandas as pd
pd.read_csv("file/to/path")
Normally this works without any problem, but if the CSV contains bytes that cannot be decoded, the following error is raised.
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x83 in position 0: invalid start byte
In other words, pandas is complaining that it cannot decode the file.
Since CSV files created in Excel (on Japanese Windows) are typically encoded in "shift-jis", I tried specifying that with the `encoding` argument of `read_csv` for the time being.
import pandas as pd
pd.read_csv("file/to/path", encoding="shift-jis")
It still fails, as you might expect.
UnicodeDecodeError: 'shift_jis' codec can't decode byte 0x87 in position 0: illegal multibyte sequence
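One possible explanation, offered as an assumption because the original file is not shown: Excel on Japanese Windows usually writes code page 932 (`cp932`), a Microsoft extension of Shift-JIS, and Python's stricter `shift_jis` codec rejects some of its byte sequences. The sketch below uses a made-up byte string (not data from the file above) to compare strict `shift_jis`, `cp932`, and `shift_jis` with errors ignored.

```python
# Hypothetical sample: ASCII "a", the bytes 0x87 0x40, then ASCII "b".
# 0x87 0x40 is a vendor-extension character in cp932 that the strict
# "shift_jis" codec does not know.
sample = b"a\x87\x40b"


def try_decode(data, encoding):
    """Return the decoded text, or None if strict decoding fails."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


strict_sjis = try_decode(sample, "shift_jis")  # strict decoding fails here
as_cp932 = try_decode(sample, "cp932")         # cp932 can decode the bytes
# errors="ignore" never raises, but it silently drops undecodable bytes.
lossy = sample.decode("shift_jis", errors="ignore")

print(repr(strict_sjis), repr(as_cp932), repr(lossy))
```

If the file really is `cp932`, passing `encoding="cp932"` to `pd.read_csv` may read it without losing any characters, so it is worth trying before falling back to ignoring errors.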
As a workaround, it seems you can read the file by opening it with `codecs.open`, passing `ignore` so that decoding errors are skipped, and then handing the file object to `pd.read_table`.
import codecs
import pandas as pd
# Open as Shift-JIS and skip any bytes that cannot be decoded
with codecs.open("file/to/path", "r", "Shift-JIS", "ignore") as file:
    df = pd.read_table(file, delimiter=",")
    print(df)
It seems you can pass the `StreamReaderWriter` object returned by `codecs.open` to pandas as it is, without calling `file.read()` first.
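To try the workaround without the original file, here is a minimal self-contained sketch. The sample bytes, the temporary file name `sample.csv`, and the column names are all made up for illustration, and `pd.read_csv` is used in place of `pd.read_table` with `delimiter=","`.

```python
import codecs
import os
import tempfile

import pandas as pd

# Hypothetical CSV content: row 1 contains the bytes 0x87 0x40,
# which the strict "shift_jis" codec cannot decode.
raw = b"id,name\n1,\x87\x40\n2,abc\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.csv")
    with open(path, "wb") as f:
        f.write(raw)

    # codecs.open returns a StreamReaderWriter; with errors="ignore" it
    # skips undecodable bytes instead of raising UnicodeDecodeError.
    with codecs.open(path, "r", "shift_jis", "ignore") as file:
        is_stream = isinstance(file, codecs.StreamReaderWriter)
        # The file object is handed to pandas directly, no file.read().
        df = pd.read_csv(file)

print(is_stream)
print(df)
```

Keep in mind that `ignore` discards data silently, so the affected cells may come back altered or empty. As far as I know, pandas 1.3 and later also accept `encoding_errors="ignore"` in `read_csv`, which would make the `codecs.open` step unnecessary.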
I got stuck on this for a while, so I wrote it down.