I have summarized why the error below can appear when reading a CSV file with Python.
import pandas as pd
pd.read_csv("file/to/path")
I hope this helps anyone who gets the following error when calling read_csv with pandas.
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x90 in position 0: invalid start byte
Conclusion first: use the code below.
pd.read_csv("file/to/path", encoding="shift-jis")
encoding="shift-jis"
Just adding this argument should be enough! If you still get the error, read the rest of this post and consider the cause.
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x90 in position 0: invalid start byte
So what is this error actually complaining about? Roughly, it says that the data cannot be read when the character code "utf-8" is used.
It is like saying: I can't read English, so change it to Japanese and then give me the data!
So let's read the file with another character code. Converting text to and from bytes with a given character code is called encoding and decoding, and the encoding argument tells pandas which character code to use.
To explain this, let me briefly introduce the typical character codes.
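To make the failure concrete, here is a minimal sketch using only the Python standard library. The sample string is my own assumption for illustration, not data from a real file. It encodes some Japanese text as Shift_JIS, shows that decoding those bytes as UTF-8 fails, and shows that decoding with the matching character code succeeds.

```python
# Sample text chosen for illustration only (an assumption, not from a real CSV).
text = "日本語"

# Encoding turns a string into bytes using a specific character code.
sjis_bytes = text.encode("shift_jis")

# Decoding with the wrong character code raises UnicodeDecodeError,
# which is the same kind of failure pd.read_csv reports by default.
try:
    sjis_bytes.decode("utf-8")
    decode_failed = False
except UnicodeDecodeError as exc:
    decode_failed = True
    print("decode failed:", exc.reason)

# Decoding with the matching character code recovers the original text.
restored = sjis_bytes.decode("shift_jis")
print(restored == text)
```

This is roughly what happens inside read_csv: the file is a sequence of bytes, and the encoding argument decides how those bytes are turned back into characters.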
UTF-8: The most widely used character code in the world and one of the encoding methods for Unicode. It was designed so that the characters defined in ASCII can be used as they are in Unicode text.
Setting the difficult details aside, it is enough to know that it is the most widely used.
It is a standard character code on the Internet, including for e-mail.
EUC: Abbreviation for Extended Unix Code, used on Japanese UNIX systems.
Shift_JIS: A code developed by ASCII Corporation together with Microsoft and others, which adds Japanese characters to the ASCII characters. It has long been used on Windows and Mac, and is widely used for files on PCs.
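As a rough illustration of how these three character codes differ, the sketch below encodes the same text with each of them and compares the results. The sample strings and the helper name byte_lengths are my own assumptions. ASCII text comes out the same in all three, while Japanese text produces different bytes, which is why reading with the wrong one fails.

```python
# Python codec names for the three character codes discussed above.
encodings = ["utf-8", "euc_jp", "shift_jis"]

def byte_lengths(text):
    """Return the encoded length in bytes of `text` for each character code."""
    return {enc: len(text.encode(enc)) for enc in encodings}

# Sample strings chosen for illustration only.
ascii_lengths = byte_lengths("abc")
japanese_lengths = byte_lengths("日本語")

print(ascii_lengths)     # ASCII characters take 1 byte each in all three
print(japanese_lengths)  # these kanji take 3 bytes each in UTF-8, 2 in the others

# Even where the lengths match, the actual bytes are different.
same_bytes = "日本語".encode("euc_jp") == "日本語".encode("shift_jis")
print(same_bytes)
```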
**In other words, a CSV file containing Japanese that was saved in Shift_JIS cannot be read as UTF-8, so specify Shift_JIS when reading it.**
If you still cannot read the data with Shift_JIS, think about which of the other character codes the file might use and try that one.
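Trying the candidates one by one can be automated. The following is only a sketch under my own assumptions: the function name guess_encoding, the candidate list, and the sample file contents are all hypothetical. It reads the raw bytes and returns the first character code that decodes without an error. Note that a successful decode does not prove the guess is right, since some byte sequences are valid in more than one character code, so check the result by eye.

```python
import os
import tempfile

def guess_encoding(path, candidates=("utf-8", "shift_jis", "cp932", "euc_jp")):
    """Return the first candidate that decodes the whole file, or None."""
    with open(path, "rb") as f:
        raw = f.read()
    for enc in candidates:
        try:
            raw.decode(enc)
        except UnicodeDecodeError:
            continue  # this character code does not fit, try the next one
        return enc
    return None

# Demo: write a small Shift_JIS CSV to a temporary file and check it.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample.csv")
    with open(sample_path, "wb") as f:
        f.write("都市,国\n東京,日本\n".encode("shift_jis"))
    detected = guess_encoding(sample_path)

print(detected)
```

The returned name can then be passed to pandas as pd.read_csv(path, encoding=detected). The candidate cp932 is the Windows variant of Shift_JIS, which is worth trying when a file made on Windows fails with plain Shift_JIS.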
I wrote this up as a quick memo.