Character encodings for reading and writing CSV files with Python (Windows edition)

Why I wrote this article

As a Python beginner, I sometimes run into encoding errors when reading and writing CSV files, so I wrote up a summary as a note to myself. This article is also **aimed at beginners**. The environment is Windows.

Errors

Errors you often hit when reading and writing CSV files

Error when writing

Error message


UnicodeEncodeError: 'shift_jis' codec can't encode character '\u9ad9' in position 14: illegal multibyte sequence

This means the text contains a character that cannot be encoded with shift-jis. It occurs on writing when the encoding specified for the file cannot represent one of the characters being written (here '\u9ad9', which is 髙).

For reference, this is how the encoding was specified.

Code example


with open(filepath, 'w', newline='', encoding='shift-jis') as f:
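The write error can be reproduced with a small sketch like the one below. It assumes only the standard library; the helper names `write_csv` and `try_write` and the sample rows are made up for illustration and are not from the original code.

```python
import csv
import os
import tempfile

# Sample data: '\u9ad9' is 髙 (hashigo-daka), which shift_jis cannot encode.
rows = [["name"], ["\u9ad9橋"]]

def write_csv(path, rows, encoding):
    """Write rows to a CSV file using the given encoding."""
    with open(path, "w", newline="", encoding=encoding) as f:
        csv.writer(f).writerows(rows)

def try_write(rows, encoding):
    """Return True if the rows can be written with the encoding, else False."""
    fd, path = tempfile.mkstemp(suffix=".csv")
    os.close(fd)
    try:
        write_csv(path, rows, encoding)
        return True
    except UnicodeEncodeError:
        return False
    finally:
        os.remove(path)

print(try_write(rows, "shift_jis"))  # False
print(try_write(rows, "cp932"))      # True
print(try_write(rows, "utf-8"))      # True
```

Only the encoding changes between the three calls, so the failure comes from the character set of shift_jis, not from the CSV writing itself.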

Error when reading

Error message


UnicodeDecodeError: 'shift_jis' codec can't decode byte 0xee in position 0

This means the file contains bytes that cannot be decoded with shift-jis. It occurs on reading when the encoding of the file does not match the encoding specified for reading, or when the file contains characters that the specified encoding cannot represent.

For reference, this is how the encoding was specified.

Code example


import pandas as pd
data = pd.read_csv(filepath, encoding='shift-jis')
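The read error can be reproduced without pandas. The sketch below uses only the standard library: it writes a file as cp932 and then reads it back with two different encodings. The helper `read_csv` and the sample row are hypothetical.

```python
import csv
import os
import tempfile

def read_csv(path, encoding):
    """Read all rows of a CSV file using the given encoding."""
    with open(path, newline="", encoding=encoding) as f:
        return list(csv.reader(f))

fd, path = tempfile.mkstemp(suffix=".csv")
os.close(fd)
try:
    # Create a cp932 file that contains 髙 ('\u9ad9').
    with open(path, "w", newline="", encoding="cp932") as f:
        csv.writer(f).writerows([["\u9ad9橋", "1"]])

    # Reading it as shift_jis fails because the bytes for 髙 are not valid there.
    try:
        read_csv(path, "shift_jis")
        decode_failed = False
    except UnicodeDecodeError:
        decode_failed = True

    # Reading it with the encoding it was written in succeeds.
    rows_read = read_csv(path, "cp932")
finally:
    os.remove(path)

print(decode_failed)
print(rows_read)
```

The file is the same in both attempts; only the encoding given to the reader decides whether the bytes can be decoded.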

Which encoding should you specify?

If you create, write, and read the file entirely in Python, no error should occur as long as the encodings you use all come from the same row of the table below. ("File encoding" here means the encoding of the CSV file produced with the encoding specified at write time.)

| Encoding when writing | File encoding | Encoding when reading |
|---|---|---|
| UTF-8 | UTF-8 | UTF-8 |
| cp932 | ANSI | cp932 |
| shift-jis | ANSI | shift-jis |
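Each row of the table can be checked with a write-then-read round trip. This is a minimal sketch using only the standard library; the function `round_trip`, the file name `sample.csv`, and the sample rows are assumptions for illustration. The sample text uses only characters that all three encodings can represent.

```python
import csv
import os
import tempfile

rows = [["city", "count"], ["東京", "1"]]

# (encoding when writing, encoding when reading), one pair per table row.
pairs = [("utf-8", "utf-8"), ("cp932", "cp932"), ("shift_jis", "shift_jis")]

def round_trip(rows, write_enc, read_enc):
    """Write rows with write_enc, then read them back with read_enc."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.csv")
        with open(path, "w", newline="", encoding=write_enc) as f:
            csv.writer(f).writerows(rows)
        with open(path, newline="", encoding=read_enc) as f:
            return list(csv.reader(f))

# True for a pair when the rows read back are identical to the rows written.
results = {w: round_trip(rows, w, r) == rows for w, r in pairs}
print(results)

# For this plain text, cp932 and shift_jis produce the same bytes.
same_bytes = "東京".encode("cp932") == "東京".encode("shift_jis")
print(same_bytes)
```

The last check is consistent with both encodings ending up in the same "ANSI" column: for ordinary characters the bytes in the file do not differ.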

**Both cp932 and shift-jis produce an ANSI file, so which one should you use?** I think the biggest difference between cp932 and shift-jis is whether they can handle environment-dependent characters such as **髙 (hashigo-daka)** and **﨑 (tate-saki)**. cp932 can; shift-jis cannot. So, for example, when an ANSI CSV file is handed over from another system, it is safer to assume it should be read with cp932 rather than shift-jis.
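The difference can be checked at the character level. The helper `can_encode` below is a hypothetical name introduced for this sketch; it only tries `str.encode` and reports whether it succeeded.

```python
def can_encode(text, encoding):
    """Return True if every character in text can be encoded with encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

# 髙 (U+9AD9) and 﨑 (U+FA11) are environment-dependent; 高 is an ordinary kanji.
chars = ["\u9ad9", "\ufa11", "高"]

support = {
    ch: {"shift_jis": can_encode(ch, "shift_jis"), "cp932": can_encode(ch, "cp932")}
    for ch in chars
}

for ch, result in support.items():
    print(ch, result)
```

A check like this can be run on the text before writing, to decide whether shift-jis is enough or cp932 is needed.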
