Character encoding when using the csv module in Python 2.7.3

I loaded a CSV file containing Japanese text with Python's csv module and ran into a problem. Below are the conditions under which it happened and how I solved it.

- Character encoding of the CSV file being read: UTF-8
- Encoding returned by `sys.getdefaultencoding()`: ascii
- Character encoding of the script: UTF-8

First, read the file in the usual way. Each row of the CSV has the form `integer,string`.

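```python
import csv

class data:
    def __init__(self, id, name):
        self.id = id
        self.name = name

# filename is the path to the CSV file.
# Python 2's csv module expects the file to be opened in binary mode ('rb').
with open(filename, 'rb') as csvfile:
    reader = csv.reader(csvfile)
    # Each row is a list of str (UTF-8 bytes): [id, name]
    rows = [data(row[0], row[1]) for row in reader]
```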

This reads the data for all rows. Next, **extract the rows whose name contains a given Japanese string**.

```python
text = raw_input()
result = [row for row in rows if text in row.name]
```

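This is where the problem appears. In this environment, the string returned by `raw_input()` is of **unicode** type, whereas `data.name` holds the **UTF-8 encoded str** that `csv.reader` read from the file. To compare them, Python 2 implicitly decodes the str with the default encoding (ascii, as shown above), which cannot handle the Japanese bytes, so a `UnicodeDecodeError` is raised at `if text in row.name`.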
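To see the mismatch in isolation, here is a minimal sketch that is not from the original post: it builds a UTF-8 byte string (standing in for what `csv.reader` returns in Python 2) and a unicode search string, then tries `in` on them. The variable names and the `contains` helper are hypothetical. The sketch is written so it also runs on Python 3, where mixing the two types raises `TypeError` instead of `UnicodeDecodeError`.

```python
# -*- coding: utf-8 -*-
# Sketch (assumption: runs on Python 2.7 and Python 3) of why `text in row.name`
# fails when one side is unicode and the other is UTF-8 encoded bytes.

name_bytes = u'山田太郎'.encode('utf-8')  # like row.name from Python 2's csv.reader
text = u'山田'                            # unicode search string, like raw_input() here

def contains(needle, haystack):
    """Return the result of `needle in haystack`, or the name of the exception raised."""
    try:
        return needle in haystack
    except (TypeError, UnicodeDecodeError) as exc:
        # Python 2: implicit ascii decode of the str fails -> UnicodeDecodeError
        # Python 3: bytes and str cannot be mixed -> TypeError
        return type(exc).__name__

mixed = contains(text, name_bytes)                  # unicode vs bytes: error name
fixed = contains(text, name_bytes.decode('utf-8'))  # unicode vs unicode: works
print(mixed)
print(fixed)
```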

So what can we do?

The fix is one of the following:

- Make everything unicode
- Make everything UTF-8 str

Either would work, but unicode is easier to handle, so this time I will go with the former. In that case, the line that needs to change is:

```python
rows = [data(row[0], row[1]) for row in reader]
```

Change it so that the name is decoded as it is read:

```python
rows = [data(row[0], row[1].decode('utf-8')) for row in reader]
```

`str.decode()` converts a str into unicode by interpreting its bytes with the character encoding you pass in. Here the original is a UTF-8 str, so decoding it with `'utf-8'` turns it into unicode. Naturally, the encoding passed to `decode` must match the encoding of the original CSV file (for example, a file saved on Japanese Windows is often cp932 rather than UTF-8).
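The same idea extends to every column. The sketch below uses a hypothetical `decode_row` helper (it is not part of the csv module) that decodes each field of a row; the sample rows are made up and stand in for what Python 2's `csv.reader` returns. It also shows what happens when the encoding argument does not match the data.

```python
# -*- coding: utf-8 -*-
# Sketch (assumption): decode every field of a csv row from bytes to unicode.
# Python 2's csv.reader returns each row as a list of byte strings (str).

def decode_row(row, encoding='utf-8'):
    """Return a new list with each byte-string field decoded using `encoding`."""
    return [field.decode(encoding) for field in row]

# Hypothetical rows: the same data as read from a UTF-8 file and a cp932 file.
utf8_row = [b'1', u'山田太郎'.encode('utf-8')]
cp932_row = [b'1', u'山田太郎'.encode('cp932')]

print(decode_row(utf8_row) == [u'1', u'山田太郎'])
print(decode_row(cp932_row, 'cp932') == [u'1', u'山田太郎'])

try:
    decode_row(cp932_row)  # wrong codec: cp932 bytes are not valid UTF-8
except UnicodeDecodeError:
    print('cp932 data cannot be decoded as utf-8')
```

Assuming each row has exactly two columns, the reading line could then become `rows = [data(*decode_row(row)) for row in reader]`.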

Now both sides are unicode and the comparison works. Congratulations! When you play with Python on Windows, you often run into character encoding problems like this one.
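For comparison, the other option (keeping everything as UTF-8 str) would encode the search text once instead of decoding every name. A minimal sketch, reusing the `data` class from above with made-up sample rows standing in for Python 2 `csv.reader` output:

```python
# -*- coding: utf-8 -*-
# Sketch of the "everything UTF-8 str" option: keep the names as UTF-8 bytes
# and encode the unicode search text to UTF-8 before comparing.

class data:
    def __init__(self, id, name):
        self.id = id
        self.name = name

# Hypothetical rows, as Python 2's csv.reader would produce them from a UTF-8 file.
rows = [data(b'1', u'山田太郎'.encode('utf-8')),
        data(b'2', u'佐藤花子'.encode('utf-8'))]

text = u'山田'                 # unicode search string
query = text.encode('utf-8')  # convert the query to UTF-8 bytes once
result = [row for row in rows if query in row.name]
print([row.id for row in result])
```

Byte-level matching is safe here because UTF-8 is self-synchronizing: an encoded character can only match at character boundaries. It is not safe for encodings such as Shift_JIS, whose second bytes can fall in the ASCII range, which is one more reason the unicode approach is easier to handle.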
