Character encoding when using the csv module in Python 2.7.3

I loaded a CSV file containing Japanese values using Python's csv module. This post describes a problem I ran into under the following conditions, and its solution.

First, read the file normally. Each row of the CSV has the form integer, string.

python


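import csv


class data(object):
    """One CSV row: an integer id and a name string."""

    def __init__(self, id, name):
        self.id = id
        self.name = name


filename = 'data.csv'  # placeholder path; point this at your own CSV file

# The Python 2 csv module expects the file to be opened in binary mode.
with open(filename, 'rb') as csvfile:
    reader = csv.reader(csvfile)
    rows = [data(row[0], row[1]) for row in reader]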

All rows have now been read. Next, **extract the rows whose name contains a given Japanese string**.

python


text = raw_input()
result = [row for row in rows if text in row.name]

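The problem arises here. The string obtained by raw_input() is **unicode type** (this depends on the environment; in a standard console, Python 2's raw_input() returns a byte str in the console's encoding). However, what you get from data.name is the **UTF-8 str type** read by csv.reader. When the two types are mixed, Python 2 tries to decode the str implicitly as ASCII, which fails for Japanese text, so a UnicodeDecodeError is raised at `if text in row.name`.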
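The failure can be reproduced without a CSV file. The sketch below assumes a hypothetical name value, encodes it to UTF-8 to stand in for what csv.reader returns on Python 2, and performs explicitly the ASCII decode that Python 2 attempts implicitly when unicode and str are mixed. The variable names are illustrative and do not come from the code above.

```python
# -*- coding: utf-8 -*-
# Hypothetical sample value standing in for one name cell read by csv.reader.
name_bytes = u'山田太郎'.encode('utf-8')  # a byte string, like row[1] on Python 2

# Python 2 implicitly decodes the byte string as ASCII before comparing it
# with a unicode string. Doing that decode explicitly shows why it fails.
try:
    name_bytes.decode('ascii')
    implicit_decode_ok = True
except UnicodeDecodeError:
    implicit_decode_ok = False

print(implicit_decode_ok)
```

Any byte outside the ASCII range triggers the same error, so this affects every non-ASCII name, not only Japanese ones.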

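So what should we do?

There are two possible solutions: convert the str values read by csv.reader to unicode, or convert the unicode search string to a UTF-8 str. Since unicode is the easier type to handle, I will use the former this time. In this case, the line that needs to be fixed is: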

python


rows = [data(row[0], row[1]) for row in reader]

Change it as follows:

python


rows = [data(row[0], row[1].decode('utf-8')) for row in reader]

`str.decode` converts a str to unicode using the character encoding you specify. This time the original is a UTF-8 str, so I decoded it with 'utf-8' to get unicode. The encoding passed to decode must of course match the character encoding of the original CSV file.

Now both sides are unicode and can be compared. Success! When you work with Python on Windows, you often run into this kind of character encoding problem.
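To illustrate why the encoding passed to decode has to match the file, here is a minimal sketch. The helper `decode_cell` and the sample cell values are hypothetical; cp932 is used as an example of an encoding commonly seen in CSV files saved on Japanese Windows.

```python
# -*- coding: utf-8 -*-
def decode_cell(cell, encoding):
    """Decode one byte-string CSV cell to unicode text (hypothetical helper)."""
    return cell.decode(encoding)


# Hypothetical cells: the same name saved once as UTF-8 and once as cp932.
utf8_cell = u'東京'.encode('utf-8')
cp932_cell = u'東京'.encode('cp932')

# Decoded with the matching encoding, both cells yield the same text.
same_text = decode_cell(utf8_cell, 'utf-8') == decode_cell(cp932_cell, 'cp932')

# Decoding cp932 bytes as UTF-8 raises an error here, because these bytes
# are not valid UTF-8.
try:
    decode_cell(cp932_cell, 'utf-8')
    wrong_encoding_ok = True
except UnicodeDecodeError:
    wrong_encoding_ok = False

print(same_text, wrong_encoding_ok)
```

A mismatched encoding does not always raise an error; some combinations decode without complaint and produce garbled text instead, so it is worth confirming the file's actual encoding.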
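For comparison, the alternative of keeping the names as UTF-8 byte strings and encoding the search string instead might look like the sketch below. The sample names and the variable `text_bytes` are assumptions for illustration, not part of the code above.

```python
# -*- coding: utf-8 -*-
# Hypothetical name cells as UTF-8 byte strings, as csv.reader yields on Python 2.
names = [u'山田太郎'.encode('utf-8'), u'佐藤花子'.encode('utf-8')]

text = u'山田'                    # the search string as unicode
text_bytes = text.encode('utf-8')  # convert it to the same type as the names

# Both sides are now UTF-8 byte strings, so no implicit ASCII decode happens.
result = [name for name in names if text_bytes in name]

print(len(result))
```

This works for UTF-8, but the results stay as byte strings, so anything that later needs character counts or slicing would still have to decode them. That is one reason to prefer converting everything to unicode at the point of reading.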
