Open UTF-8 with BOM in Python

The conclusion first

Encoding to use when reading UTF-8 with BOM in Python

Specify **'utf_8_sig'**.

Example of reading a file: io.open(filename, "r", encoding="utf_8_sig")

Convert from str type (UTF-8) to unicode type (Python 2): uni_string = unicode(str_string, 'utf_8_sig')
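The unicode() built-in exists only in Python 2. As a rough sketch of the same conversion in Python 3, assuming the data arrives as a bytes object (the sample bytes below are made up for illustration), the decode method takes the same codec name:

```python
# Hypothetical sample: bytes as they might come from a file saved as UTF-8 with BOM.
raw = b'\xef\xbb\xbfname,price'

# 'utf_8_sig' drops the leading BOM while decoding.
text = raw.decode('utf_8_sig')

# Plain 'utf_8' keeps the BOM as the character U+FEFF at the start of the string.
text_with_bom = raw.decode('utf_8')
```

With plain 'utf_8' the leftover U+FEFF ends up glued to the first value, which is the kind of problem 'utf_8_sig' avoids.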

Introduction

I got a little stuck reading UTF-8 in Python, so I am writing this down so that I do not forget it.

What is BOM

A UTF-8 file may begin with a BOM (byte order mark), a signature indicating that the encoding is UTF-8. When present, it is the first 3 bytes of the file: 'EF BB BF'.

The trouble is that UTF-8 files come both with and without a BOM.

Windows Notepad and Excel add a BOM to UTF-8 files. Linux and Mac generally seem to handle UTF-8 without a BOM.
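To check whether some data starts with those 3 bytes, a minimal sketch could look like the following. The helper name has_utf8_bom is my own, not something from the standard library; codecs.BOM_UTF8 is the standard library constant holding the BOM bytes.

```python
import codecs


def has_utf8_bom(data):
    """Return True if the bytes start with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


# Hypothetical sample data, with and without a BOM.
bom_result = has_utf8_bom(b'\xef\xbb\xbfabc')
plain_result = has_utf8_bom(b'abc')
```

For a file, the same check can be applied to the first 3 bytes read in binary mode.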

Handle files with UTF-8 BOM in Python

This time I wanted to load a CSV file edited in Excel, so I had to take the BOM into account.

It turned out to be covered in the official documentation.

Official documentation: UTF-8 with BOM

If you set the encoding to 'utf_8_sig', a BOM is skipped when present. If there is no BOM, the file is read as plain UTF-8.
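To see this behavior for yourself, here is a small sketch that writes one file with a BOM and one without, then reads both with 'utf_8_sig'. The file names and the read_text helper are made up for this example.

```python
import io
import os
import tempfile


def read_text(path):
    """Read a file as UTF-8, skipping a BOM if one is present."""
    with io.open(path, 'r', encoding='utf_8_sig') as f:
        return f.read()


# Hypothetical test files in a temporary directory.
tmp_dir = tempfile.mkdtemp()
with_bom = os.path.join(tmp_dir, 'with_bom.txt')
without_bom = os.path.join(tmp_dir, 'without_bom.txt')

with open(with_bom, 'wb') as f:
    f.write(b'\xef\xbb\xbfhello')  # BOM followed by the text
with open(without_bom, 'wb') as f:
    f.write(b'hello')              # the same text, no BOM

# Both files should decode to the same string.
same = read_text(with_bom) == read_text(without_bom)
```

Because the codec accepts both variants, the reading code does not need to know in advance which kind of file it was given.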

Sample program

ImportCSV.py

import io

with io.open('sample.csv', 'rt', encoding='utf_8_sig') as f:
    print(f.readlines())
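Since the goal was to load a CSV file, the same encoding can also be combined with the standard csv module. This is only a sketch assuming Python 3; the function name load_csv_rows is my own, and newline='' is what the csv module documentation recommends when opening files for it.

```python
import csv
import io


def load_csv_rows(path):
    """Return all rows of a CSV file, skipping a UTF-8 BOM if present."""
    with io.open(path, 'r', encoding='utf_8_sig', newline='') as f:
        return list(csv.reader(f))
```

For example, load_csv_rows('sample.csv') would return a list of rows, each row being a list of strings, without a stray BOM character in the first cell.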

In closing

It is easy to get stuck on character encodings in Python, but if you handle the encoding properly when converting to the unicode type, you will not have to worry about them afterwards.
