Handle zip files with Japanese filenames in Python 3

Recent versions of the zip specification allow filenames to be stored in UTF-8, but many archives still store them in a legacy, environment-dependent encoding. For Japanese, this is usually Shift-JIS (cp932), following Windows.

In Python 2, the zipfile module returned filenames as byte strings, so a cp932 filename came back unchanged. In Python 3, strings are unified as Unicode, so filenames are decoded when the zip file is read and returned as str. Naturally, the default behavior does not follow the Japanese convention, so the names come out garbled.

When I read the source of the zipfile module in Python 3.4, the relevant part looked like this:

            if flags & 0x800:
                # UTF-8 file names extension
                filename = filename.decode('utf-8')
            else:
                # Historical ZIP filename encoding
                filename = filename.decode('cp437')

So a name without the UTF-8 flag is always decoded as cp437. Couldn't decoding cp932-encoded bytes as cp437 raise a UnicodeDecodeError?

>>> len(bytes(range(256)).decode('cp437'))
256

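cp437 appears to map every byte value to exactly one character, so decoding never fails and no information is lost. That means the original name can be recovered by re-encoding the string as cp437 to get the raw bytes back, and then decoding those bytes as cp932.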
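
To check that reasoning without needing an actual archive, here is a minimal sketch that simulates the round trip in memory. The sample name 日本語.txt and the variable names are my own assumptions for illustration, not something taken from the zipfile module.

```python
# Simulate what zipfile does to a cp932 name that lacks the UTF-8 flag.
original = "日本語.txt"            # hypothetical sample filename

raw = original.encode("cp932")     # bytes as a Windows tool would store them
garbled = raw.decode("cp437")      # what zipfile hands back in Python 3

# Undo the cp437 decoding to get the raw bytes, then decode them properly.
restored = garbled.encode("cp437").decode("cp932")
print(restored)                    # prints: 日本語.txt

# The round trip is lossless for every possible byte value.
all_bytes = bytes(range(256))
roundtrip_ok = all_bytes.decode("cp437").encode("cp437") == all_bytes
print(roundtrip_ok)                # prints: True
```

Because the cp437 step is reversible for every byte, the same trick should work for other legacy encodings as well, as long as you know which one was used.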

import zipfile

# zipfile decodes unflagged names as cp437; undo that, then decode as cp932.
with zipfile.ZipFile('foo.zip') as zf:
    for name in zf.namelist():
        print(name.encode('cp437').decode('cp932'))
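
One caveat with the loop above is that it re-decodes every name, including names that were already stored as UTF-8 (flag 0x800), which would either fail or garble them. A slightly more careful sketch checks the flag on each ZipInfo first. The helper names decode_member_name and list_names, the UTF8_FLAG constant, and the fallback to the original name on error are my own assumptions, not part of zipfile.

```python
import zipfile

UTF8_FLAG = 0x800  # general-purpose bit 11: the name is stored as UTF-8


def decode_member_name(info, legacy_encoding="cp932"):
    """Return a readable name for a ZipInfo (hypothetical helper)."""
    if info.flag_bits & UTF8_FLAG:
        # zipfile already decoded this name as UTF-8; leave it alone.
        return info.filename
    try:
        # Recover the raw bytes, then decode with the assumed legacy encoding.
        return info.filename.encode("cp437").decode(legacy_encoding)
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not valid in the assumed encoding; fall back to zipfile's result.
        return info.filename


def list_names(path, legacy_encoding="cp932"):
    """List member names of a zip file, fixing legacy-encoded ones."""
    with zipfile.ZipFile(path) as zf:
        return [decode_member_name(i, legacy_encoding) for i in zf.infolist()]
```

Calling list_names('foo.zip') would then return names that are readable for both kinds of entries. On Python 3.11 and later, zipfile.ZipFile also accepts a metadata_encoding argument for reading, which may make this workaround unnecessary there.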
