Multi-byte character handling is still fragile in Python 2 on Windows. If you write a file enumeration routine in the usual way, it may fail when certain characters ("表" ("table"), "ソ" ("so"), etc.) appear in the file path. This is the so-called 5C problem.
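The name comes from the byte 0x5C, which is the backslash (the Windows path separator). In cp932 (Shift_JIS), some two-byte characters have 0x5C as their second byte, so byte-oriented path handling can mistake half of a character for a separator. The following is only an illustrative sketch: `cp932_bytes` and `ends_with_5c` are hypothetical helper names, not part of the original script, and the characters are written as Unicode escapes to keep the snippet independent of the source file encoding.

```python
# -*- coding: utf-8 -*-
import binascii

BACKSLASH = 0x5C  # byte value of "\" in ASCII and cp932


def cp932_bytes(char):
    """Encode a single character to cp932 and return the raw bytes."""
    return char.encode("cp932")


def ends_with_5c(char):
    """Return True if the last cp932 byte of `char` is 0x5C."""
    return bytearray(cp932_bytes(char))[-1] == BACKSLASH


# u"\u8868" is 表 and u"\u30bd" is ソ; u"\u4e2d" (中) is a control case.
for ch in (u"\u8868", u"\u30bd", u"\u4e2d"):
    hex_bytes = binascii.hexlify(cp932_bytes(ch)).decode("ascii")
    print("U+%04X -> %s, ends with 5C: %s" % (ord(ch), hex_bytes, ends_with_5c(ch)))
```

The first two characters encode to 95 5c and 83 5c respectively, which is why they are the usual examples of this problem.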
C:/test
    filelist.py
    Tesuto/
        a1.txt
        a2.txt
    table/
        hyo1.txt
        hyo2.txt
        中のtable/
            hyo10.txt
            hyo11.txt
filelist.py
# -*- coding: utf-8 -*-
import os

SEP = os.sep

def filelist(dir_path):
    # Print every entry under dir_path, descending into subdirectories.
    for item in os.listdir(dir_path):
        file_path = dir_path + SEP + item
        print(file_path)
        if os.path.isdir(file_path):
            filelist(file_path)

# test
script_dir = os.path.dirname(os.path.abspath(__file__))
filelist(script_dir)
C:\test\filelist.py
C:\test\Tesuto
C:\test\Tesuto\a1.txt
C:\test\Tesuto\a2.txt
C:\test\table
C:\test\table\table
Enumeration does not work inside the path that contains "table".
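@wonderful_panda pointed out a fix. If you call .decode('cp932') on the file path after obtaining it, the files can be enumerated without any problem. Decoding turns the byte string into a unicode object, and in Python 2 os.listdir returns unicode names when it is given a unicode path.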
# -*- coding: utf-8 -*-
import os

SEP = os.sep

def filelist(dir_path):
    # Print every entry under dir_path, descending into subdirectories.
    for item in os.listdir(dir_path):
        file_path = dir_path + SEP + item
        print(file_path)
        if os.path.isdir(file_path):
            filelist(file_path)

# test
# Decode the byte-string path from cp932 so that a unicode path is passed on.
script_dir = os.path.dirname(os.path.abspath(__file__.decode('cp932')))
filelist(script_dir)
C:\test\filelist.py
C:\test\Tesuto
C:\test\Tesuto\a1.txt
C:\test\Tesuto\a2.txt
C:\test\table
C:\test\table\hyo1.txt
C:\test\table\hyo2.txt
C:\test\table\中のtable
C:\test\table\中のtable\hyo10.txt
C:\test\table\中のtable\hyo11.txt
The enumeration in the path containing the "table" was also successful.
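The same idea can be wrapped in a small helper so that the starting path is decoded no matter where it comes from. This is only a sketch under assumptions: `to_text` and `iter_paths` are hypothetical names introduced here, cp932 is assumed to be the encoding of incoming byte paths, and the entries are sorted purely to make the order predictable.

```python
# -*- coding: utf-8 -*-
import os


def to_text(path, encoding="cp932"):
    """Decode a byte-string path to text; text paths are returned unchanged."""
    if isinstance(path, bytes):  # `bytes` is an alias of `str` in Python 2
        return path.decode(encoding)
    return path


def iter_paths(dir_path, encoding="cp932"):
    """Yield every entry under dir_path, descending into subdirectories."""
    root = to_text(dir_path, encoding)
    for name in sorted(os.listdir(root)):
        full_path = os.path.join(root, name)
        yield full_path
        if os.path.isdir(full_path):
            for sub_path in iter_paths(full_path, encoding):
                yield sub_path
```

Calling `for p in iter_paths(script_dir): print(p)` would then play the role of `filelist(script_dir)` above.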
The decode approach is far smarter, but I will leave here the workaround I wrote before I was told about it.
filelist.py
# -*- coding: utf-8 -*-
import os

SEP = os.sep

def filelist2(dir_path):
    old_dir = os.getcwd()
    os.chdir(dir_path)  # Change the current directory.
    for item in os.listdir("."):
        file_path = dir_path + SEP + item
        print(file_path)
        # item is relative to the current directory, which is now dir_path.
        if os.path.isdir(item):
            filelist2(file_path)
    os.chdir(old_dir)  # Restore the current directory.

# test
script_dir = os.path.dirname(os.path.abspath(__file__))
filelist2(script_dir)
If you pass a path containing a 5C character to os.listdir, a problem occurs. So instead of passing the path directly, change the current directory to the target path beforehand and pass "." (which indicates the current directory) to os.listdir. This allows the enumeration to run normally.
C:\test\filelist.py
C:\test\Tesuto
C:\test\Tesuto\a1.txt
C:\test\Tesuto\a2.txt
C:\test\table
C:\test\table\hyo1.txt
C:\test\table\hyo2.txt
C:\test\table\中のtable
C:\test\table\中のtable\hyo10.txt
C:\test\table\中のtable\hyo11.txt
The enumeration in the path containing the "table" was also successful.
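One weakness of changing the current directory is that an exception in the middle of the loop would leave the process in the wrong directory. A possible variation, shown here only as a sketch, restores the directory in a finally clause; `working_directory` and `list_names` are hypothetical names that do not appear in the script above.

```python
# -*- coding: utf-8 -*-
import os
from contextlib import contextmanager


@contextmanager
def working_directory(path):
    """Temporarily change the current directory, restoring it afterwards."""
    old_dir = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(old_dir)  # Runs even if the body raises an exception.


def list_names(dir_path):
    """Return the sorted entry names of dir_path by listing "." from inside it."""
    with working_directory(dir_path):
        return sorted(os.listdir("."))
```

The listing itself still goes through os.listdir("."), so the workaround is unchanged; only the restore step is made more robust.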
Even on Windows, Python 3.5 and similar versions could enumerate the files normally without any of these measures. For a new program that is known to process multi-byte strings, it is safer to start with Python 3 rather than Python 2.
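For reference, a minimal Python 3 sketch of the same enumeration might look like the following. It uses os.walk instead of the recursive function from this article, `walk_paths` is a hypothetical name, and the result is sorted, so the order can differ from the output shown earlier.

```python
import os


def walk_paths(dir_path):
    """Return every file and directory path under dir_path, sorted."""
    found = []
    for root, dirs, files in os.walk(dir_path):
        for name in dirs + files:
            found.append(os.path.join(root, name))
    return sorted(found)


if __name__ == "__main__":
    # Paths are ordinary str objects in Python 3, so no decode step is needed.
    script_dir = os.path.dirname(os.path.abspath(__file__))
    for path in walk_paths(script_dir):
        print(path)
```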
I tried enumerating files with the same method (changing the current directory) in PHP on Windows, but it did not work. I searched through a lot of information, and the conclusion seems to be that there is no solution in PHP.
This is not possible. It's a limitation of PHP. PHP uses the multibyte versions of Windows APIs; you're limited to the characters your codepage can represent.
If you set a breakpoint at readdir_r() in win32\readdir.c, you'll see that FindNextFile already returns a filename with question marks in place of the characters you want, so there's nothing you can do about it, apart from patching PHP itself.