In a recent PHP project of mine, I am writing the program in Japanese. Not only class names and variable names, but also file names are in Japanese. (I would like to explain why I decided to write in Japanese, and my motivation, in a separate article.)
In this project the development environment is a Mac and the production environment is Linux, and I ran into problems such as PHP files with Japanese file names not being autoloaded. When I looked into it, the cause was that the Mac file system and the Linux file system use different Unicode normalization forms for file names. For details, the article "Introduction Mania Dorafuto Edition: Notes on File Names on Mac OS X (NFC, NFD, etc.)" is helpful.
Briefly, the two file systems differ as follows:
Mac: Uses the form called NFD. Voiced and semi-voiced sound marks are split off into separate characters (decomposed). "だ" (da) becomes "た" (ta) plus "゛" (the combining voiced sound mark, U+3099), 6 bytes in UTF-8.
Linux: File names are normally in the form called NFC. Voiced and semi-voiced sound marks are not split off (composed). "だ" (da) is a single character of 3 bytes.
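The byte counts above can be checked with Python's standard `unicodedata` module. This is a minimal sketch for illustration only; the variable names are chosen for this example.

```python
from unicodedata import normalize

composed = "\u3060"                      # だ as a single code point (NFC)
decomposed = normalize("NFD", composed)  # た (U+305F) + combining mark (U+3099)

# Code points that make up each form
print([f"U+{ord(ch):04X}" for ch in composed])
print([f"U+{ord(ch):04X}" for ch in decomposed])

# Size of each form when encoded as UTF-8
nfc_bytes = len(composed.encode("utf-8"))
nfd_bytes = len(decomposed.encode("utf-8"))
print(nfc_bytes, nfd_bytes)

# The two strings look the same on screen but are different strings
print(composed == decomposed)
```

Because the comparison on the last line is false, anything that compares file names byte for byte treats the two forms as different names.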
If you commit a file with an NFD name created on the Mac to git, the name is recorded in its decomposed form as is. It would be nice if the name were converted from NFD to NFC on git pull on Linux, but the file is created with the NFD name. Since the PHP source code is written in NFC, any file name hard-coded in the source no longer matches the name on disk, and you get the "it worked on the Mac, but it stopped working on Linux" phenomenon.
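To make the mismatch concrete: an NFC name from source code does not equal the NFD name on disk, so an exact comparison against a directory listing misses. The sketch below shows one way to look up an entry regardless of normalization form. `find_entry` is a hypothetical helper written for this illustration; it is not something PHP or git provides, and the file name in the demo is made up.

```python
import os
import tempfile
from unicodedata import normalize


def find_entry(directory, wanted):
    """Return the on-disk name in `directory` that equals `wanted`
    after NFC normalization, or None if there is no such entry.

    Hypothetical helper for illustration only.
    """
    target = normalize("NFC", wanted)
    for name in os.listdir(directory):
        if normalize("NFC", name) == target:
            return name
    return None


# Demo: create a file with an NFD name, then look it up by its NFC name.
with tempfile.TemporaryDirectory() as tmp:
    nfd_name = normalize("NFD", "\u3060.php")  # だ.php in decomposed form
    open(os.path.join(tmp, nfd_name), "w").close()
    nfc_name = normalize("NFC", nfd_name)

    exact_match = nfc_name in os.listdir(tmp)  # compares names as-is
    found = find_entry(tmp, nfc_name)          # compares after normalizing

print("exact match:", exact_match)
print("normalized match:", found is not None)
```

Normalizing at lookup time only works around the symptom; the rest of this article takes the other route and finds the NFD names so they can be fixed.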
The Japanese-named files have already been committed and that cannot be undone, so as a first step toward identifying the problematic files I wrote a Python script that finds files with NFD names.
$ find-nfd -h
usage: find-nfd [-h] [path]

Find NFD files

positional arguments:
  path        path to find(Default: current working directory)

optional arguments:
  -h, --help  show this help message and exit
find-nfd.py
#!/usr/bin/env python3
"""List files and directories whose paths are not NFC-normalized."""
import argparse
import os
from unicodedata import normalize


def find_all_files(directory):
    """Yield every directory and file path under `directory`."""
    for root, dirs, files in os.walk(directory):
        yield root
        for file in files:
            yield os.path.join(root, file)


def is_nfd(string):
    """Return True if NFC normalization would change the string."""
    return normalize("NFC", string) != string


def find_nfd_files(directory):
    """Yield only the paths that are not in NFC form."""
    for file in find_all_files(directory):
        if is_nfd(file):
            yield file


def main():
    parser = argparse.ArgumentParser(description="Find NFD files")
    parser.add_argument("path", type=str, nargs="?", default=os.getcwd(),
                        help="path to find(Default: current working directory)")
    args = parser.parse_args()
    count = 0
    for file in find_nfd_files(args.path):
        print(file)
        count += 1
    print("")
    print("%d files found" % count)


if __name__ == "__main__":
    main()
Here are some files created on the Mac ↓
$ php -r 'var_dump(glob("/tmp/test/1/*"));'
array(7) {
[0] =>
string(13) "/tmp/test/1/a"
[1] =>
string(13) "/tmp/test/1/b"
[2] =>
string(17) "/tmp/test/1/schon"
[3] =>
string(19) "/tmp/test/1/schön"
[4] =>
string(30) "/tmp/test/1/한글"
[5] =>
string(27) "/tmp/test/1/はひふへほ"
[6] =>
string(42) "/tmp/test/1/ぱぴぷぺぽ"
}
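On screen it is completely impossible to tell NFD from NFC, but the byte counts give it away. The two five-kana names "Hahifuheho" (はひふへほ, 27 bytes) and "Papipupepo" (ぱぴぷぺぽ, 42 bytes) both follow the 12-byte prefix "/tmp/test/1/", which leaves 15 bytes (3 per kana) for the first and 30 bytes (6 per kana) for the second, because each semi-voiced kana is stored as a base kana plus a combining mark. In the same way, you can see that the German umlaut and the Korean Hangul are also stored as NFD.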
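Since the two forms look identical on screen, listing the code points is one way to tell them apart. The following is a small sketch, where `describe` is a name chosen for this example.

```python
import unicodedata


def describe(text):
    """Return one 'U+XXXX NAME' string per code point in `text`."""
    return [
        f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"
        for ch in text
    ]


nfc = unicodedata.normalize("NFC", "sch\u00f6n")
nfd = unicodedata.normalize("NFD", "sch\u00f6n")

for label, value in (("NFC", nfc), ("NFD", nfd)):
    # Byte count of the name alone, without any directory prefix
    print(label, len(value.encode("utf-8")), "bytes")
    for line in describe(value):
        print("   ", line)
```

The NFD form of "schön" is 7 bytes, which together with the 12-byte directory prefix gives the string(19) in the listing above.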
Now search this directory for files with NFD names:
$ find-nfd.py /tmp/test/1
/tmp/test/1/schön
/tmp/test/1/한글
/tmp/test/1/ぱぴぷぺぽ

3 files found
Three files were found.
When you find files like these, you have to rename them to NFC in a Linux or Windows environment and commit them to git again.
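The renaming step could be scripted along the following lines. This is a sketch under assumptions, not a tested procedure from this project: `rename_to_nfc` is a hypothetical helper, it is meant to be run on the Linux side, and it uses a plain `os.rename`, so in a git working tree the renames still have to be staged and committed afterwards.

```python
import os
from unicodedata import normalize


def rename_to_nfc(directory):
    """Rename every file and directory under `directory` whose name is
    not NFC-normalized. Returns a list of (old_path, new_path) pairs.

    Hypothetical helper for illustration. A rename is skipped when the
    NFC name already exists, so nothing is overwritten.
    """
    renamed = []
    # Walk bottom-up so that children are renamed before their parent.
    for root, dirs, files in os.walk(directory, topdown=False):
        for name in dirs + files:
            nfc_name = normalize("NFC", name)
            if nfc_name == name:
                continue
            old_path = os.path.join(root, name)
            new_path = os.path.join(root, nfc_name)
            if os.path.exists(new_path):
                continue
            os.rename(old_path, new_path)
            renamed.append((old_path, new_path))
    return renamed


if __name__ == "__main__":
    import sys
    target = sys.argv[1] if len(sys.argv) > 1 else os.getcwd()
    for old, new in rename_to_nfc(target):
        print(old, "->", new)
```

Walking bottom-up matters here: if a directory were renamed first, the paths of the entries inside it would no longer be valid by the time they are visited.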
You might ask, "Do you really go through this hassle every time?" In fact, I am moving the development setup itself to Vagrant, so that a Debian environment is ready in just 5 minutes :)
The Mac is not the production environment, so to avoid getting stuck on needless problems, it is important to make the development environment exactly the same as production.