[PYTHON] "Stop committing Japanese file names to git on Mac ><" For the time being, I wrote a script that finds Japanese file names that are incompatible between Mac and Linux.

In my recent PHP project, I write programs in Japanese. Not only class names and variable names, but also file names are in Japanese. (I would like to summarize in a separate article why I decided to write in Japanese and my motivation.)

In this project the development environment is Mac and the production environment is Linux, and we ran into problems such as PHP files with Japanese names not being autoloaded. When I looked into it, the cause was that the Mac file system and the Linux file system use different Unicode normalization forms. For more information, the article "Introduction Mania Dorafuto Edition: Notes on File Names on Mac OS X (NFC, NFD, etc.)" will be helpful.

Briefly, the two file systems differ as follows:

Mac: File names use a form called NFD. Voiced and semi-voiced sound marks are separated from the base character (decomposed). "だ" becomes "た" followed by a combining voiced sound mark (U+3099), 6 bytes in UTF-8.
Linux: File names are normally in a form called NFC. Voiced and semi-voiced sound marks are not separated (composed). "だ" is 3 bytes in UTF-8.
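The byte counts can be checked with Python's standard `unicodedata` module. This is a minimal sketch in Python 3; the variable names are only for illustration.

```python
from unicodedata import normalize

composed = normalize("NFC", "\u3060")    # "だ" as a single code point (NFC)
decomposed = normalize("NFD", composed)  # "た" + combining voiced sound mark (NFD)

# Both forms look the same on screen but are different strings.
print(composed == decomposed)               # False
print(len(composed.encode("utf-8")))        # 3
print(len(decomposed.encode("utf-8")))      # 6
print([hex(ord(ch)) for ch in decomposed])  # ['0x305f', '0x3099']
```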

If you commit a file created on Mac to git, its NFD name is staged as it is, in decomposed form. It would be nice if the name were converted from NFD to NFC when you git pull on Linux, but the file is created with the NFD name. Since the strings in the PHP source code are NFC, any file name that is referenced as a fixed string no longer matches, and you get the "it worked on Mac, but it stopped working on Linux" phenomenon.
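The mismatch can be reproduced without touching a file system. The following Python 3 sketch is an illustration only: the file name, the list standing in for a directory listing, and the helper `same_name` are all hypothetical.

```python
from unicodedata import normalize

# Hypothetical example: the name as written in the (NFC) source code ...
name_in_source = normalize("NFC", "\u3060.php")
# ... and the same name as created on Mac and checked out on Linux (NFD).
name_on_disk = normalize("NFD", name_in_source)

# Stand-in for what a directory listing on Linux would return.
directory_listing = [name_on_disk]

# An exact comparison fails because the two strings differ.
print(name_in_source in directory_listing)  # False


def same_name(a, b):
    # Compare two names after bringing both to NFC.
    return normalize("NFC", a) == normalize("NFC", b)


print(any(same_name(name_in_source, n) for n in directory_listing))  # True
```

Normalizing at comparison time only hides the problem in this one lookup; the file on disk still carries the NFD name.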

The Japanese files have already been committed and that cannot be undone, so to identify the problematic files for the time being I wrote a Python script that finds files with NFD names.

How to use

$ find-nfd -h
usage: find-nfd [-h] [path]

Find NFD files

positional arguments:
  path        path to find(Default: current working directory)

optional arguments:
  -h, --help  show this help message and exit

Source code

find-nfd.py


#!/usr/bin/env python3
import argparse
import os
from unicodedata import normalize


def find_all_files(directory):
    # Yield every directory and file path under `directory`.
    for root, dirs, files in os.walk(directory):
        yield root
        for file in files:
            yield os.path.join(root, file)


def to_nfc(string):
    # Compose base characters and combining marks (NFD -> NFC).
    return normalize("NFC", string)


def is_nfd(string):
    # A path is treated as NFD when NFC normalization would change it.
    return to_nfc(string) != string


def find_nfd_files(directory):
    for file in find_all_files(directory):
        if is_nfd(file):
            yield file


def main():
    parser = argparse.ArgumentParser(description="Find NFD files")
    parser.add_argument("path", type=str, help="path to find(Default: current working directory)", nargs='?', default=os.getcwd())
    args = parser.parse_args()

    count = 0

    for file in find_nfd_files(args.path):
        print(file)
        count += 1

    print("")
    print("%u files found" % count)


if __name__ == "__main__":
    main()

Trying it out

These are files created on Mac ↓

$ php -r 'var_dump(glob("/tmp/test/1/*"));'
array(7) {
  [0] =>
  string(13) "/tmp/test/1/a"
  [1] =>
  string(13) "/tmp/test/1/b"
  [2] =>
  string(17) "/tmp/test/1/schon"
  [3] =>
  string(19) "/tmp/test/1/schön"
  [4] =>
  string(30) "/tmp/test/1/한글"
  [5] =>
  string(27) "/tmp/test/1/はひふへほ"
  [6] =>
  string(42) "/tmp/test/1/ぱぴぷぺぽ"
}

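On screen it is impossible to tell NFD from NFC, but the byte counts give it away: はひふへほ ("Hahifuheho") is 27 bytes including the directory path, while ぱぴぷぺぽ ("Papipupepo"), which has the same number of characters, is 42 bytes because each semi-voiced kana is stored as a base character plus a combining mark. The byte counts also show that the German umlaut and the Korean Hangul names are NFD.

Now look for the NFD files in this directory: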

$ find-nfd.py /tmp/test/1
/tmp/test/1/schön
/tmp/test/1/한글
/tmp/test/1/ぱぴぷぺぽ

3 files found

I found three.

If you find such files, you will have to rename them in a Linux or Windows environment and commit them to git again.
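The rename step could be sketched as follows. This is only an illustration under assumptions: `plan_nfc_renames` and `apply_renames` are hypothetical helper names that are not part of find-nfd.py, the code is Python 3, and it is meant to be run on Linux, where the file system keeps names byte for byte.

```python
import os
from unicodedata import normalize


def plan_nfc_renames(directory):
    """Return (old_path, new_path) pairs for entries whose name is not NFC."""
    renames = []
    # Walk bottom-up so children are listed before their parent directory.
    for root, dirs, files in os.walk(directory, topdown=False):
        for name in dirs + files:
            nfc_name = normalize("NFC", name)
            if nfc_name != name:
                renames.append((os.path.join(root, name),
                                os.path.join(root, nfc_name)))
    return renames


def apply_renames(renames):
    """Rename each pair, skipping any target name that already exists."""
    for old_path, new_path in renames:
        if os.path.exists(new_path):
            print("skip (target exists): %s" % new_path)
            continue
        os.rename(old_path, new_path)


# Example (dry run first, then apply after reviewing the output):
# for old_path, new_path in plan_nfc_renames("/tmp/test/1"):
#     print(old_path, "->", new_path)
```

Printing the plan first and applying it only after a review keeps the operation safe. The renamed files still have to be committed to git afterwards.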

You might ask, "Do you do this annoying thing every time?", but I am changing the development setup itself to use Vagrant, so that a Debian environment can be built in just 5 minutes :)

A Mac is a fine machine, but it is not the production environment, so it is important to build a development environment that exactly matches production in order to avoid unnecessary trouble.
