[LINUX] Japanese file names are garbled in a Java program on a Docker container when the LANG environment variable is set

Overview

While running a Java program in a CentOS Docker container, I found that Japanese file names came out garbled when the program listed the files in a directory.

Sample.java


import java.io.*;

public class Sample {
   public static void main(String[] args) {
      // /sample contains a file with a Japanese name: "/sample/あいうえお.csv"
      new File("/sample").listFiles(new FilenameFilter() {
         public boolean accept(File dir, String name) {
            System.out.println(name);   // => the Japanese file name is printed garbled
            return false;
         }
      });
   }
}

I confirmed that the garbling does not occur when the LANG environment variable is set to en_US.UTF-8, but does occur when it is set to ja_JP.UTF-8.

This article explains why the Japanese file names are garbled and how to fix it.

Cause and remedy

Setting LANG to ja_JP.UTF-8 causes the garbling because the Japanese locale is not registered in Docker's CentOS image. When LANG names a locale that is not installed, the C library falls back to the default C (POSIX) locale, so the JVM no longer treats file names as UTF-8.
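One way to picture the garbling is to decode the UTF-8 bytes of a file name with an ASCII charset. The sketch below is only an illustration of that mechanism, not the JVM's actual file name handling: the class name GarbleDemo and the five-hiragana file name are hypothetical, and the exact characters a real JVM prints for the garbled name may differ.

```java
import java.nio.charset.StandardCharsets;

class GarbleDemo {
    // Decode file name bytes as US-ASCII. Bytes >= 0x80 are not valid ASCII,
    // so each of them is replaced with the replacement character U+FFFD.
    static String decodeAsAscii(byte[] rawName) {
        return new String(rawName, StandardCharsets.US_ASCII);
    }

    // Decode the same bytes as UTF-8, which is how they were written.
    static String decodeAsUtf8(byte[] rawName) {
        return new String(rawName, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical file name: five hiragana characters plus ".csv", written with
        // Unicode escapes so the source compiles under any compiler encoding.
        String original = "\u3042\u3044\u3046\u3048\u304A.csv";
        // Assumption: the file system stores the name as UTF-8 bytes.
        byte[] onDisk = original.getBytes(StandardCharsets.UTF_8);

        System.out.println(decodeAsUtf8(onDisk).equals(original));  // true
        System.out.println(decodeAsAscii(onDisk).equals(original)); // false
    }
}
```

Only the non-ASCII part of the name is damaged; the ".csv" suffix survives either way, which matches the symptom described in this article.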

You can list the locales that may be specified in the LANG environment variable with the locale -a command. Running it inside a container started from the CentOS image gives the following.

# locale -a
C
POSIX
en_US.utf8
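The same check could also be made from Java at startup. The following is a minimal sketch, assuming the locale command is on the PATH and that names differ only in case and in the hyphen of "UTF-8" versus "utf8". The class LocaleCheck and its methods are hypothetical helpers, not part of any library.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class LocaleCheck {
    // Map "ja_JP.UTF-8" and "ja_JP.utf8" to the same string for comparison.
    static String normalize(String name) {
        return name.trim().toLowerCase(Locale.ROOT).replace("-", "");
    }

    // True if the wanted locale appears in the lines printed by "locale -a".
    static boolean isInstalled(List<String> localeAOutput, String wanted) {
        String target = normalize(wanted);
        for (String line : localeAOutput) {
            if (normalize(line).equals(target)) {
                return true;
            }
        }
        return false;
    }

    // Run "locale -a" and collect its output lines.
    static List<String> runLocaleA() throws IOException, InterruptedException {
        Process p = new ProcessBuilder("locale", "-a").redirectErrorStream(true).start();
        List<String> lines = new ArrayList<>();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream(), StandardCharsets.ISO_8859_1))) {
            String line;
            while ((line = r.readLine()) != null) {
                lines.add(line);
            }
        }
        p.waitFor();
        return lines;
    }

    public static void main(String[] args) throws Exception {
        String lang = System.getenv("LANG");
        if (lang == null || lang.isEmpty()) {
            System.out.println("LANG is not set");
            return;
        }
        boolean ok = isInstalled(runLocaleA(), lang);
        System.out.println(lang + (ok ? " is installed" : " is NOT installed"));
    }
}
```

With the output shown above (C, POSIX, en_US.utf8), isInstalled returns false for ja_JP.UTF-8 and true for en_US.UTF-8.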

As the locale -a output shows, a container based on the Docker CentOS image does not include the Japanese locale. If you set the LANG environment variable as follows in this container and then list files from the Java program, the Japanese file name is garbled.

LANG=ja_JP.UTF-8
export LANG

java Sample
=> (garbled Japanese file name).csv
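To see which encodings the JVM actually picked up from the environment, a small diagnostic like the one below can help. It is a sketch under one assumption: sun.jnu.encoding is an internal property of OpenJDK-based JVMs (the encoding used for file names and other platform strings) and may be absent on other runtimes. The class name EncodingInfo is hypothetical.

```java
import java.nio.charset.Charset;

class EncodingInfo {
    // Collect the environment variable and the JVM properties related to encodings.
    static String describe() {
        return "LANG=" + System.getenv("LANG")
            // Internal, implementation-specific property; may be null on some JVMs.
            + "\nsun.jnu.encoding=" + System.getProperty("sun.jnu.encoding")
            + "\nfile.encoding=" + System.getProperty("file.encoding")
            + "\ndefaultCharset=" + Charset.defaultCharset();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Running this once with each LANG value inside the container shows whether the JVM ended up with a UTF-8 encoding or with an ASCII fallback.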

To fix this, add the Japanese locale with the localedef command. Either add the following command as a RUN instruction in the Dockerfile or run it inside the container.

# localedef -f UTF-8 -i ja_JP ja_JP.UTF-8

Check the available locales again with the locale -a command.

# locale -a
C
POSIX
en_US.utf8
ja_JP.utf8

The localedef command has added ja_JP.utf8. With the LANG environment variable set to ja_JP.UTF-8, Japanese file names are now handled without garbling.
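To confirm the fix from the Java side, one option is a round-trip check: create a file with a Japanese name, list the directory, and compare the names. This is a minimal sketch, not a definitive test. The class FileNameRoundTrip and the file name are hypothetical, and how a JVM behaves when the name cannot be encoded depends on the environment.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

class FileNameRoundTrip {
    // Create a file called name in dir, list dir, and report whether the
    // listed name is identical to the requested one. The file is removed afterwards.
    static boolean roundTrips(File dir, String name) throws IOException {
        File f = new File(dir, name);
        boolean created = f.createNewFile();
        try {
            String[] listed = dir.list();
            if (listed == null) {
                return false;
            }
            for (String n : listed) {
                if (n.equals(name)) {
                    return true;
                }
            }
            return false;
        } finally {
            if (created) {
                f.delete();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        File dir = Files.createTempDirectory("roundtrip").toFile();
        try {
            // Hypothetical name: five hiragana characters plus ".csv".
            String name = "\u3042\u3044\u3046\u3048\u304A.csv";
            System.out.println(roundTrips(dir, name) ? "OK" : "garbled");
        } finally {
            dir.delete();
        }
    }
}
```

Run it with LANG=ja_JP.UTF-8 before and after the localedef command to compare the results.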

Conclusion
