Introduction

It is fairly common to receive files created with Excel or Word on a Windows machine and then need to process them on Linux. This article summarizes the basic data conversion steps needed in that situation.

Bringing files with Japanese names from Windows to Linux

Here, a "file with a Japanese name" means a file whose name contains so-called double-byte characters.

Extract a zip file created on Windows in a Linux environment

unzip -O cp932 "Archive containing Japanese name files.zip"

cp932 is Microsoft's extension of the Shift JIS character encoding.

reference: Actually not scary CP932
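Extracting into a dedicated directory keeps the archive's files apart from everything else, which makes the later renaming and conversion steps easier to run with a plain `*`. The following is only a sketch: the function name `extract_cp932_zip` and the default directory name `extracted` are my own inventions, and it assumes the installed unzip accepts the -O option used above.

```shell
# Hypothetical helper: extract a Windows-made zip into its own directory.
# Assumes the installed unzip accepts -O, as in the command above.
extract_cp932_zip() {
  archive=$1
  dest=${2:-extracted}   # default directory name is an arbitrary choice
  if [ ! -f "$archive" ]; then
    echo "archive not found: $archive" >&2
    return 1
  fi
  mkdir -p "$dest" && unzip -O cp932 "$archive" -d "$dest"
}

# Usage (the archive name is a placeholder):
# extract_cp932_zip "Archive containing Japanese name files.zip" work
```

Checking for the archive first means that a mistyped name does not leave an empty directory behind.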

Convert file names written in Shift JIS to UTF-8

convmv -f cp932 -t utf-8 * --notest

On Ubuntu, the convmv command is not installed by default, so you need to run `apt install convmv` beforehand.
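Before renaming for real, it can help to look at what convmv intends to do. Without --notest, convmv runs in test mode and only prints the planned renames. The sketch below wraps both steps; the function names are hypothetical, and note that -r makes convmv descend into subdirectories, unlike the `*` form above.

```shell
# Hypothetical helper: show what convmv would rename under a directory.
# Without --notest, convmv runs in test mode and changes nothing.
preview_cp932_rename() {
  convmv -f cp932 -t utf-8 -r "${1:-.}"
}

# Hypothetical helper: perform the renames once the preview looks right.
apply_cp932_rename() {
  convmv -f cp932 -t utf-8 -r --notest "${1:-.}"
}

# Usage:
# preview_cp932_rename work
# apply_cp932_rename work
```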

If a zip containing files with Japanese names is extracted on Linux using the "standard" procedure, the file names appear to end up as UTF-8 with garbled characters. A later attempt to run convmv on them is then refused with a message to the effect that the names have already been processed. In that case there is no choice but to give up and re-extract from the original zip file using the procedure above.
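One possible way to tell the two cases apart is to check whether a name is already valid UTF-8. This is a sketch resting on an assumption of mine: that convmv refuses because the garbled names are already valid UTF-8. Under that assumption, names that fail the check are still raw cp932 and can be fixed with convmv, while names that pass but look garbled need re-extraction. The function names are hypothetical.

```shell
# Hypothetical helper: succeed if the given name is valid UTF-8.
is_utf8_name() {
  printf '%s' "$1" | iconv -f utf-8 -t utf-8 >/dev/null 2>&1
}

# Hypothetical helper: list entries in the current directory whose names
# are not valid UTF-8, i.e. candidates for convmv.
list_non_utf8_names() {
  for name in *; do
    is_utf8_name "$name" || printf '%s\n' "$name"
  done
}
```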

Convert character encoding and line endings (Shift JIS → UTF-8 and CRLF → LF)

Convert Shift JIS text to UTF-8, convert the line endings from CRLF, the Windows standard, to LF, and write the result to a new file.

iconv -f cp932 -t utf-8 "Target file name" | sed 's/\r//g' > "Output destination file name"
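The one-liner above must not write to the file it reads from, because the shell truncates the output file before iconv gets to read it. A minimal sketch of a wrapper that guards against this follows; the name `sjis_to_utf8_lf` is hypothetical, and the guard only catches identical path strings, not two different paths to the same file.

```shell
# Hypothetical helper: convert one cp932/CRLF text file to UTF-8/LF.
sjis_to_utf8_lf() {
  src=$1
  dst=$2
  # Writing to the input file would truncate it before iconv reads it
  if [ "$src" = "$dst" ]; then
    echo "input and output must differ: $src" >&2
    return 1
  fi
  # Same pipeline as above: decode cp932, then delete every CR
  iconv -f cp932 -t utf-8 "$src" | sed 's/\r//g' > "$dst"
}

# Usage (file names are placeholders):
# sjis_to_utf8_lf report_sjis.csv report_utf8.csv
```

The function's exit status is that of sed, so a failure inside iconv is only visible through its error message.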

I also looked at a way to process every file in the current directory at once. The converted contents are written to a file of the same name in the `utf8` subdirectory, using a bash loop.

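#!/bin/bash
# Create the output directory if it does not exist yet
mkdir -p utf8
for a in *; do
  # Skip directories (including utf8 itself) and other non-regular files
  [ -f "$a" ] || continue
  iconv -f cp932 -t utf-8 "$a" | sed 's/\r//g' > "utf8/$a"
done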
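After a batch run it may be worth confirming that the files in `utf8` really are UTF-8 with LF line endings, since the loop above carries on even when iconv fails on a file. The following is a rough sketch; `check_utf8_lf` is a hypothetical name, and it only checks byte-level validity and the absence of CR characters, not whether the text is meaningful.

```shell
# Hypothetical helper: succeed if the file is valid UTF-8 and has no CR.
check_utf8_lf() {
  cr=$(printf '\r')
  # iconv exits non-zero when the input is not valid UTF-8
  iconv -f utf-8 -t utf-8 "$1" >/dev/null 2>&1 || return 1
  # Any remaining carriage return means the line endings were not converted
  if grep -q "$cr" "$1"; then
    return 1
  fi
  return 0
}

# Usage:
# for f in utf8/*; do
#   check_utf8_lf "$f" || echo "needs attention: $f"
# done
```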

Windows → Linux: tips for bringing in data
