CSV import with BOM

Hello, this is Misuda, one of the engineers. This time, I'll describe how I tried writing and reading UTF-8 files with a BOM for CSV import!

Background

When developing business systems, we often get the request "I want to create data in bulk!" When that happens, the first thing I consider is CSV import. Building an API would also impose development costs on the other party.

**The question then is how the CSV will be edited.** Do you use *Microsoft Excel*? After all, it makes editing easy!

Problem

If *Microsoft Excel* is going to be used, one option is to produce the CSV in Shift-JIS so that it can be edited there. But if the DB is UTF-8, the server has to convert the character encoding. At that point it becomes a battle with character encodings, and to be honest, I don't feel like I can win.
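For comparison, the server-side conversion mentioned above could look roughly like the following. This is only a minimal sketch, assuming the uploaded bytes really are Shift-JIS. The class `SjisToUtf8Sketch` and its methods are hypothetical names, and the choice of `windows-31j` (Microsoft's Shift-JIS variant) is my assumption about which variant is in play. Bytes that cannot be decoded are silently replaced here, which is part of why this route gets painful.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class SjisToUtf8Sketch {

    // Assumption: the upload uses Microsoft's Shift-JIS variant (windows-31j / MS932).
    private static final Charset SJIS = Charset.forName("windows-31j");

    /** Decodes Shift-JIS bytes into a Java String (hypothetical helper). */
    static String decodeShiftJis(byte[] sjisBytes) {
        // Undecodable bytes become the replacement character instead of failing
        return new String(sjisBytes, SJIS);
    }

    /** Re-encodes Shift-JIS bytes as UTF-8 bytes (hypothetical helper). */
    static byte[] toUtf8(byte[] sjisBytes) {
        return decodeShiftJis(sjisBytes).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "\u308a\u3093\u3054" is the Japanese word for apple, written with escapes
        // so the source file itself does not depend on any encoding
        byte[] sjis = "\u308a\u3093\u3054,Apple".getBytes(SJIS);
        byte[] utf8 = toUtf8(sjis);
        System.out.println(sjis.length + " Shift-JIS bytes -> " + utf8.length + " UTF-8 bytes");
    }
}
```

The conversion itself is short. The trouble is knowing for certain which encoding each uploaded file actually uses.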

In that situation, it turns out that UTF-8 with a BOM (byte order mark) opens in *Microsoft Excel* without garbled characters!

Generate UTF-8 file with BOM

This time, I'll generate the file with Java. For UTF-8, the file begins with the bytes [0xEF 0xBB 0xBF].

import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

public class Main {

    /**
     * Creates a CSV file encoded as UTF-8 with a BOM.
     *
     * @param args not used
     */
    public static void main(String[] args) {
        File file = new File("File path");
        List<String> header = Arrays.asList("Apple", "Mandarin orange", "banana", "Strawberry", "melon", "Grape");
        try (FileOutputStream fos = new FileOutputStream(file);
             OutputStreamWriter osw = new OutputStreamWriter(fos, StandardCharsets.UTF_8);
             PrintWriter writer = new PrintWriter(osw)) {
            // Write the BOM directly to the stream, before the writer flushes any text
            fos.write(0xef);
            fos.write(0xbb);
            fos.write(0xbf);

            // Join the columns with commas (no trailing comma)
            writer.print(String.join(",", header));
        } catch (IOException e) {
            System.out.println("Failed to generate the file.");
        }
    }

}
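As an alternative to writing the three bytes by hand, the BOM can also be produced by encoding the character U+FEFF as UTF-8, which yields EF BB BF. The following is a minimal sketch rather than a definitive implementation. The class `BomCsvWriterSketch`, the method `encodeLineWithBom`, and the temporary file name are hypothetical, and fields are written without quoting or escaping.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

final class BomCsvWriterSketch {

    /** Encodes one CSV line as UTF-8 bytes preceded by a BOM (hypothetical helper; no field quoting). */
    static byte[] encodeLineWithBom(List<String> columns) {
        // U+FEFF encoded as UTF-8 becomes the bytes EF BB BF
        String line = "\uFEFF" + String.join(",", columns) + "\r\n";
        return line.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Assumption: a temporary file stands in for the real output path
        Path path = Files.createTempFile("bom-sample", ".csv");
        Files.write(path, encodeLineWithBom(Arrays.asList("Apple", "Mandarin orange", "banana")));
        System.out.println("Wrote " + path);
    }
}
```

Keeping the BOM in the same string as the text means it cannot end up after the data by accident, which can happen if raw stream writes and buffered writer output are mixed in the wrong order.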

File import

That's fine if the incoming file is guaranteed to be UTF-8 with a BOM, but sometimes it isn't. So check for the BOM when reading.

import java.io.*;
import java.nio.charset.StandardCharsets;
import org.apache.commons.codec.binary.Hex;


public class Main {

    /**
     * Reads a CSV file encoded as UTF-8, with or without a BOM.
     *
     * @param args not used
     */
    public static void main(String[] args) {
        File file = new File("File path");

        try (FileInputStream fs = new FileInputStream(file);
             InputStreamReader isr = new InputStreamReader(fs, StandardCharsets.UTF_8);
             LineNumberReader lnr = new LineNumberReader(isr)) {
            // Read the first line
            String row = lnr.readLine();
            if (row != null && !row.isEmpty()) {
                // Get the first character
                String bom = row.substring(0, 1);
                // Encode the first character as UTF-8 bytes and render them as a hex string
                // (uses the Hex class from Apache Commons Codec).
                // The charset is given explicitly so the check does not depend on the platform default.
                String bomByte = new String(Hex.encodeHex(bom.getBytes(StandardCharsets.UTF_8)));
                if ("efbbbf".equals(bomByte)) {
                    // Strip the BOM
                    row = row.substring(1);
                }
                System.out.println(row);
            }
            // Read and split the data from the second line onward
        } catch (IOException e) {
            System.out.println("Failed to read the file.");
        }
    }
}
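The same check can be sketched without Apache Commons Codec. Java's UTF-8 decoder leaves a leading BOM in the text as the character U+FEFF, so comparing the first character is enough. This is a minimal sketch under that assumption. The class `BomReaderSketch` and the helper `stripBom` are hypothetical names, and the in-memory sample data stands in for a real file.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

final class BomReaderSketch {

    private static final char BOM = '\uFEFF';

    /** Returns the line without a leading BOM; lines without a BOM are returned unchanged. */
    static String stripBom(String firstLine) {
        if (firstLine != null && !firstLine.isEmpty() && firstLine.charAt(0) == BOM) {
            return firstLine.substring(1);
        }
        return firstLine;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: sample bytes held in memory instead of a file on disk
        byte[] data = "\uFEFFApple,banana\r\nred,yellow\r\n".getBytes(StandardCharsets.UTF_8);
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(data), StandardCharsets.UTF_8))) {
            // Only the first line can carry the BOM
            String row = stripBom(reader.readLine());
            while (row != null) {
                System.out.println(String.join(" | ", row.split(",")));
                row = reader.readLine();
            }
        }
    }
}
```

Because `stripBom` passes through input that has no BOM, the same reading path handles files saved from Excel and files saved from a plain text editor.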

Summary

I opened the file in *Microsoft Excel* on both macOS and Windows, and it was not garbled and could be edited! Beyond that, I expect some people will edit the file in a text editor. I suppose there is no choice but to support that as well.
