How to delete BOM (UTF-8)

While implementing file-handling logic at work, I learned how to remove the BOM from a UTF-8 (with BOM) file, so I am summarizing it here for future reference.

What is BOM in the first place?

First of all, what exactly is a BOM?

Roughly speaking, a BOM (byte order mark) is **a mark placed at the beginning of a file written in a Unicode encoding**. In UTF-8 it is represented by the 3 bytes **0xEF 0xBB 0xBF**. Editors such as Notepad usually do not display the BOM, but it really is there at the very beginning of the file contents. When a program reads the file, the BOM tells it how the bytes that follow should be interpreted. As a marker, it has two main roles.

  1. To show that the file is written in a Unicode encoding.
  2. To specify the byte order, called endianness, in UTF-16 and UTF-32. There are two types, depending on the order in which the bytes are arranged:
     ・ Big endian (arranged in order from the highest byte)
     ・ Little endian (arranged in order from the lowest byte)
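As a small illustration of both roles, the sketch below encodes the BOM code point U+FEFF in three encodings and prints the resulting bytes. The class name `BomBytes` and the helper `bomHex` are made up for this example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class BomBytes {
    // U+FEFF is the code point that serves as the BOM.
    static final String BOM = "\uFEFF";

    /** Encodes U+FEFF in the given charset and returns the bytes as hex, e.g. "EF BB BF". */
    static String bomHex(Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : BOM.getBytes(charset)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // Mask to 0..255 so negative byte values print as two hex digits.
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("UTF-8    : " + bomHex(StandardCharsets.UTF_8));    // EF BB BF
        System.out.println("UTF-16BE : " + bomHex(StandardCharsets.UTF_16BE)); // FE FF
        System.out.println("UTF-16LE : " + bomHex(StandardCharsets.UTF_16LE)); // FF FE
    }
}
```

The two UTF-16 results are the same two bytes in opposite order, which is how a reader can tell big endian from little endian.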

Why does UTF-8 (with BOM) exist?

Encodings such as UTF-16 and UTF-32 represent characters in units of 2 or 4 bytes, so the BOM is used to specify the byte order. UTF-8, however, works in 1-byte units, so there is no byte order to specify. So why does UTF-8 (with BOM) exist?

After investigating, I found that the cause is how Excel opens CSV files. In a Japanese environment, Excel tries to open a CSV as Shift-JIS, so a file written in UTF-8 without a BOM comes out garbled. To prevent this, a BOM is added to the CSV to tell Excel to read it as Unicode.
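For the writing side, a minimal sketch of producing such a CSV might look like the following. The class `CsvWithBom`, the helper `toUtf8WithBom`, and the sample data are assumptions for illustration only. The idea is simply to put U+FEFF in front of the text before encoding it as UTF-8.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class CsvWithBom {
    /** Prepends U+FEFF to the text and encodes it as UTF-8, so the output starts with EF BB BF. */
    static byte[] toUtf8WithBom(String text) {
        return ("\uFEFF" + text).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Sample CSV; \u5C71\u7530 is the Japanese surname "Yamada" written as escapes.
        String csv = "id,name\n1,\u5C71\u7530\n";

        Path out = Files.createTempFile("sample", ".csv");
        Files.write(out, toUtf8WithBom(csv));
        System.out.println("Wrote CSV with BOM to " + out);
    }
}
```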

How to remove the BOM

Now for the main subject: how to remove the BOM. Java does not assume that UTF-8 text has a BOM in the first place. Therefore, when it reads a file with a BOM, it treats the BOM as just another character and does not remove it. If you want the BOM removed, you need to implement that step yourself.
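This behavior is easy to confirm. The sketch below (the class name `BomSurvivesDecoding` is made up) decodes bytes that start with a BOM and shows that the resulting string still carries U+FEFF as its first character.

```java
import java.nio.charset.StandardCharsets;

class BomSurvivesDecoding {
    /** Decodes bytes as UTF-8 with the standard decoder, which keeps a leading BOM. */
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Bytes of a file containing "abc" saved as UTF-8 with a BOM.
        byte[] fileBytes = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'a', 'b', 'c'};

        String s = decodeUtf8(fileBytes);
        System.out.println(s.length());              // 4, not 3
        System.out.println(s.charAt(0) == '\uFEFF'); // true
        System.out.println(s.equals("abc"));         // false
    }
}
```

The last line is the typical symptom: a comparison against the first value in a file fails even though the text looks identical on screen.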

Java


    // The BOM written as a Unicode escape (U+FEFF)
    public static final String BOM = "\uFEFF";

    /**
     * Removes the BOM if the file contents start with one.
     *
     * @param s the file contents as a string
     * @return the string without the leading BOM
     */
    private static String removeUTF8BOM(String s) {
        if (s.startsWith(BOM)) {
            // Keep everything after the first character (the BOM)
            s = s.substring(1);
        }
        return s;
    }
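To show the method in use, here is a self-contained sketch that wraps it in a class. The class name `RemoveBomDemo` and the sample strings are made up, and the method is package-private here only so it is easy to call.

```java
class RemoveBomDemo {
    // The BOM written as a Unicode escape (U+FEFF)
    static final String BOM = "\uFEFF";

    /** Returns s without a leading BOM; strings without a BOM are returned unchanged. */
    static String removeUTF8BOM(String s) {
        if (s.startsWith(BOM)) {
            // Keep everything after the first character (the BOM)
            s = s.substring(1);
        }
        return s;
    }

    public static void main(String[] args) {
        String withBom = BOM + "id,name";
        String cleaned = removeUTF8BOM(withBom);

        System.out.println(withBom.length());          // 8
        System.out.println(cleaned.length());          // 7
        System.out.println(cleaned.equals("id,name")); // true
    }
}
```

Only a BOM at the very start of the string is removed. A string without one passes through untouched, so it is safe to call the method on every file.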

Another method is to use a class library provided by Apache (Apache Commons IO includes `BOMInputStream` for this purpose). See the following for the detailed specification.

Class for reading files with BOM
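If adding a dependency is not an option, the same idea of skipping the BOM at the stream level can be sketched with the standard library alone. This is not the Apache class. `BomSkippingStream`, `skipUtf8Bom`, and `readUtf8WithoutBom` are hypothetical names for this example, and the sketch handles only the UTF-8 BOM.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class BomSkippingStream {

    /** Hypothetical helper: returns a stream positioned just after a leading UTF-8 BOM, if any. */
    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pin = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = 0;
        // Read up to 3 bytes; a single read may return fewer than requested.
        while (n < 3) {
            int r = pin.read(head, n, 3 - n);
            if (r < 0) {
                break;
            }
            n += r;
        }
        boolean hasBom = n == 3
                && head[0] == (byte) 0xEF
                && head[1] == (byte) 0xBB
                && head[2] == (byte) 0xBF;
        if (!hasBom && n > 0) {
            // Not a BOM: push the bytes back so the caller still sees them.
            pin.unread(head, 0, n);
        }
        return pin;
    }

    /** Decodes the data as UTF-8 after skipping a leading BOM. */
    static String readUtf8WithoutBom(byte[] data) {
        try (InputStream in = skipUtf8Bom(new ByteArrayInputStream(data))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int r;
            while ((r = in.read(buf)) != -1) {
                out.write(buf, 0, r);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'a', 'b', 'c'};
        System.out.println(readUtf8WithoutBom(withBom)); // abc
    }
}
```

Working on the stream means the BOM never reaches the decoder, so the string-level check is not needed afterwards.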

