The other day, while thinking about character encodings, a question suddenly occurred to me:
"The byte sequence should differ depending on the character encoding, so how does the computer recognize the program?"
The question came from a gap in my own knowledge, so this post summarizes what I learned.
The conclusion: an application such as the compiler converts the source we write, whatever its character encoding, into a single canonical byte sequence, so **the character encoding of the source does not affect hardware such as the CPU.**
Thinking about it further, **if an application really behaved the way I had first assumed, that would mean I was misunderstanding what the compiler does.** It made me realize once again how much I still have to study.
The experiment:

1. Save the same Java source twice, in two different locations, each time with a different character encoding. (The same source can be saved under different encodings.)
2. Compare the two files as binary data; **because they were saved with different encodings, a difference should be observed.**
3. Compile each file into a class file.
4. Compare the two class files; **no difference should be observed.**
Prepare the following source, based on the classic HelloWorld.java.
HelloWorld.java
```java
public class HelloWorld {
    public static void main(String[] args) {
        // Prints "Hello, World" to the terminal window.
        System.out.println("Hello, World");
    }
}
```
After saving, check the character encoding of each file. You can see that the source text is identical, but the encodings differ.
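One way to check the encoding of a saved file is the `file` command (its output wording varies by platform). A small self-contained sketch; the `Utf8Sample.java` / `Utf16Sample.java` names are illustrative, not from the experiment above:

```shell
# Create the same one-line source under two encodings, then inspect both.
printf 'public class HelloWorld {}\n' > Utf8Sample.java
iconv -f UTF-8 -t UTF-16 Utf8Sample.java > Utf16Sample.java
file Utf8Sample.java Utf16Sample.java
```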
Compile the UTF-16 version:

```shell
javac -encoding UTF-16 HelloWorld.java
```
Compile the UTF-8 version:

```shell
javac HelloWorld.java
```
It turns out that the compiler absorbs the difference in the source's character encoding and converts both copies into one and the same byte sequence.
The experiment above was done in Java; after this stage, the JVM translates the class file into machine language, which the CPU executes as a program. So this experiment shows that **differences in the character encoding of the input we write are erased at the point the source is compiled into a class file.**
Reference: "Now ... I didn't know Java was 'compiled' twice! (> <)"
Originally I was wondering, "the byte sequence should differ depending on the character encoding, so how does the hardware recognize it?", which is why I investigated the above. But that question itself probably arose because I did not properly understand the role of the compiler. **Once again I realized how important it is to study the fundamentals of the applications we use.**