The other day, while thinking about character encodings, a question suddenly occurred to me:
"The byte sequence should differ depending on the character encoding, so how does the computer recognize the program?"
The question came from a gap in my own knowledge, so this post summarizes what I learned.
The conclusion: an application such as the compiler converts the source we write, whatever its character encoding, into a single canonical byte sequence, so **the character encoding of the source does not affect hardware such as the CPU.**
Thinking about it further, **if an application really behaved the way I had first assumed, that would mean I was misunderstanding what the compiler does.** It made me realize once again how much I still have to study.
The experiment:

1. Save the same Java source twice, in two different locations, each time with a different character encoding. (The same source can be saved under different encodings.)
2. Compare the two files as binary data; **because they were saved with different encodings, a difference should be observed.**
3. Compile each file into a class file.
4. Compare the two class files; **no difference should be observed.**
Prepare the following source, based on the classic HelloWorld.java.
HelloWorld.java
```java
public class HelloWorld {
    public static void main(String[] args) {
        // Prints "Hello, World" to the terminal window.
        System.out.println("Hello, World");
    }
}
```
After saving, check the character encoding of each file. You can see that the source text is identical, but the encodings differ.
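One way to check the encoding of a saved file is the `file` command (its output wording varies by platform). A small self-contained sketch; the `Utf8Sample.java` / `Utf16Sample.java` names are illustrative, not from the experiment above:

```shell
# Create the same one-line source under two encodings, then inspect both.
printf 'public class HelloWorld {}\n' > Utf8Sample.java
iconv -f UTF-8 -t UTF-16 Utf8Sample.java > Utf16Sample.java
file Utf8Sample.java Utf16Sample.java
```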
Compile the UTF-16 version:

```shell
javac -encoding UTF-16 HelloWorld.java
```
Compile the UTF-8 version:

```shell
javac HelloWorld.java
```
It turns out that the compiler absorbs the difference in the source's character encoding and converts both copies into one and the same byte sequence.
The experiment above was done in Java; after this stage, the JVM translates the class file into machine language, which the CPU executes as a program. So this experiment shows that **differences in the character encoding of the input we write are erased at the point the source is compiled into a class file.**
Reference: "Now ... I didn't know Java was 'compiled' twice! (> <)"
Originally I was wondering, "the byte sequence should differ depending on the character encoding, so how does the hardware recognize it?", which is why I investigated the above. But that question itself probably arose because I did not properly understand the role of the compiler. **Once again I realized how important it is to study the fundamentals of the applications we use.**