I wanted to write this up as a memo for myself, so I'm posting it.
… However, I got pulled into other work partway through, so for now this is an unfinished draft. I may add code and so on when I have the time.
This time, I'll summarize how character encodings work in Python 3.
Recently I was pulling in data from other sites and formatting it, and I kept hitting errors or garbled characters when fetching a response or writing a file.
I couldn't figure it out even after reading various sites, so I asked someone who knows the topic well. That cleared things up for me, so I'm writing it down here.
First, some people may wonder what a character code is in the first place, so I'll explain it briefly.
To put it simply, I think it works like this:
each character is assigned a serial number, and that number determines which character it is converted to.
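As a small illustration of the "one number per character" idea, the sketch below uses Python's built-in `ord` and `chr`. The variable name `code_point` is only for this example.

```python
# Each character corresponds to a number (its Unicode code point), and back.
code_point = ord("あ")      # character -> number
print(code_point)           # 12354
print(hex(code_point))      # 0x3042

character = chr(0x3042)     # number -> character
print(character)            # あ
```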
So which character encodings does Python 3 use?
Illustrated as a whole, it looks like this.
I'll go through the parts one by one.
The strings stored in variables in Python 3 are **Unicode**.
PythonConsole
>>> 'あ' == u'\u3042'
True
Python 2 apparently had several string types (a byte-oriented str and a separate unicode type), but in Python 3 text strings appear to be unified into this single Unicode str type.
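To make the distinction concrete, here is a minimal sketch of how a Python 3 `str` (Unicode) is turned into `bytes` and back. It assumes `cp932`, Python's codec name for the Windows variant of Shift_JIS; the variable names are illustrative.

```python
text = "あ"                             # str: Unicode text

utf8_bytes = text.encode("utf-8")       # same character, UTF-8 bytes
sjis_bytes = text.encode("cp932")       # same character, Shift_JIS (cp932) bytes

print(utf8_bytes)                       # b'\xe3\x81\x82'
print(sjis_bytes)                       # b'\x82\xa0'

# Decoding with the matching encoding gives back the same Unicode string.
print(utf8_bytes.decode("utf-8") == sjis_bytes.decode("cp932"))  # True
```

The point is that the encoding only matters at the boundary where text becomes bytes; inside the program the string is just Unicode.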
Python 3 treats files containing Python code (.py) as **UTF-8** by default.
Create the following file as a test:
test_python_file.py
test = "こんにちは"
When I display it in the Windows console (Shift_JIS):
console(Windows)
>type test_python_file.py
test = "縺薙ｓ縺ｫ縺｡縺ｯ"
The .py file is written in **UTF-8**, so of course it comes out garbled when the console interprets the bytes as Shift_JIS.
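The garbling can be reproduced inside Python itself. This is only a sketch, assuming the file contains Japanese text such as こんにちは and that the console decodes bytes as `cp932` (Python's name for Windows Shift_JIS).

```python
text = "こんにちは"

# What is stored on disk: the UTF-8 bytes of the text.
utf8_bytes = text.encode("utf-8")

# What a Shift_JIS console does: interpret those same bytes as cp932.
garbled = utf8_bytes.decode("cp932")
print(garbled)

# The bytes themselves are intact, so reversing the wrong step recovers the text.
restored = garbled.encode("cp932").decode("utf-8")
print(restored == text)  # True
```

Nothing is actually lost here: the garbling comes from reading correct bytes with the wrong encoding.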
This part is a bit more complicated.
The default encoding used when reading and writing files seems to depend on the environment (OS) you are using.
Windows (Japanese locale) -> Shift_JIS
Mac, Linux -> UTF-8
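One way to check what your own environment uses, and to avoid depending on it, is sketched below. The file name `sample.txt` and the temporary directory are assumptions for the example; `locale.getpreferredencoding` and the `encoding` parameter of `open` are standard library features.

```python
import locale
import os
import tempfile

# The encoding open() falls back to when none is given (environment dependent).
default_encoding = locale.getpreferredencoding(False)
print(default_encoding)

# Passing encoding= explicitly makes the result the same on every OS.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("こんにちは")

with open(path, encoding="utf-8") as f:
    content = f.read()

print(content == "こんにちは")  # True
```

Specifying the encoding on both the write and the read side is what keeps the file portable between Windows and Mac/Linux.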
Overall, the picture is as follows.
Target | Character encoding |
---|---|
Python 3 strings | Unicode |
.py source files | UTF-8 |
Output files, etc. | Windows: Shift_JIS; Mac and Linux: UTF-8 (environment dependent) |
Once you know this internal structure, character encoding errors become easier to deal with.
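For example, when bytes arrive from another site and their encoding is unknown, one possible approach is to try a few candidates in order. The helper `decode_with_fallback` below is hypothetical (it is not part of any library), and the candidate list is an assumption suited to Japanese data.

```python
def decode_with_fallback(data: bytes, encodings=("utf-8", "cp932")) -> str:
    """Try each encoding in order and return the first successful decode."""
    for encoding in encodings:
        try:
            return data.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Last resort: keep going, replacing undecodable bytes with U+FFFD.
    return data.decode(encodings[0], errors="replace")


utf8_data = "こんにちは".encode("utf-8")
sjis_data = "こんにちは".encode("cp932")

print(decode_with_fallback(utf8_data) == "こんにちは")  # True
print(decode_with_fallback(sjis_data) == "こんにちは")  # True
```

UTF-8 is tried first on purpose: UTF-8 bytes often decode "successfully" as Shift_JIS and produce garbled text, whereas Shift_JIS bytes usually fail to decode as UTF-8, so this order gives fewer silent mistakes.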
If I've gotten anything wrong, please let me know!