About character encodings in Python 3

I wrote this up as a memo for myself and decided to post it.

I started on other work partway through, so for now this is an unfinished draft. I may add more code examples later when I have time.

Introduction

This post summarizes how Python 3 handles character encodings.

Recently I pulled data from other sites and tried to format it, and I kept running into errors or garbled characters when fetching data with a request or writing it to a file.

Searching various sites did not clear things up, so I asked someone who knows the topic well. It finally made sense to me, so I am writing it down here.

What is a character encoding?

Some readers may wonder what a character encoding is in the first place, so here is a brief explanation.

Put simply:

Each character is assigned one number (ultimately stored in binary).

At least, that is my understanding.

A given character has one number, and that number determines which character the data is converted back into.
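As a minimal sketch of this idea, the snippet below uses Python's built-in `ord()` and `chr()` to convert between a character and its number. The sample character あ and the two encodings are chosen only for illustration. It also shows that the same number can be written out as different bytes depending on the encoding.

```python
# Each character corresponds to one number (its Unicode code point).
ch = "あ"
code_point = ord(ch)        # character -> number
print(code_point)           # 12354
print(hex(code_point))      # 0x3042
print(chr(code_point))      # number -> character

# An encoding decides how that number is written as bytes.
utf8_bytes = ch.encode("utf-8")
sjis_bytes = ch.encode("cp932")   # cp932 is the Windows flavour of Shift_JIS
print(utf8_bytes)           # b'\xe3\x81\x82'
print(sjis_bytes)           # b'\x82\xa0'
```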

Which character encodings does Python 3 use?

So which character encodings does Python 3 actually use?

The overall picture is illustrated below.

[Figure: picture_for_qittq_strcode.jpg]

I will explain each part in turn.

Strings in Python 3

Strings stored in variables in Python 3 are **Unicode**.

PythonConsole


>>> 'あ' == u'\u3042'
True

Python 2 had several string types (such as str and unicode), but in Python 3 the str type is always Unicode text.
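To make the distinction concrete, here is a small sketch of the two types Python 3 uses: `str` for Unicode text and `bytes` for encoded data. The variable names are only for illustration.

```python
text = "あ"                    # str: Unicode text
data = text.encode("utf-8")    # bytes: the same text encoded as UTF-8

print(type(text).__name__)     # str
print(type(data).__name__)     # bytes
print(len(text), len(data))    # 1 3  (one character, three bytes)

# The u prefix is still accepted in Python 3 but makes no difference.
same = (u"\u3042" == "\u3042")
print(same)                    # True

# Decoding with the matching encoding gives the original text back.
restored = data.decode("utf-8")
print(restored == text)        # True
```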

Encoding of .py files

Files containing Python code are written in **UTF-8**. Python 3 assumes UTF-8 for source files unless an encoding declaration says otherwise.

As a test, create the following file:

test_python_file.py


test = "こんにちは"

Then print it in the Windows console (Shift_JIS):

console(Windows)


>type test_python_file.py
test = "縺薙ｓ縺ｫ縺｡縺ｯ"

The .py file is written in **UTF-8**, so of course it comes out garbled when the console interprets the bytes as Shift_JIS.
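The garbling can be reproduced inside Python itself. The sketch below assumes the file contains a Japanese string such as こんにちは (chosen here for illustration) and uses `cp932`, Python's name for the Windows variant of Shift_JIS, to imitate the console.

```python
original = 'test = "こんにちは"'

# What is stored on disk when the file is saved as UTF-8.
raw = original.encode("utf-8")

# What a Shift_JIS console shows when it interprets those same bytes.
garbled = raw.decode("cp932")
print(garbled)

# No bytes were lost here, so reversing the mistake recovers the text.
repaired = garbled.encode("cp932").decode("utf-8")
print(repaired)
```

The ASCII part (`test = "`) survives because both encodings represent ASCII characters with the same bytes. Only the Japanese characters are affected.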

Text files (.txt, etc.) created by Python programs

This is a bit more complicated.

When no encoding is specified, it depends on the environment (OS) you are using:

Windows (Japanese locale) -> Shift_JIS (cp932)
Mac, Linux -> UTF-8 (typically)
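A minimal sketch of how to check and sidestep this dependence: `locale.getpreferredencoding(False)` reports the encoding that `open()` falls back to when none is given, and passing `encoding=` explicitly makes the result the same on every OS. The file name `sample.txt` is hypothetical.

```python
import locale
import os
import tempfile

# The encoding open() uses when no encoding argument is given.
# The value depends on the OS and locale settings.
print(locale.getpreferredencoding(False))

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.txt")  # hypothetical file name

    # Specify the encoding explicitly so the output does not depend on the OS.
    with open(path, "w", encoding="utf-8") as f:
        f.write("こんにちは")

    # Read it back as text, using the same encoding.
    with open(path, "r", encoding="utf-8") as f:
        content = f.read()

    # Read the raw bytes to confirm what was actually written.
    with open(path, "rb") as f:
        raw = f.read()

print(content)
print(raw)
```

Writing and reading with the same explicit encoding is usually the simplest way to avoid garbled output files.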

Summary

Overall, it comes down to the following.

| Target | Character encoding |
|---|---|
| Python 3 strings | Unicode |
| .py source files | UTF-8 |
| Output files, etc. | Windows: Shift_JIS, Mac & Linux: UTF-8 (environment dependent) |

Once you understand this structure, encoding errors become much easier to deal with.
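As one example of dealing with such errors, the sketch below decodes bytes whose encoding is unknown, such as data fetched from another site, by trying candidate encodings in order. The helper name `decode_with_fallback` and the candidate list are assumptions for illustration, not part of any library.

```python
def decode_with_fallback(data, encodings=("utf-8", "cp932")):
    """Try each candidate encoding in order; return (text, encoding_used).

    Hypothetical helper for illustration only.
    """
    for enc in encodings:
        try:
            return data.decode(enc), enc
        except UnicodeDecodeError:
            continue
    # Nothing matched: decode with the first candidate and mark
    # undecodable bytes with U+FFFD instead of raising an error.
    return data.decode(encodings[0], errors="replace"), encodings[0]


sjis_bytes = "こんにちは".encode("cp932")
utf8_bytes = "こんにちは".encode("utf-8")

print(decode_with_fallback(sjis_bytes))
print(decode_with_fallback(utf8_bytes))
```

UTF-8 is tried first on purpose: it rejects most byte sequences that are not really UTF-8, whereas cp932 will happily decode many UTF-8 byte sequences into garbled text.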

If I have gotten anything wrong, please let me know!
