Python string

Python uses a mechanism called codecs to convert text containing multibyte characters to and from various encodings. Besides Japanese, the standard codecs also cover Korean and Chinese.
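As a rough illustration of the codecs mechanism, the sketch below looks up one Japanese, one Korean, and one Chinese codec from the standard library and encodes a single character with each. The sample characters and the choice of `shift_jis`, `euc_kr`, and `gb2312` are my own assumptions for the example, not something the article prescribes.

```python
import codecs

# Sample characters are illustrative; the codec names are standard Python aliases.
samples = {
    "shift_jis": "あ",  # Japanese
    "euc_kr": "한",     # Korean
    "gb2312": "中",     # Chinese
}

encoded = {}
for codec_name, ch in samples.items():
    codecs.lookup(codec_name)  # raises LookupError if the codec is unknown
    encoded[codec_name] = codecs.encode(ch, codec_name)
    print(codec_name, ch, encoded[codec_name].hex())
```

In everyday code, `str.encode()` and `bytes.decode()` do the same job; `codecs.lookup()` is mainly handy for checking that a codec name is valid.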

A multibyte character is a character that cannot be expressed in 1 byte and is instead represented as data of 2 bytes or more.

[a] can be expressed in 1 byte. [あ] cannot be expressed in 1 byte.
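The difference between counting characters and counting bytes can be checked directly. This is a minimal sketch that assumes UTF-8 as the encoding and uses "a" and "あ" as sample characters.

```python
text = "aあ"  # one single-byte character and one multibyte character

data = text.encode("utf-8")
char_count = len(text)   # number of characters in the str
byte_count = len(data)   # number of bytes after encoding

# Byte length of each character on its own
per_char = {ch: len(ch.encode("utf-8")) for ch in text}

print(char_count, byte_count, per_char)
```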

I looked into the representative character codes and summarized them below.

ASCII: A character code covering the Latin alphabet, digits, symbols, and control characters. It is used worldwide as the most basic character code, and many other character codes are designed as extensions of ASCII. Characters are represented by 7-bit values (0 to 127), so 128 characters are defined. "A" is 0x41 in ASCII (the 0x prefix denotes hexadecimal).

Since that is hard to picture, I have extracted the lowercase letters below.

Hexadecimal  Letter
0x61 a
0x62 b
0x63 c
0x64 d
0x65 e
0x66 f
0x67 g
0x68 h
0x69 i
0x6a j
0x6b k
0x6c l
0x6d m
0x6e n
0x6f o
0x70 p
0x71 q
0x72 r
0x73 s
0x74 t
0x75 u
0x76 v
0x77 w
0x78 x
0x79 y
0x7a z
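The table above does not have to be typed by hand. As a small sketch, it can be regenerated with the built-in `ord()` and `hex()` functions; the variable name `rows` is just for illustration.

```python
import string

# Pair each lowercase letter with its ASCII code in hexadecimal
rows = [(hex(ord(letter)), letter) for letter in string.ascii_lowercase]

for code, letter in rows:
    print(code, letter)
```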

Shift_JIS: A character code often used for Japanese text. It covers a wide range of characters, including the Japanese characters standardized by the Japanese Industrial Standards Committee. ASCII-range characters and half-width katakana take 1 byte, and other characters such as hiragana and kanji take 2 bytes. "あ" is 0x82A0 in Shift_JIS.

UTF-8: The most widely used character code today. Because it can handle characters from all over the world, it has become the standard. It is a variable-length encoding: the part shared with ASCII is represented by 1 byte, and all other characters by 2 to 4 bytes. UTF-8 is highly compatible with ASCII and is used by a great deal of software around the world. "あ" is 0xE38182 in UTF-8.

In Python 2.x, the default source encoding was ASCII. In Python 3.x, it is UTF-8, so you can handle Japanese without declaring the character code.
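The following sketch prints the bytes that Shift_JIS and UTF-8 actually produce, and shows how the UTF-8 byte length varies by character. The four sample characters are arbitrary picks of mine, chosen to hit each length.

```python
# Sample characters chosen to need 1, 2, 3, and 4 bytes in UTF-8
samples = ["A", "\u00e9", "あ", "\U0001F600"]
utf8_lengths = {ch: len(ch.encode("utf-8")) for ch in samples}

# The same hiragana character in both encodings
sjis_a = "あ".encode("shift_jis")
utf8_a = "あ".encode("utf-8")

# ASCII-range characters stay 1 byte in Shift_JIS
sjis_ascii = "A".encode("shift_jis")

print(utf8_lengths)
print(sjis_a.hex(), utf8_a.hex(), sjis_ascii)
```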

Unicode: A character set standard maintained by the Unicode Consortium and kept in sync with the international standard ISO/IEC 10646. It was created with the aim of providing a single character code that can be used in common across all countries. UTF-8 is one of the encodings of Unicode.
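In Python 3, a `str` is a sequence of Unicode code points, and the bytes only appear once an encoding is chosen. A minimal sketch, using "あ" as an assumed sample character:

```python
ch = "あ"

code_point = ord(ch)              # Unicode code point as an integer
label = f"U+{code_point:04X}"     # conventional U+XXXX notation

# One code point, different byte sequences depending on the encoding
encodings = {
    name: ch.encode(name).hex()
    for name in ("utf-8", "utf-16-be", "shift_jis")
}

print(label, encodings)
```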

To convert between a string and the bytes type

encode() converts a string (str) to bytes. Syntax: 'string'.encode('encoding name') * encoding name = "utf-8", etc.

decode() converts bytes back to a string (str). Syntax: b'byte string'.decode('encoding name')
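A short round trip with `encode()` and `decode()` might look like the sketch below. The sample text and the variable names are my own; it also shows what happens when the encoding does not match, which is a common source of errors.

```python
text = "Python 文字列"

data = text.encode("utf-8")       # str -> bytes
restored = data.decode("utf-8")   # bytes -> str

# Decoding with a different codec than the one used to encode usually fails
try:
    "あ".encode("shift_jis").decode("utf-8")
    wrong_codec_failed = False
except UnicodeDecodeError:
    wrong_codec_failed = True

# Characters outside the target encoding can be replaced instead of raising
fallback = "あ".encode("ascii", errors="replace")

print(data, restored, wrong_codec_failed, fallback)
```

Both methods default to "utf-8" when no encoding name is given, but passing it explicitly makes the intent clearer.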
