[Introduction to Python3 Day 14] Chapter 7 Strings (7.1.1.1 to 7.1.1.4)

7.1.1.2 UTF-8 encoding and decoding

UTF-8 is well suited to exchanging data with the outside world. Doing so requires the following two operations.

- A way to encode a string into a byte string
- A way to decode a byte string into a string
- UTF-8, a variable-length encoding, is the standard encoding for Python, Linux, and HTML.
- When you create a Python string by copying and pasting from another source such as a web page, make sure the source is encoded in UTF-8; otherwise an exception can occur later.

7.1.1.3 Encoding

- Encoding is the process in which the sender of information converts the message into a symbolic form that can reach the receiver.
- In Python, this means encoding a string into bytes.
- **UTF-8** is an **8-bit variable-length encoding**.
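To make "variable length" concrete, here is a minimal sketch that counts how many bytes UTF-8 needs for a few characters. The sample characters are my own choice for illustration and do not come from the book.

```python
# Characters from different Unicode ranges (illustrative choices):
# 'a', 'é', the snowman, and an emoji outside the Basic Multilingual Plane.
samples = ["a", "\u00e9", "\u2603", "\U0001f600"]

# Map each character to the number of bytes UTF-8 uses for it.
byte_counts = {ch: len(ch.encode("utf-8")) for ch in samples}

for ch, n in byte_counts.items():
    print(f"U+{ord(ch):04X} -> {n} byte(s): {ch.encode('utf-8')!r}")
```

Each character is still a single element of the str (len() is 1), but the encoded form takes between 1 and 4 bytes depending on the code point.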


# The first argument of the string encode() method is the encoding name.
# Assign the Unicode string "\u2603" to snowman.
>>> snowman="\u2603"
>>> len(snowman)
1

#Encode this Unicode character into a byte sequence.
>>> ds=snowman.encode("utf-8")
>>> len(ds)
3
>>> ds
b'\xe2\x98\x83'

- **The snowman above is written as \u2603 inside Python, but when it is converted to UTF-8, the standard encoding of the outside world, it becomes b'\xe2\x98\x83'.**


#An error will occur unless the Unicode character is also a valid ASCII character.
>>> ds=snowman.encode("ascii")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character '\u2603' in position 0: ordinal not in range(128)

# encode() takes a second argument that helps avoid encoding exceptions. Its default value is "strict", which behaves as shown above: with the ascii codec, any non-ASCII character raises UnicodeEncodeError.

# "ignore" discards characters that cannot be encoded.
>>> snowman.encode("ascii","ignore")
b''
# "replace" substitutes ? for characters that cannot be encoded.
>>> snowman.encode("ascii","replace")
b'?'
# "backslashreplace" produces a Python Unicode escape sequence, as unicode-escape does.
>>> snowman.encode("ascii","backslashreplace")
b'\\u2603'
# "xmlcharrefreplace" produces a character entity reference that can be used on web pages.
>>> snowman.encode("ascii","xmlcharrefreplace")
b'&#9731;'
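The snowman alone hides one detail: the error handlers only touch the characters that cannot be encoded and leave the rest alone. The sketch below runs the same four handlers over a mixed string. The sample text "café ☃" is a hypothetical value chosen for illustration.

```python
text = "caf\u00e9 \u2603"  # 'café ☃' (hypothetical sample)

handlers = ["ignore", "replace", "backslashreplace", "xmlcharrefreplace"]

# Encode the same text to ASCII once per error handler.
results = {name: text.encode("ascii", name) for name in handlers}

for name, encoded in results.items():
    print(f"{name:18} {encoded!r}")
```

The ASCII letters and the space pass through unchanged in every case; only é and the snowman are dropped, replaced, or escaped.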

7.1.1.4 Decoding

- Decoding is the process in which the receiver interprets the symbols created by the sender of the information.
- In Python, this means decoding a byte string into a Unicode string.
- **Text taken from an external source (files, websites, network APIs, etc.) must be converted to Unicode inside Python.** Such text arrives encoded as a byte string.
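As a rough sketch of the "external source" case, the following writes raw UTF-8 bytes to a temporary file and reads them back twice, once as bytes and once as decoded text. The file name place.txt is made up for this example.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "place.txt"      # hypothetical file name
    path.write_bytes(b"caf\xc3\xa9")    # raw UTF-8 bytes, as an external source would supply them

    raw = path.read_bytes()                  # bytes: nothing has been decoded yet
    text = path.read_text(encoding="utf-8")  # str: decoded while reading

print(type(raw).__name__, len(raw))
print(type(text).__name__, len(text))
```

Passing encoding="utf-8" explicitly avoids depending on the platform's default encoding.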

# Create a Unicode string with the value 'café'.
>>> place = "caf\u00e9"
>>> place
'café'
>>> type(place)
<class 'str'>
# Encode it as UTF-8 and assign the result to the place_bytes variable.
>>> place_bytes=place.encode("utf-8")
# Note that place_bytes is 5 bytes long.
# The first 3 characters are encoded in 1 byte each, identical to ASCII (an advantage of UTF-8), and the final é is encoded in 2 bytes.
>>> place_bytes
b'caf\xc3\xa9'
>>> type(place_bytes)
<class 'bytes'>

>>> place2=place_bytes.decode("utf-8")
>>> place2
'café'
# The ASCII decoder raises an error because the byte value 0xc3 is invalid in ASCII.
>>> place3=place_bytes.decode("ascii")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 3: ordinal not in range(128)

# Use UTF-8 whenever possible. Other codecs may decode the bytes without raising an error but produce the wrong characters.
>>> place4=place_bytes.decode("latin-1")
>>> place4
'cafÃ©'
>>> place5=place_bytes.decode("windows-1252")
>>> place5
'cafÃ©'
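Decoding UTF-8 bytes with a single-byte codec such as latin-1 does not raise an error, which makes the mistake easy to miss. The sketch below shows the garbled result, one way to undo it when the wrong codec is known, and the errors argument of decode(). This is an illustration of the idea, not a general-purpose repair routine.

```python
place_bytes = "caf\u00e9".encode("utf-8")  # b'caf\xc3\xa9'

# Decoding with the wrong codec succeeds silently but yields garbled text.
wrong = place_bytes.decode("latin-1")

# latin-1 maps every byte to exactly one code point, so the mistake can be
# reversed: re-encode with latin-1, then decode with the correct codec.
repaired = wrong.encode("latin-1").decode("utf-8")

# decode() also accepts an errors argument, just like encode().
# With "replace", each undecodable byte becomes U+FFFD.
lenient = place_bytes.decode("ascii", "replace")

print(wrong, repaired, lenient)
```

The round trip works for latin-1 because it defines all 256 byte values; with other codecs some bytes may not map back cleanly.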

coffee break "About numbers and letters"

- Computers perform calculations and decisions in binary.
- A **bit** is one digit of a binary number.
- **Hexadecimal** is a notation that expresses 4 binary digits as a single digit from 0 to F, to make values easier for humans to read.
- A **byte** is a unit of 2 hexadecimal digits (8 bits, or 0 to 255 in decimal).
- **What is binary data?** The characters shown on a display are produced when the OS of the computer converts character codes into character images.
- Image files are usually compressed in "GIF" or "JPG" format.
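The relationship between bits, hexadecimal digits, and bytes can be checked directly in Python. This is a small sketch using only built-in functions; the variable names are my own.

```python
value = 255  # the largest value that fits in one byte

binary_text = format(value, "08b")  # 8 binary digits: '11111111'
hex_text = format(value, "02X")     # 2 hexadecimal digits: 'FF'

# One hexadecimal digit corresponds to exactly four bits.
nibble = format(int("F", 16), "04b")  # '1111'

# Parsing the text in either base gives back the same integer.
assert int(binary_text, 2) == int(hex_text, 16) == value

# A bytes object is a sequence of such 0-255 values.
snowman_bytes = "\u2603".encode("utf-8")
print(list(snowman_bytes))   # [226, 152, 131]
print(snowman_bytes.hex())   # e29883
```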

Example: Contents of a JPG file (bytes shown in hexadecimal)

FFD8FFE0 00104A46 49460001 0101004B 004B0000 FFFE0094 56542D43 6F6D7072 65737320 28746D29 2058696E 67205465 63686E6F 6C6F6779 20436F72 702E0000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 0000FFDB 00840010 0B0C0E0C 0A100E0D 0E121110 131828
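A dump like this can be turned back into bytes and inspected. The sketch below copies two runs of hexadecimal text from the dump above, checks the FF D8 marker that JPEG files start with, and decodes the ASCII text embedded in the header. The variable names are my own.

```python
# The first 12 bytes of the dump, as hexadecimal text.
header_hex = "FFD8FFE0 00104A46 49460001"
header = bytes.fromhex(header_hex)  # spaces between byte pairs are ignored

is_jpeg = header[:2] == b"\xff\xd8"  # JPEG data begins with the bytes FF D8
identifier = header[6:10]            # the 4 bytes 4A 46 49 46

# The run that follows the FFFE0094 bytes in the dump is plain ASCII text.
comment_hex = ("56542D43 6F6D7072 65737320 28746D29 2058696E "
               "67205465 63686E6F 6C6F6779 20436F72 702E")
comment_text = bytes.fromhex(comment_hex).decode("ascii")

print(len(header), is_jpeg, identifier.decode("ascii"))
print(comment_text)
```

Most of the file is not text at all, which is why only selected byte ranges are decoded here.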

Impressions

Once I reached Chapter 7, the topic turned to data, and binary numbers, hexadecimal numbers, and their relationship to bytes all came up at once. I studied these as a student but have forgotten them, so I will review them little by little in this chapter.

References

"Introduction to Python3" by Bill Lubanovic (published by O'Reilly Japan)

Reference URLs
http://zaq.g1.xrea.com/2sinsuu5.htm
https://docs.python.org/ja/3/howto/unicode.html
