3-3, Python strings and character codes

Character code

A character string is a special data type because it depends on character codes.

Since computers can only process numbers, text characters must be converted to numbers. Early in computer design, 1 byte was settled on as 8 bits. The maximum integer that can be represented in 1 byte is 255 (binary 11111111 is decimal 255), and more bytes are required to represent a larger integer. For example, the maximum integer that can be represented in 2 bytes is 65535, and in 4 bytes it is 4294967295.
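These limits can be checked directly in Python. The following is a minimal sketch; the helper name `max_unsigned` is made up for this illustration and is not part of any library.

```python
def max_unsigned(num_bytes):
    """Largest unsigned integer that fits in num_bytes bytes (8 bits each)."""
    return 2 ** (8 * num_bytes) - 1

# Interpret the string "11111111" as a binary number
print(int("11111111", 2))   # 255

print(max_unsigned(1))      # 255
print(max_unsigned(2))      # 65535
print(max_unsigned(4))      # 4294967295
```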

Computer development progressed mainly in the United States, so the character code that was first standardized contained only 128 characters. This character code is called ASCII, and it covers all the alphanumeric characters, symbols, spaces, line breaks, and so on used in English. Example: the character code of A is 65, and the character code of z is 122.
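In Python, the built-in functions `ord()` and `chr()` convert between a character and its code, so the ASCII values above can be confirmed with a short sketch like this one (the sample string `text` is just an illustration):

```python
# Character -> code
print(ord("A"))   # 65
print(ord("z"))   # 122

# Code -> character
print(chr(65))    # A

# Letters, a space, a digit, and a line break all have ASCII codes below 128
text = "Az 0\n"
codes = [ord(c) for c in text]
print(codes)                              # [65, 122, 32, 48, 10]
print(all(code < 128 for code in codes))  # True
```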

Of course, 1 byte is not enough to handle Japanese, so 2 bytes are needed. Because Japanese could not simply be encoded the same way as ASCII, Japan created the JIS code. Other countries likewise created their own character codes based on ASCII. As a result, text files that mixed multiple languages were displayed as garbled characters.
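Garbled characters appear when bytes written with one character code are read with another. The sketch below reproduces this, assuming Shift_JIS (one encoding of the JIS character set) as the Japanese side and Latin-1 as the mistaken reader; both codec names are standard Python codecs, but the pairing is chosen only for illustration.

```python
original = "日本語"

# Save the text as a Japanese 2-byte encoding
jis_bytes = original.encode("shift_jis")
print(len(jis_bytes))      # 6 (2 bytes per character)

# Read the same bytes with the wrong character code
garbled = jis_bytes.decode("latin-1")
print(repr(garbled))       # unreadable characters, not the original text
print(garbled == original) # False

# Reading with the correct character code restores the text
restored = jis_bytes.decode("shift_jis")
print(restored == original)  # True
```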

Unicode was created to deal with garbled characters. It unifies the characters of all languages into a single character code. Unicode is still being extended, but most commonly used characters fit in 2 bytes.

In ASCII, the character A is 65 in decimal and 01000001 in binary. The character 0 is 48 in decimal and 00110000 in binary. (Note: the character 0 and the number 0 are not the same.) To represent the ASCII character A in 2-byte Unicode, a byte of zeros is placed in front of it, so the Unicode character code of A becomes 00000000 01000001.

Here a new problem appears. Using Unicode eliminates garbled characters, but text that is entirely English takes twice as much data as it does in ASCII. Variable-length UTF-8 was created to solve this. UTF-8 encodes each character in a different number of bytes depending on the size of its Unicode value: the original design allowed 1 to 6 bytes, and the current standard limits it to 1 to 4 bytes. Alphabetic characters take 1 byte, ordinary kanji take 3 bytes, and rarely used kanji take 4 bytes.
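The size difference can be observed with `str.encode()`. This is a minimal sketch: it assumes `utf-16-be` as a stand-in for the fixed 2-byte Unicode form described above, and the sample characters (including U+20BB7 as a rarely used kanji) are chosen only for illustration.

```python
# UTF-8 length depends on the character
samples = ["A", "0", "日", "\U00020BB7"]  # the last one is a rarely used kanji
for ch in samples:
    utf8 = ch.encode("utf-8")
    print(hex(ord(ch)), len(utf8), "byte(s) in UTF-8")

# A in a 2-byte form gets a zero byte in front of it
print("A".encode("utf-16-be"))            # b'\x00A'

# English text: 2-byte form is double the size of UTF-8
english = "Hello, world"
print(len(english.encode("utf-8")))       # 12
print(len(english.encode("utf-16-be")))   # 24

# The character 0 and the number 0 are different things
print("0" == 0)                           # False
print(ord("0"))                           # 48
```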

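Example: the character 日 has the Unicode value 110010111100101 in binary, and in UTF-8 it becomes the 3 bytes 11100110 10010111 10100101 (E697A5 in hexadecimal).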
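The bit pattern in this example can be rebuilt by hand. The sketch below follows the 3-byte UTF-8 layout (1110xxxx 10xxxxxx 10xxxxxx) and compares the result with Python's own encoder; the function name `utf8_three_bytes` is hypothetical and handles only the 3-byte range.

```python
def utf8_three_bytes(code_point):
    """Encode a code point in the 3-byte UTF-8 range by bit manipulation."""
    if not 0x0800 <= code_point <= 0xFFFF:
        raise ValueError("only code points that need 3 bytes are handled here")
    b1 = 0b11100000 | (code_point >> 12)              # top 4 bits
    b2 = 0b10000000 | ((code_point >> 6) & 0b111111)  # middle 6 bits
    b3 = 0b10000000 | (code_point & 0b111111)         # low 6 bits
    return bytes([b1, b2, b3])

cp = ord("日")
print(format(cp, "b"))                  # 110010111100101

encoded = utf8_three_bytes(cp)
print(" ".join(format(b, "08b") for b in encoded))
# 11100110 10010111 10100101
print(encoded.hex().upper())            # E697A5

# Same result as the built-in encoder
print(encoded == "日".encode("utf-8"))  # True
```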


References:
・Character code course, part 1: History of character codes (pre-Unicode history)
・Handling of character codes in Python
