[Python] Summarize the rudimentary things about multithreading

Python multithreading

In this article, I summarize what I have learned about multithreading in order to deepen my own understanding.

About multithreading

[Multithreading means that multiple flows of processing run in parallel while a single computer program executes. The term also refers to those flows of processing themselves.](http://e-words.jp/w/%E3%83%9E%E3%83%AB%E3%83%81%E3%82%B9%E3%83%AC%E3%83%83%E3%83%89.html)

If you divide a program into threads, they can run concurrently while sharing the same memory context. On a single-core CPU, multithreading does not increase speed unless the threads spend time waiting on external resources. On a multi-core CPU, each thread can be assigned to a separate core and executed truly in parallel, which in general speeds up the program.
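
As a minimal sketch of what "sharing the memory context" means, the threads below all append to one list object owned by the process. The function and variable names (`run_workers`, `shared`) are made up for this illustration.

```python
import threading


def run_workers(count: int) -> list:
    """Start `count` threads that all write into one shared list."""
    shared = []  # a single object in the process's memory, visible to every thread

    def worker(name: str) -> None:
        # Nothing is copied: each thread appends to the same list.
        shared.append(f"hello from {name}")

    threads = [
        threading.Thread(target=worker, args=(f"thread-{i}",))
        for i in range(count)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait until every thread has finished

    # The order in which threads ran is not guaranteed, so sort before returning.
    return sorted(shared)


print(run_workers(3))
```

The order in which the threads append is not guaranteed, which is why the result is sorted before it is returned.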

Comparison of threads and processes

Threads and processes are compared below in terms of definition, memory space, and context switching.

Definition

Memory space

Context switch

About context switching

[A context switch is when a computer's processing unit (CPU) suspends the flow of processing it is currently executing (a process or thread), switches to another one, and resumes execution there.](http://e-words.jp/w/%E3%82%B3%E3%83%B3%E3%83%86%E3%82%AD%E3%82%B9%E3%83%88%E3%82%B9%E3%82%A4%E3%83%83%E3%83%81.html)

A context switch between processes has to switch the memory address space, which is a relatively expensive operation. The following materials were helpful: https://code-examples.net/ja/q/530280 https://www.slideshare.net/ssuserc2d4c1/ss-124497965

As a result, threads and processes differ as follows in terms of efficiency and reliability.

Efficiency

Compared with parallel processing by multiple processes, multithreading is generally more efficient because the threads share a memory space.

Reliability

Because threads share a memory space, data that is used by several threads at once must be protected from concurrent access. If multiple threads try to update the same unprotected data at the same time, a race condition occurs and the program can behave in unexpected ways. To protect the data you need to lock it, and using locks correctly is difficult.
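
The following is a minimal sketch of protecting shared data with a lock, assuming a simple counter as the shared data. The name `count_with_lock` is hypothetical. `counter += 1` is a read-modify-write sequence, so without the lock two threads could interleave between the read and the write and lose an update.

```python
import threading


def count_with_lock(num_threads: int, increments: int) -> int:
    """Increment one shared counter from several threads, guarded by a lock."""
    counter = 0
    lock = threading.Lock()

    def add() -> None:
        nonlocal counter
        for _ in range(increments):
            # Only one thread at a time may execute the block below.
            with lock:
                counter += 1

    threads = [threading.Thread(target=add) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


print(count_with_lock(num_threads=4, increments=10_000))
```

Using `with lock:` rather than calling `acquire()` and `release()` by hand ensures the lock is released even if the protected code raises an exception.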

On the other hand, because processes do not share a memory space, multi-process programs are less prone to the data corruption and deadlocks that can occur with multithreading.

Global interpreter lock (GIL)

[The Global Interpreter Lock (GIL) is an exclusive lock, held by a thread of a programming language's interpreter, that prevents non-thread-safe code from being shared with other threads.](https://ja.wikipedia.org/wiki/グローバルインタプリタロック)

Ruby and Python (CPython) both adopt a global interpreter lock (hereinafter "GIL"). In CPython, only one thread at a time is allowed to access Python objects. Why is this? First, CPython, the implementation of Python written in C, is not thread-safe internally. "Not thread-safe" means that data can be corrupted when multiple threads run at the same time and handle the same data. The data here is, for example, the contents of a shared memory area. One way to avoid this corruption is to prevent the data from being shared with other threads at the same moment, and that requires an exclusive lock mechanism. This exclusive lock is the GIL. As a result, the GIL limits the number of threads executing Python code to one at any given time.
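
To observe the effect of the GIL, one rough experiment is to time a CPU-bound function run twice in sequence and then in two threads. This is only a sketch: the function names and loop size are arbitrary choices for illustration, and the timings depend on the machine and the Python build. On a standard CPython build with the GIL, the threaded version is not expected to be meaningfully faster.

```python
import threading
import time


def count_down(n: int) -> None:
    """Pure-Python, CPU-bound work: no I/O, so the GIL stays contended."""
    while n > 0:
        n -= 1


def time_sequential(n: int, repeats: int) -> float:
    """Run count_down `repeats` times, one after another."""
    start = time.perf_counter()
    for _ in range(repeats):
        count_down(n)
    return time.perf_counter() - start


def time_threaded(n: int, repeats: int) -> float:
    """Run count_down in `repeats` threads at once."""
    threads = [
        threading.Thread(target=count_down, args=(n,)) for _ in range(repeats)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


print(f"sequential: {time_sequential(1_000_000, 2):.3f}s")
print(f"threaded:   {time_threaded(1_000_000, 2):.3f}s")
```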

The following materials were very helpful: http://blog.bonprosoft.com/1632 https://methane.hatenablog.jp/entry/20111203/1322900647

The [Python official documentation](https://docs.python.org/ja/3/faq/library.html#can-t-we-get-rid-of-the-global-interpreter-lock) also mentions the GIL.

There are two ways to make good use of Python on a multi-CPU machine:

Use of multithreading with GIL restrictions

When you want to create a responsive interface

Consider a system that copies files from one directory to another through a GUI. With multithreading, the copy runs in a background thread while the main thread keeps updating the GUI window. The user gets real-time feedback on the progress of the operation and can interrupt the work. A responsive interface in this sense means handling time-consuming tasks in the background and giving feedback to the user within a bounded time, and multithreading is one way to achieve that. (The purpose is not to improve performance but to let the user keep operating the interface even when data processing takes a long time.)
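
The pattern can be sketched without a real GUI. In the example below, which is an illustration under stated assumptions rather than an actual GUI program, `time.sleep` stands in for copying a file, a `queue.Queue` carries progress from the background thread to the main thread, and a `threading.Event` lets the main thread request cancellation. The names `copy_files` and `run_copy` are hypothetical.

```python
import queue
import threading
import time


def copy_files(names, progress, cancel):
    """Stand-in for a slow copy job; runs in a background thread."""
    for done, name in enumerate(names, start=1):
        if cancel.is_set():  # the user asked to stop
            break
        time.sleep(0.05)  # pretend to copy one file
        progress.put((done, len(names), name))
    progress.put(None)  # sentinel: the worker has finished


def run_copy(names, cancel_after=None):
    """Play the role of the main (GUI) thread: show progress, allow cancel."""
    progress = queue.Queue()
    cancel = threading.Event()
    worker = threading.Thread(target=copy_files, args=(names, progress, cancel))
    worker.start()

    reported = []
    while True:
        # A real GUI would poll with get_nowait() from its event loop
        # instead of blocking here.
        item = progress.get()
        if item is None:
            break
        done, total, name = item
        reported.append(name)
        print(f"{done}/{total} copied: {name}")
        if cancel_after is not None and done >= cancel_after:
            cancel.set()  # simulate the user pressing "Cancel"

    worker.join()
    return reported


run_copy(["a.txt", "b.txt", "c.txt"])
```

Because the worker only checks the cancel flag between files, a file that is already being copied when cancellation is requested still completes.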

When a process depends on external resources

If the process depends on external resources, multithreading may speed it up. A typical case is sending a large number of HTTP requests to an external service. Each response takes time to arrive, so fetching several results from a Web API synchronously, one after another, is slow. When the requests are independent (that is, they can be executed completely or partially out of order), they can be issued in parallel with almost no effect on each request's response time. One way to realize this is to run each request in its own thread. When executing an HTTP request, most of the time is often spent reading from the TCP socket (recv()). In CPython, the GIL is released while the C-level recv() function is executing. (This seems to be because it is blocking I/O, but I still don't fully understand it.) Because the GIL is released during the wait, other threads can run in the meantime, and this is what makes multithreading useful here.
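
Here is a minimal sketch of the idea using `concurrent.futures.ThreadPoolExecutor`. To keep it runnable without network access, the hypothetical `fake_request` function uses `time.sleep` in place of a real HTTP call. Like a blocking socket read, `time.sleep` releases the GIL while it waits. The URLs are placeholders.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_request(url: str) -> str:
    """Stand-in for an HTTP request: blocks for a while, then returns."""
    time.sleep(0.2)  # the GIL is released during this blocking wait
    return f"response from {url}"


def fetch_all(urls):
    """Issue every request in its own worker thread and collect the results."""
    with ThreadPoolExecutor(max_workers=5) as pool:
        # pool.map returns results in the same order as the input URLs.
        return list(pool.map(fake_request, urls))


urls = [f"https://example.com/api/{i}" for i in range(5)]
start = time.perf_counter()
responses = fetch_all(urls)
elapsed = time.perf_counter() - start
print(f"{len(responses)} responses in {elapsed:.2f}s")
```

Run one after another, five 0.2-second waits would take about one second in total. With five worker threads the waits overlap, so the elapsed time should be much closer to a single wait.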

Impressions

My takeaway is that threads in Python are mainly useful while waiting for I/O. CPython is still difficult for me.

References

http://ossforum.jp/node/579
https://ja.wikipedia.org/wiki/グローバルインタプリタロック
http://blog.bonprosoft.com/1632
https://methane.hatenablog.jp/entry/20111203/1322900647
http://e-words.jp/w/%E3%82%B3%E3%83%B3%E3%83%86%E3%82%AD%E3%82%B9%E3%83%88%E3%82%B9%E3%82%A4%E3%83%83%E3%83%81.html
http://e-words.jp/w/%E3%83%9E%E3%83%AB%E3%83%81%E3%82%B9%E3%83%AC%E3%83%83%E3%83%89.html
Mastering TCP/IP Primer, 5th Edition
Expert Python Programming, Revised 2nd Edition
