[PYTHON] How to utilize multi-core from multiple languages

Introduction

In recent years, multi-core CPUs, which have several cores on a single chip, have become common. However, with today's programming languages it is difficult for engineers to write programs that use multiple cores without consciously designing for it. In this article, I explain how to utilize multiple cores from various languages.

Processes and threads

A process is a running program, such as an application, and a thread is the unit of execution that the CPU runs. A process has one or more threads, and a CPU can run as many threads at the same time as it has cores. (In recent years, with [SMT](https://ja.wikipedia.org/wiki/%E5%90%8C%E6%99%82%E3%83%9E%E3%83%AB%E3%83%81%E3%82%B9%E3%83%AC%E3%83%83%E3%83%87%E3%82%A3%E3%83%B3%E3%82%B0), one physical core can handle multiple threads, typically two, so a CPU may be described as "2 cores, 4 threads".)

[Figure: threads and processes]

[Figure: threads, processes, and the CPU]

To use multiple cores effectively, a program needs to create an appropriate number of threads for the number of threads the CPU can run at once. You can create more threads than that, but the CPU still executes only that many at a time, and switching between threads (context switching) adds overhead that slows processing down.
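As a small Python illustration of sizing a program to the CPU, the sketch below starts one thread per logical CPU reported by `os.cpu_count()`. The function names are hypothetical. Note that in CPython these threads share one GIL (covered in the GIL section below), so this shows how to size and manage threads rather than a guaranteed speedup.

```python
import os
import threading


def square_into(index: int, results: list[int]) -> None:
    # Each thread writes only to its own slot, so no lock is needed.
    results[index] = index * index


def run_one_thread_per_core() -> list[int]:
    # os.cpu_count() returns the number of *logical* CPUs, so SMT threads are
    # counted (a 2-core/4-thread CPU reports 4). It may return None, hence the fallback.
    n_threads = os.cpu_count() or 1
    results = [0] * n_threads
    threads = [
        threading.Thread(target=square_into, args=(i, results))
        for i in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for every thread to finish
    return results


print(run_one_thread_per_core())
```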

Parallel and concurrent

Parallel and concurrent are similar terms, but they mean different things. Parallel means that multiple threads are executed at the same time on multiple cores. (A single core cannot execute multiple things at the same instant, so parallelism is impossible on one core.)

[Figure: parallel]

Concurrent means that multiple tasks make progress by being switched in and out, so a single thread can handle several tasks by switching among them.

[Figure: concurrent]

Because this switching can also happen across multiple threads, a program can be both concurrent and parallel at the same time.
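Concurrency on a single thread can be seen with Python's `asyncio`. In this sketch (the coroutine name `step_through` is hypothetical), two tasks take turns on one event loop: each records a step and then yields with `asyncio.sleep(0)`, so their steps interleave while only one thread ID is ever observed. This is concurrency without parallelism.

```python
import asyncio
import threading


async def step_through(name: str, steps: int, log: list[str], thread_ids: set[int]) -> None:
    for i in range(steps):
        log.append(f"{name}{i}")
        thread_ids.add(threading.get_ident())
        # Hand control back to the event loop so the other task can run.
        await asyncio.sleep(0)


async def main() -> tuple[list[str], set[int]]:
    log: list[str] = []
    thread_ids: set[int] = set()
    # Both tasks run on the same event loop, i.e. on one thread.
    await asyncio.gather(
        step_through("A", 3, log, thread_ids),
        step_through("B", 3, log, thread_ids),
    )
    return log, thread_ids


log, thread_ids = asyncio.run(main())
print(log)              # ['A0', 'B0', 'A1', 'B1', 'A2', 'B2']
print(len(thread_ids))  # 1
```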

C10K problem

The Apache web server traditionally creates a process for each client request. This led to the C10K problem: once the number of simultaneous clients reaches about 10,000, response performance drops sharply even though the server hardware still has capacity to spare. (This article explained the specific cause of the C10K problem clearly.) nginx and Node.js tackled the C10K problem by handling requests concurrently with asynchronous I/O on a single thread (per worker process, in nginx's case).
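The single-threaded, asynchronous-I/O approach can be sketched with Python's `asyncio`. This is only an analogy for what nginx and Node.js do, not their implementation: `handle_request` is hypothetical, and it simulates waiting on the network with `asyncio.sleep` instead of using real sockets. All 10,000 "requests" are in flight at once on one thread, so the total time is close to a single wait rather than the sum of all waits.

```python
import asyncio
import time


async def handle_request(request_id: int) -> int:
    # Stand-in for waiting on network I/O; no real socket is involved.
    await asyncio.sleep(0.1)
    return request_id


async def serve(n_requests: int) -> list[int]:
    # Every "request" waits at the same time, all on one thread.
    return await asyncio.gather(*(handle_request(i) for i in range(n_requests)))


start = time.perf_counter()
results = asyncio.run(serve(10_000))
elapsed = time.perf_counter() - start
# Handling these one by one would take 10,000 * 0.1 s = 1,000 s.
print(len(results), f"{elapsed:.2f}s")
```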

Node.js

As mentioned above, Node.js runs on a single thread and handles work concurrently with asynchronous processing such as async/await. As shown in the figure below, when the program calls an external API, it does other work until the result comes back, and then continues once the result is available. (For details, this article was easy to understand.)

[Figure: asynchronous processing]

Consequently, ordinary asynchronous processing alone cannot take advantage of multiple cores. To use them in Node.js, you need to either [create multiple processes with Cluster](https://postd.cc/setting-up-a-node-js-cluster/) or create multiple threads with worker_threads. In other words, to utilize multiple cores you must explicitly create multiple processes or threads in your program. Threads can share memory (worker_threads do so through SharedArrayBuffer), whereas processes have completely separate memory spaces and cannot share variables; each approach has its own [advantages and disadvantages](https://stackoverflow.com/questions/56656498/how-is-cluster-and-worker-threads-work-in-node-js).
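The memory-sharing trade-off between threads and processes can be shown in Python as well. In this sketch (function names are hypothetical), a thread increments a module-level `counter` and the parent sees the change, while a child process created with `multiprocessing` increments its own copy, so the parent's value is unchanged. The `if __name__ == "__main__":` guard is needed because on platforms that start processes with "spawn" the child re-imports the script.

```python
import multiprocessing
import threading

counter = 0


def increment() -> None:
    global counter
    counter += 1


def run_in_thread() -> int:
    global counter
    counter = 0
    t = threading.Thread(target=increment)
    t.start()
    t.join()
    return counter  # the thread modified the shared global


def run_in_process() -> int:
    global counter
    counter = 0
    p = multiprocessing.Process(target=increment)
    p.start()
    p.join()
    return counter  # the child modified only its own copy


if __name__ == "__main__":
    print("thread:", run_in_thread())    # thread: 1
    print("process:", run_in_process())  # process: 0
```

To pass data between processes you would instead use explicit channels such as `multiprocessing.Queue`, which mirrors the message passing that Cluster workers rely on.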

The GIL in Ruby and Python

In Node.js, we could take advantage of multiple cores by creating multiple processes or threads. Ruby and Python, however, have a [Global Interpreter Lock (GIL)](https://ja.wikipedia.org/wiki/%E3%82%B0%E3%83%AD%E3%83%BC%E3%83%90%E3%83%AB%E3%82%A4%E3%83%B3%E3%82%BF%E3%83%97%E3%83%AA%E3%82%BF%E3%83%AD%E3%83%83%E3%82%AF), so even if you create multiple threads, they cannot execute code in parallel. (Strictly speaking, this applies to CPython and CRuby, the implementations written in C, but I'll skip the details here. Threads still help with I/O-bound work, because the GIL is released while a thread waits on I/O.) Therefore, to utilize multiple cores in these languages you cannot rely on multi-threading; you need to create multiple processes.
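The effect of the GIL on CPU-bound work can be checked with `concurrent.futures`. This sketch runs the same pure-Python loop (the hypothetical `busy_sum`) four times, once on a `ThreadPoolExecutor` and once on a `ProcessPoolExecutor`. On a standard (GIL-enabled) CPython build with at least four cores, the thread version typically takes about as long as running the jobs one after another, while the process version can spread them across cores. Actual timings depend on the machine, so I don't show numbers here.

```python
import time
from concurrent.futures import Executor, ProcessPoolExecutor, ThreadPoolExecutor

N_JOBS = 4             # hypothetical: number of CPU-bound jobs (and workers)
LOOP_SIZE = 1_000_000  # hypothetical: iterations per job


def busy_sum(n: int) -> int:
    # Pure-Python CPU-bound loop; a thread running it holds the GIL.
    total = 0
    while n > 0:
        total += n
        n -= 1
    return total


def run_jobs(executor_cls: type[Executor], jobs: list[int]) -> tuple[list[int], float]:
    start = time.perf_counter()
    with executor_cls(max_workers=len(jobs)) as pool:
        results = list(pool.map(busy_sum, jobs))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    jobs = [LOOP_SIZE] * N_JOBS
    thread_results, thread_time = run_jobs(ThreadPoolExecutor, jobs)
    process_results, process_time = run_jobs(ProcessPoolExecutor, jobs)
    assert thread_results == process_results
    print(f"threads:   {thread_time:.2f}s")
    print(f"processes: {process_time:.2f}s")
```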

Goroutines in Go

In Go, asynchronous processing runs concurrently and in parallel using goroutines. By default, GOMAXPROCS is set to the number of logical CPUs, and at most that many OS threads execute Go code at the same time; lightweight goroutines are scheduled onto those threads. The figure below illustrates a 4-core CPU with GOMAXPROCS=4.

[Figure: goroutines scheduled onto threads]

Using goroutines in this way, programs run concurrently and in parallel and take advantage of multiple cores. (This article explained well why goroutines are lightweight.)
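Python has no goroutines, but the core idea of GOMAXPROCS, a fixed number of workers equal to the CPU count onto which many small units of work are multiplexed, can be loosely imitated with a process pool. This is only an analogy: the workers here are OS processes, and tasks are queued rather than preemptively scheduled like goroutines. The prime-counting workload and function names are hypothetical.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def is_prime(n: int) -> bool:
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def count_primes(limit: int) -> int:
    # Like GOMAXPROCS defaulting to the CPU count, size the pool to the logical CPUs.
    workers = os.cpu_count() or 1
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # Far more tasks than workers: the pool hands chunks of them to idle workers.
        return sum(pool.map(is_prime, range(limit), chunksize=256))


if __name__ == "__main__":
    print(count_primes(10_000))  # 1229
```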

async/await in Rust

In Rust, asynchronous processing is written with async/await. How asynchronous tasks are assigned to threads depends on which runtime you use. A popular runtime is tokio. Its multi-threaded runtime creates as many worker threads as there are cores and distributes asynchronous tasks across them, which is similar to how goroutines utilize multiple cores. (For other scheduling approaches and asynchronous processing in Rust in general, this article was easy to understand, especially the section [on the execution model](https://tech-blog.optim.co.jp/entry/2019/11/08/163000#%E5%AE%9F%E8%A1%8C%E3%83%A2%E3%83%87%E3%83%AB).)
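For comparison, in Python you can get a rough version of the "async code spread over a core-sized pool" effect by combining `asyncio` with a process pool. This is not how tokio works (tokio's worker threads run async tasks directly, whereas Python's event loop stays on one thread and hands CPU-bound calls to other processes via `loop.run_in_executor`). The function `cpu_heavy` and its inputs are hypothetical.

```python
import asyncio
import os
from concurrent.futures import ProcessPoolExecutor


def cpu_heavy(n: int) -> int:
    # CPU-bound work that would block the event loop if run inline.
    return sum(i * i for i in range(n))


async def main(inputs: list[int]) -> list[int]:
    loop = asyncio.get_running_loop()
    # One worker process per logical CPU, similar in spirit to tokio's default worker count.
    with ProcessPoolExecutor(max_workers=os.cpu_count() or 1) as pool:
        futures = [loop.run_in_executor(pool, cpu_heavy, n) for n in inputs]
        # While the pool computes, the event loop remains free for other coroutines.
        return await asyncio.gather(*futures)


if __name__ == "__main__":
    print(asyncio.run(main([10, 4, 3_000_000])))
```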

Finally

In Ruby and Python, the GIL makes it hard to use multiple cores through multi-threading, and in Node.js, asynchronous processing alone cannot use multiple cores either. In Go and Rust (with a multi-threaded runtime such as tokio), which have become popular in recent years, simply writing asynchronous code lets work run concurrently and in parallel without the engineer having to think about it, so multiple cores are utilized. It's no wonder Go and Rust are popular now that multi-core CPUs are commonplace.

References

[Illustration] Differences between CPU cores, threads, and processes, context switching, and multithreading
Getting closer to a complete understanding of asynchronous processing in Unity by knowing the difference between processes, threads, and tasks
I investigated asynchronous I/O in Node.js
Node.js basics you can't ask about anymore
I looked into as much as I could about the GIL, which you should know if you want to do parallel processing in Python
Why goroutines are lightweight
[Master Rust asynchronous programming](https://tech-blog.optim.co.jp/entry/2019/11/08/163000#%E3%83%A9%E3%83%B3%E3%82%BF%E3%82%A4%E3%83%A0%E3%81%A7%E9%9D%9E%E5%90%8C%E6%9C%9F%E3%82%BF%E3%82%B9%E3%82%AF%E3%82%92%E8%B5%B7%E5%8B%95%E3%81%99%E3%82%8B)
