[Python] About multi-process

About this article

This article summarizes what I have learned about multiprocessing in Python.

About multi-process in Python

When are multiple processes used? ⇒ When you want parallel processing, splitting the work across several processes is one way to achieve it.

In CPython, the global interpreter lock (GIL) lets only one thread at a time execute Python bytecode, so an application that runs CPU-intensive tasks currently has to use multiple processes to take advantage of a multi-core CPU.

https://docs.python.org/ja/3/faq/library.html#can-t-we-get-rid-of-the-global-interpreter-lock
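As a rough illustration of the point above, here is a minimal sketch that splits a CPU-heavy job across several processes. The prime-counting task and the names `count_primes`, `worker`, and `count_primes_parallel` are my own assumptions for the example and do not come from the linked FAQ.

```python
import math
from multiprocessing import Process, Queue


def count_primes(start, stop):
    """Count primes in [start, stop) by trial division (deliberately CPU-heavy)."""
    count = 0
    for n in range(max(start, 2), stop):
        if all(n % d for d in range(2, math.isqrt(n) + 1)):
            count += 1
    return count


def worker(start, stop, results):
    # Runs in a child process, which has its own interpreter and its own GIL.
    results.put(count_primes(start, stop))


def count_primes_parallel(limit, n_procs=4):
    """Split [0, limit) into n_procs chunks and count each chunk in its own process."""
    results = Queue()
    step = -(-limit // n_procs)  # ceiling division
    procs = [
        Process(target=worker, args=(i * step, min((i + 1) * step, limit), results))
        for i in range(n_procs)
    ]
    for p in procs:
        p.start()
    # Collect one result per process before joining.
    total = sum(results.get() for _ in procs)
    for p in procs:
        p.join()
    return total


if __name__ == "__main__":
    print(count_primes_parallel(200_000))
```

Running the same chunks in threads would not be expected to speed this up in CPython, because the threads would take turns holding the GIL.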

Benefits of multi-processing in Python

Starting a process

Before showing source code that uses multiple processes, here is how a new process is started. On Unix-like systems, the basic way to start a new process is to fork the running program. In Python, calling `os.fork()` (not available on Windows) gives the child process a copy of the parent's memory context, after which each process runs in its own address space. The source is below.

fork.py


import os

pid_list = []

def main():
    pid_list.append(os.getpid())
    child_pid = os.fork()

    if child_pid == 0:
        # os.fork() returns 0 in the child process.
        pid_list.append(os.getpid())
        print()
        print("child: Hello, I am the child process")
        print("child: The PIDs I know are %s" % pid_list)

    else:
        # In the parent, os.fork() returns the PID of the child.
        pid_list.append(os.getpid())
        print()
        print("parent: Hello, I am the parent process")
        print("parent: The PID of the child process is %d" % child_pid)
        print("parent: The PIDs I know are %s" % pid_list)

if __name__ == "__main__":
    main()

$ python fork.py

parent: Hello, I am the parent process
parent: The PID of the child process is 321
parent: The PIDs I know are [320, 320]

child: Hello, I am the child process
child: The PIDs I know are [320, 321]

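Both lists start with the same PID, 320, because the child received a copy of the parent's list at the moment of the fork. After the fork, the child appended 321 while the parent appended 320 again, and neither process sees the other's change. This shows that the two processes do not share a memory context.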
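The same experiment can also be written with `multiprocessing.Process`, which works on platforms without `os.fork()`. This is a minimal sketch, not a replacement for the listing above; the function name `child_task` is my own. Note that what the child prints depends on the start method: with fork it sees a copy of the parent's list, while with spawn the module is imported afresh and the list starts empty.

```python
import os
from multiprocessing import Process

pid_list = []


def child_task():
    # This append only changes the child's own copy of pid_list.
    pid_list.append(os.getpid())
    print("child: The PIDs I know are %s" % pid_list)


def main():
    pid_list.append(os.getpid())
    child = Process(target=child_task)
    child.start()
    child.join()  # wait for the child to exit
    # The parent's list is not affected by the child's append.
    print("parent: The PIDs I know are %s" % pid_list)


if __name__ == "__main__":
    main()
```

Either way, the parent's list contains only the parent's PID after the child exits.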

Implementation of interprocess communication

Process memory is not shared by default, so communicating between processes takes some extra work. To make this easier, the `multiprocessing` module provides several ways to communicate between processes. Two of them are introduced here: `multiprocessing.Pipe` and `multiprocessing.sharedctypes`.

About multiprocessing.Pipe

`Pipe` is similar in concept to Unix and Linux pipes. `multiprocessing.Pipe()` returns a pair of `Connection` objects that represent the two ends of the pipe; in the example below (pipesample.py) this is `parent_conn, child_conn = Pipe()`. The default, `Pipe(True)`, makes the pipe bidirectional. With `Pipe(False)` the pipe is unidirectional: given `conn1, conn2 = Pipe(False)`, `conn1` can only receive messages and `conn2` can only send them. The connections send and receive picklable objects with `send()` and `recv()`.

Reference URL: https://docs.python.org/ja/2.7/library/multiprocessing.html#pipes-and-queues

pipesample.py


from multiprocessing import Process, Pipe

class CustomClass:
    pass

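def work(connection):
    while True:
        instance = connection.recv()

        # None is the end-of-data sentinel; test for it explicitly so that
        # falsy values such as 0 or '' are still printed.
        if instance is None:
            return

        print("Child:Receive:{}".format(instance))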

def main():
    parent_conn, child_conn = Pipe()

    child = Process(target=work, args=(child_conn,))

    for item in (
        42,
        'some string',
        {'one':1},
        CustomClass(),
        None,
    ):
        print("parent:Send:{}".format(item))
        parent_conn.send(item)
    
    child.start()
    child.join()

if __name__ == "__main__":
    main()

$ python pipesample.py
parent:Send:42
parent:Send:some string
parent:Send:{'one': 1}
parent:Send:<__main__.CustomClass object at 0x7fc785a34ac8>
parent:Send:None
Child:Receive:42
Child:Receive:some string
Child:Receive:{'one': 1}
Child:Receive:<__main__.CustomClass object at 0x7fc785268978>

Each item in the tuple `(42, ..., None)` is passed to `parent_conn.send()`, and the child process receives it by calling `recv()` on the paired connection. The object is pickled and sent as a copy, not shared, which is why the `CustomClass` instance is shown at a different address in each process. The final `None` tells the child to stop, so it is not printed.
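For comparison, here is a minimal sketch of a one-way pipe created with `Pipe(duplex=False)`, where the child sends and the parent receives. The helper names `produce` and `collect` are assumptions made for this example.

```python
from multiprocessing import Process, Pipe


def produce(sender, items):
    # Runs in the child: send each item, then a None sentinel to mark the end.
    for item in items:
        sender.send(item)
    sender.send(None)
    sender.close()


def collect(receiver):
    """Read from the receiving end until the None sentinel arrives."""
    received = []
    while True:
        item = receiver.recv()
        if item is None:
            return received
        received.append(item)


def main():
    # duplex=False: the first connection can only receive, the second can only send.
    receiver, sender = Pipe(duplex=False)
    child = Process(target=produce, args=(sender, [0, "", {"one": 1}]))
    child.start()
    result = collect(receiver)
    child.join()
    return result


if __name__ == "__main__":
    print(main())
```

The sample data includes `0` and an empty string on purpose: comparing against `None` with `is` keeps such falsy values from being mistaken for the end marker.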

Implementation using multiprocessing.sharedctypes

The `multiprocessing.sharedctypes` module allocates shared memory and provides a way to place data of C types (int, double, and so on) in it. The most basic entry points are `Value(typecode_or_type, *args, lock=True)` and `Array(typecode_or_type, size_or_initializer, *, lock=True)`. `typecode_or_type` determines the type of the returned object: it is either a ctypes type or a one-character type code of the kind used by the `array` module. Objects such as list, dict, Namespace, and Lock are hard to express this way, so use `multiprocessing.Manager` for them. Reference: https://docs.python.org/ja/3/library/multiprocessing.html#sharing-state-between-processes

valuearray.py


from multiprocessing import Process, Value, Array

def f(n,a):
    n.value = 3.141592
    for i in range(len(a)):
        a[i] = -a[i]

if __name__ == "__main__":
    num = Value('d', 0.0)
    arr = Array('i', range(10))

    p = Process(target=f, args=(num, arr))
    p.start()
    p.join()

    print(num.value)
    print(arr[:])

$ python valuearray.py
3.141592
[0, -1, -2, -3, -4, -5, -6, -7, -8, -9]
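For the types that `Value` and `Array` do not cover, the following is a minimal sketch of sharing a dict and a list through `multiprocessing.Manager`. The function name `record` and the squared-number data are assumptions chosen only for illustration.

```python
from multiprocessing import Manager, Process


def record(shared_dict, shared_list, key):
    # With manager objects, each operation goes through a proxy to the
    # manager's server process.
    shared_dict[key] = key * key
    shared_list.append(key)


def main():
    with Manager() as manager:
        squares = manager.dict()
        seen = manager.list()
        procs = [Process(target=record, args=(squares, seen, k)) for k in range(4)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        # Copy into plain objects before the manager shuts down.
        return dict(squares), sorted(seen)


if __name__ == "__main__":
    print(main())
```

Manager objects are more flexible than shared ctypes but slower, since every access is a round trip to another process.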

Use process pool

Using multiple processes instead of threads adds considerable overhead. In particular, memory usage grows because each process has its own memory context, so spawning a large number of child processes costs more than doing the same work with threads. In a multi-process application, a process pool is a good way to keep resource usage under control. The basic idea is to start a fixed number of processes in advance and have them take items from a queue and process them. Because the processes are started before any work arrives, rather than when a task shows up, a task can begin as soon as it is assigned.
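To make the idea concrete, here is a hand-rolled sketch of a pool: a fixed number of workers started in advance, each taking items from a shared queue. It only illustrates the concept and is not how the `Pool` class is implemented; the names `worker`, `run_pool`, and `STOP` are my own.

```python
from multiprocessing import Process, Queue

STOP = None  # sentinel telling a worker to exit


def worker(tasks, results):
    # Each worker loops: take a task from the queue, process it, report the result.
    while True:
        item = tasks.get()
        if item is STOP:
            break
        results.put((item, item * item))


def run_pool(items, pool_size=4):
    items = list(items)
    tasks, results = Queue(), Queue()
    # Start the workers first so they are ready when tasks arrive.
    workers = [Process(target=worker, args=(tasks, results)) for _ in range(pool_size)]
    for w in workers:
        w.start()
    for item in items:
        tasks.put(item)
    for _ in workers:
        tasks.put(STOP)  # one sentinel per worker
    # Collect one result per item before joining the workers.
    collected = sorted(results.get() for _ in items)
    for w in workers:
        w.join()
    return collected


if __name__ == "__main__":
    print(run_pool(range(10)))
```

The number of processes stays at `pool_size` no matter how many items are queued, which is what keeps memory usage bounded.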

About the Pool class

This class takes care of the complexity of managing multiple worker processes.

The following source code uses the Google Maps API on GCP (Google Cloud Platform) to obtain the latitude and longitude that match each city name. Setting `POOL_SIZE = 4` specifies four worker processes that run in parallel. `Pool` can also be used as a context manager, as the `with` statement in `main()` shows.

geocoding_by_multiprocessing.py



from multiprocessing import Pool

from gmaps import Geocoding

api = Geocoding(api_key='secret')

PLACES = (
    'Reykjavik','Vien','Zadar',
    'Venice','Wrocow','Bolognia',
    'Berlin','Dehil','New York',
    'Osaka'
)

POOL_SIZE = 4

def fetch_place(place):
    return api.geocode(place)[0]

def present_result(geocoded):
    print("{:s}, {:6.2f}, {:6.2f}".format(
        geocoded['formatted_address'],
        geocoded['geometry']['location']['lat'],
        geocoded['geometry']['location']['lng'],
    ).encode('utf-8'))

def main():
    with Pool(POOL_SIZE) as pool:
        results = pool.map(fetch_place, PLACES)
    
    for result in results:
        present_result(result)

if __name__ == "__main__":
    main()
    
$ python geocoding_by_multiprocessing.py
b'Reykjav\xc3\xadk, Iceland,  64.15, -21.94'
b'3110 Glendale Blvd, Los Angeles, CA 90039, USA,  34.12, -118.26'
b'Zadar, Croatia,  44.12,  15.23'
b'Venice, Metropolitan City of Venice, Italy,  45.44,  12.32'
b'Wroc\xc5\x82aw, Poland,  51.11,  17.04'
b'Bologna, Metropolitan City of Bologna, Italy,  44.49,  11.34'
b'Berlin, Germany,  52.52,  13.40'
b'Delhi, India,  28.70,  77.10'
b'New York, NY, USA,  40.71, -74.01'
b'Osaka, Japan,  34.69, 135.50'
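Since the listing above needs an API key, here is a minimal sketch of the same `Pool.map` pattern that runs without any external service. The function `describe` is a hypothetical stand-in for the geocoding call; it returns the worker's PID so you can see which process handled each item.

```python
import os
from multiprocessing import Pool

POOL_SIZE = 4


def describe(place):
    # Stand-in for a slow lookup; also report which worker process ran it.
    return place, len(place), os.getpid()


def main():
    places = ["Reykjavik", "Zadar", "Venice", "Berlin", "Osaka"]
    with Pool(POOL_SIZE) as pool:
        # map() blocks until every result is ready and preserves input order.
        results = pool.map(describe, places)
    for place, length, pid in results:
        print("{:10s} {:2d} handled by PID {}".format(place, length, pid))
    return results


if __name__ == "__main__":
    main()
```

The PIDs will differ from run to run, and the same worker may handle more than one item.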

Impressions

Studying parallel processing is hard. (Lol)
