A while back I had to analyze a fairly large dataset, and processing it took a very long time. This post summarizes the method I used to speed it up.
The approach uses Python's multiprocessing module.
multi.py
from multiprocessing import Pool
from multiprocessing import Process
A basic usage pattern looks like this:
multi.py
def function(hoge):
    # the thing you want to do goes here
    return x  # x is the result computed above

def multi(n):
    p = Pool(10)  # maximum number of worker processes: 10
    result = p.map(function, range(n))
    p.close()  # no more tasks will be submitted
    p.join()   # wait for the workers to finish
    return result

def main():
    data = multi(20)
    for i in data:
        print(i)

if __name__ == '__main__':
    main()
In this case, function is executed 20 times, once for each of the arguments 0, 1, 2, ..., 19, spread across the worker processes. Pool.map collects the return values of function into result as a list, which main then receives and prints to standard output.
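As an aside, on Python 3 the pool can also be used as a context manager, which shuts the workers down cleanly even if something raises an exception. Here is a minimal self-contained sketch (square is just a stand-in for the real work, not part of the original code):

from multiprocessing import Pool

def square(hoge):
    # stand-in for the real per-item work
    return hoge * hoge

if __name__ == '__main__':
    with Pool(10) as p:  # the pool is terminated when the block exits
        result = p.map(square, range(20))
    print(result)  # [0, 1, 4, 9, ..., 361]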
Also, my environment gives me up to 12 logical cores (6 physical cores with 12 threads, to be exact), so I set the maximum number of processes to 10. If you max out every core, even opening a browser becomes a struggle, so it is safer to leave some headroom.
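If you want the script to adapt to whatever machine it runs on instead of hard-coding 10, you can size the pool from multiprocessing.cpu_count(). A small sketch (leaving two logical cores free is my own rule of thumb, not something from the original post):

import multiprocessing as mp

# keep a couple of logical cores free so the machine stays responsive
n_workers = max(1, mp.cpu_count() - 2)

def f(hoge):
    return hoge

if __name__ == '__main__':
    with mp.Pool(n_workers) as p:
        print(p.map(f, range(5)))  # [0, 1, 2, 3, 4]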
I also watched the CPU usage during the parallel run; you can see that the work really is being performed on multiple cores in parallel.
You can also get the ID of the process handling each task:
multi.py
import os

def function(hoge):
    # the thing you want to do goes here
    print('process id:' + str(os.getpid()))
    return x  # x is the result computed above

# (rest is the same as above)
If you print this, you can see that each call really is being executed in a different process, which is interesting.
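The output looks something like the following (these PIDs are made-up examples; the actual values differ on every run):

process id:11340
process id:11341
process id:11342
...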
A job that used to take about 35 hours finished in just over 4 hours, roughly a tenth of the original processing time, which is plenty good enough for me.
Of course, each individual process runs no faster than before, so to get the full benefit you have to spread the work evenly across the workers; one way to handle uneven workloads is shown in the sketch below. Analysis jobs often have exactly this embarrassingly parallel shape, so I find the technique very useful.
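If the per-item workload is uneven, one option is to let the pool hand items to idle workers on demand rather than pre-partitioning them. A minimal sketch using Pool.imap_unordered (the sleep is just a stand-in for work whose duration varies):

import time
from multiprocessing import Pool

def uneven_work(i):
    time.sleep(i % 3)  # stand-in for work that takes a variable amount of time
    return i

if __name__ == '__main__':
    with Pool(10) as p:
        # chunksize=1: an idle worker pulls the next item as soon as it finishes,
        # so a few slow items do not leave the other workers waiting
        for res in p.imap_unordered(uneven_work, range(20), chunksize=1):
            print(res)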