I wanted to save graphs of some measurement data (CSV) as PDFs. When the amount of data was large (1000 records), it took a long time (about 8 minutes), so I looked into improving it.
I want to save a large number of PDFs quickly and keep the PC from becoming unresponsive while the job runs.
I mainly tried the following in Python.
A plain for loop in Python
threading
I had used it before for other reasons, so I tried it first. Since threading can effectively use only one core, the speed is not supposed to change much; even so, it was a surprise that it cut roughly 4 minutes off.
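For reference, a minimal sketch of the threading pattern I mean, with a hypothetical save_pdf(i) standing in for the real plotting function:

import threading

def save_pdf(i):  # hypothetical stand-in for "plot one dataset and save it as a PDF"
    print(i, end=",")

def main():
    threads = []
    for i in range(20):
        t = threading.Thread(target=save_pdf, args=(i,))
        t.start()  # start the thread
        threads.append(t)
    for t in threads:
        t.join()  # wait for every thread to finish

if __name__ == "__main__":
    main()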
multiprocessing.Process
Since the function I want to run in parallel takes multiple arguments, I used Process. However, no matter how much I googled, I could not figure out how to limit the number of processes, so I could not get a workable implementation. As a result, when I ran it, it launched 1000 processes in parallel, which made my PC grind to a halt.
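For the record, one possible workaround (not what I did at the time) is to launch the Process objects in batches of the CPU count and wait for each batch to finish before starting the next; a rough sketch:

from multiprocessing import Process, cpu_count

def f(a, b):
    print(a, end=",")

def main():
    jobs = [(i, 0) for i in range(20)]
    batch = cpu_count()  # never run more processes than CPU cores at once
    for start in range(0, len(jobs), batch):
        ps = [Process(target=f, kwargs={'a': a, 'b': b}) for a, b in jobs[start:start + batch]]
        for p in ps:
            p.start()
        for p in ps:
            p.join()  # wait for this batch before launching the next one

if __name__ == "__main__":
    main()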
multiprocessing.Pool
It can take only one argument, but it lets you easily specify the number of processes. With a wrapper like the following (written here as a method, for the case where f belongs to the same class), it can also handle multiple arguments.
def wrapper(self, args):
    return self.f(*args)
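Incidentally, on Python 3.3 and later, Pool.starmap unpacks tuple arguments by itself, so the wrapper is not strictly necessary; a minimal sketch:

from multiprocessing import Pool

def f(a, b):
    print(a, end=",")

if __name__ == "__main__":
    with Pool(4) as p:
        p.starmap(f, [(i, 0) for i in range(20)])  # each tuple is unpacked into f(a, b)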
The multiprocessing code can be written as follows. It is a minimal example that only shows how to write it (the work is so small that parallelizing it is meaningless). For details, see this article: I thoroughly investigated the parallel and concurrent processing of Python - Qiita.
from multiprocessing import Process, Pool
import multiprocessing

def f(a, b):  # function to run in parallel
    print(a, end=",")  # just an operation check, so keep it simple

def wrapper(args):  # Pool can pass only one argument, so receive it here and unpack it
    f(*args)

def main():
    # Parallel processing with Process (multiple arguments: OK, limiting the process count: no)
    print('Process')
    p_list = []
    for i in range(20):
        p = Process(
            target=f,
            kwargs={'a': i, 'b': 0})
        p.start()  # start the process
        p_list.append(p)
    for p in p_list:
        p.join()  # wait for it to finish

    # Parallel processing with Pool (multiple arguments: no, limiting the process count: OK)
    print('\nPool')
    p = Pool(multiprocessing.cpu_count())  # one worker process per CPU core
    values = []
    for i in range(20):
        values.append((i, 0))
    p.map(wrapper, values)  # the single tuple is expanded into multiple arguments inside wrapper

if __name__ == "__main__":
    main()
result
Process
0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,
Pool
6,0,8,14,2,10,16,3,11,17,5,13,18,1,9,15,4,12,19,7,
My thoughts on the results are as follows.
Compare 1 (Pool) and 2 (Process). With 1, the jobs proceed 8 at a time (my CPU count), in order.
With 2, all of the jobs (1000 of them in this case) are launched in parallel at once.
Isn't 2 supposed to be faster, then? In fact, a CPU cannot truly run more jobs in parallel than it has cores. What looks like parallel processing is really just the OS rapidly switching which job it works on: job 1 (run for 1 μs) → job 2 (run for 1 μs) → job 3 (run for 1 μs) → job 1 (continue for 1 μs) → ... It only pretends to be parallel, so in principle it takes the same total time as job 1 (run to the end) → job 2 (run to the end) → job 3 (run to the end). What can genuinely run at the same time is limited to the number of cores (the CPU count), so processing only gets faster up to that point; for example, with 8 cores and 1000 equal jobs, the best you can hope for is roughly 1000/8 ≈ 125 jobs' worth of time.
So why did 1 turn out faster? Maybe because plenty of memory stays available: with 2, 1000 processes are alive at the same time and memory gets choked, so I suspect that avoiding this is why the time improved.
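Finally, a sketch of how the original task (one PDF per dataset in the CSV) might look with Pool. The file name, the column layout, and save_pdf here are assumptions for illustration, not my actual script:

import multiprocessing
from multiprocessing import Pool

import matplotlib
matplotlib.use('Agg')  # non-interactive backend, safe inside worker processes
import matplotlib.pyplot as plt
import pandas as pd

def save_pdf(args):
    index, values = args  # one row of measurement data (assumed numeric)
    fig, ax = plt.subplots()
    ax.plot(values)
    fig.savefig('graph_{}.pdf'.format(index))
    plt.close(fig)  # release the figure so memory does not pile up

def main():
    df = pd.read_csv('measurement.csv')  # hypothetical file name
    jobs = [(i, row.values) for i, row in df.iterrows()]
    with Pool(multiprocessing.cpu_count()) as p:  # one worker per CPU core
        p.map(save_pdf, jobs)

if __name__ == '__main__':
    main()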