If you are doing machine learning or deep learning with Python, the processing can get heavy and become a performance problem. I recently had some heavy processing of my own, so I decided to parallelize it to improve performance.
Using joblib, a library for parallel processing, I saw a performance improvement locally. However, running the same code in the development environment raised an error. Apparently, joblib doesn't work in a uWSGI environment, so I summarized the workarounds here.
- uWSGI is an application server for running web services written in Python.
- Processing you want to parallelize

```python
length = 1000

def calc_sum(i, j):
    return i + j
```
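For reference, here is what the plain serial version of this computation looks like (presumably the "Normal" case in the benchmark table below) — a minimal sketch, self-contained so it can be run on its own:

```python
length = 1000

def calc_sum(i, j):
    return i + j

# Serial baseline: a single nested comprehension, no parallelism.
# Produces length * length results, one per (i, j) pair.
sum_list = [calc_sum(i, j) for j in range(length) for i in range(length)]
```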
- Parallelization (`n_jobs=-1` tells joblib to use all available CPU cores)

```python
from joblib import Parallel, delayed

sum_list = Parallel(n_jobs=-1)(
    [delayed(calc_sum)(i, j) for j in range(length) for i in range(length)]
)
```
- However, when you run it on uWSGI, you get an error like this:
```
exception calling callback for <Future at 0x7fbc520c7eb8 state=finished raised TerminatedWorkerError>
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/joblib/externals/loky/_base.py", line 625, in _invoke_callbacks
    callback(self)
  File "/usr/local/lib/python3.7/site-packages/joblib/parallel.py", line 309, in __call__
    self.parallel.dispatch_next()
  File "/usr/local/lib/python3.7/site-packages/joblib/parallel.py", line 731, in dispatch_next
    if not self.dispatch_one_batch(self._original_iterator):
  File "/usr/local/lib/python3.7/site-packages/joblib/parallel.py", line 759, in dispatch_one_batch
    self._dispatch(tasks)
  File "/usr/local/lib/python3.7/site-packages/joblib/parallel.py", line 716, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "/usr/local/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 510, in apply_async
    future = self._workers.submit(SafeFunction(func))
  File "/usr/local/lib/python3.7/site-packages/joblib/externals/loky/reusable_executor.py", line 151, in submit
    fn, *args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/joblib/externals/loky/process_executor.py", line 1022, in submit
    raise self._flags.broken
joblib.externals.loky.process_executor.TerminatedWorkerError: A worker process managed by the executor was unexpectedly terminated. This could be caused by a segmentation fault while calling the function or by an excessive memory usage causing the Operating System to kill the worker. The exit codes of the workers are {EXIT(1), EXIT(1), EXIT(1), EXIT(1)}
```
- Workaround: tell joblib to use threads instead of worker processes with `prefer='threads'`

```python
from joblib import Parallel, delayed

sum_list = Parallel(n_jobs=-1, prefer='threads')(
    [delayed(calc_sum)(i, j) for j in range(length) for i in range(length)]
)
```
- Of course, depending on the processing, threads may be slower: pure-Python CPU-bound work like this is limited by the GIL, so threads cannot run it truly in parallel.
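To illustrate the thread-based approach without depending on joblib, here is a stdlib-only sketch of the same computation using `concurrent.futures.ThreadPoolExecutor` (my own equivalent, not code from the original post; `length` is reduced to 100 so it runs quickly):

```python
from concurrent.futures import ThreadPoolExecutor

length = 100  # reduced from the article's 1000, just for illustration

def calc_sum(i, j):
    return i + j

pairs = [(i, j) for j in range(length) for i in range(length)]

# Threads share the GIL, so for CPU-bound pure-Python work like this,
# don't expect a speedup over the serial loop; the results are identical.
with ThreadPoolExecutor() as executor:
    sum_list = list(executor.map(lambda p: calc_sum(*p), pairs))
```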
- Alternative: use the standard library's multiprocessing module

```python
import multiprocessing

with multiprocessing.Pool() as pool:
    results = [pool.apply_async(calc_sum, (i, j))
               for j in range(length) for i in range(length)]
    sum_list = [r.get() for r in results]
```
- I didn't take this approach this time: the service was already running on uWSGI, and changing the application server would have been risky.
| Normal | joblib (multiprocess) | joblib (threads) | multiprocessing |
| --- | --- | --- | --- |
| 32.9 µs | 11 µs | 40.1 µs | 4.05 µs |
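The post doesn't show how these timings were taken; as a minimal sketch (my assumption, not the author's benchmark script), the standard library's `timeit` can measure the serial case like so:

```python
import timeit

length = 100  # reduced from the article's 1000 for a quick run

def calc_sum(i, j):
    return i + j

def serial():
    return [calc_sum(i, j) for j in range(length) for i in range(length)]

# timeit returns the total seconds for `number` runs; divide for per-run time
elapsed = timeit.timeit(serial, number=5) / 5
print(f"serial: {elapsed * 1e6:.1f} µs per run")
```

The parallel variants can be timed the same way by wrapping each one in its own zero-argument function.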
joblib is more readable and easier to write, but multiprocessing is built into Python itself, so it often performs better. It also seems less likely to run into errors like the one above.