Parallel processing with Python's joblib does not work in a uWSGI environment. How can you process in parallel on uWSGI?

Introduction

If you are doing machine learning or deep learning with Python, processing can get heavy, and performance may become a problem.

Recently I also had some heavy processing, so I decided to parallelize it to improve performance. Running locally with joblib, a library for parallel processing, I did see a performance improvement.

However, when I ran the same code in the development environment, it failed with an error. Apparently, joblib's default process-based parallelism doesn't work in a uWSGI environment. This post summarizes the workarounds.

What is uWSGI

- uWSGI is an application server for running Python web services.
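What uWSGI runs is a WSGI application, which is just a Python callable that handles one HTTP request at a time. Any joblib code in your service therefore runs inside a uWSGI worker process, not in a plain `python` process as it does locally. A minimal sketch of such an app (the response text is arbitrary; `application` is the callable name uWSGI looks for by default):

```python
def application(environ, start_response):
    """Minimal WSGI app: uWSGI calls this once per HTTP request."""
    body = b"Hello from uWSGI"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    # A WSGI app returns an iterable of bytes chunks.
    return [body]
```

Saved as, say, `app.py` (a hypothetical file name), it could be started with `uwsgi --http :8000 --wsgi-file app.py`.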

Environment information

Joblib code that fails on uWSGI

- The processing you want to parallelize

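length = 1000

# Toy workload; named calc_sum (not sum) so it matches the calls below
# and does not shadow Python's built-in sum()
def calc_sum(i, j):
    return i + j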
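For reference, a sequential baseline of the same work, presumably what the "Normal" row in the speed comparison at the end refers to. It repeats the definitions so it runs on its own:

```python
length = 1000

def calc_sum(i, j):
    return i + j

# Sequential baseline: the same 1,000,000 calls, one after another, in one process.
# Outer loop over j, inner loop over i, the same order the parallel versions use.
sum_list = [calc_sum(i, j) for j in range(length) for i in range(length)]
```

Because every parallel variant below should produce exactly this list, it is also handy for checking their results.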

- Parallelizing it with joblib

from joblib import Parallel, delayed

# n_jobs=-1 uses all available CPU cores; one delayed call per (i, j) pair
sum_list = Parallel(n_jobs=-1)([delayed(calc_sum)(i, j) for j in range(length) for i in range(length)])

- However, running it on uWSGI produces an error like this:

exception calling callback for <Future at 0x7fbc520c7eb8 state=finished raised TerminatedWorkerError>
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/joblib/externals/loky/_base.py", line 625, in _invoke_callbacks
    callback(self)
  File "/usr/local/lib/python3.7/site-packages/joblib/parallel.py", line 309, in __call__
    self.parallel.dispatch_next()
  File "/usr/local/lib/python3.7/site-packages/joblib/parallel.py", line 731, in dispatch_next
    if not self.dispatch_one_batch(self._original_iterator):
  File "/usr/local/lib/python3.7/site-packages/joblib/parallel.py", line 759, in dispatch_one_batch
    self._dispatch(tasks)
  File "/usr/local/lib/python3.7/site-packages/joblib/parallel.py", line 716, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "/usr/local/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 510, in apply_async
    future = self._workers.submit(SafeFunction(func))
  File "/usr/local/lib/python3.7/site-packages/joblib/externals/loky/reusable_executor.py", line 151, in submit
    fn, *args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/joblib/externals/loky/process_executor.py", line 1022, in submit
    raise self._flags.broken
joblib.externals.loky.process_executor.TerminatedWorkerError: A worker process managed by the executor was unexpectedly terminated. This could be caused by a segmentation fault while calling the function or by an excessive memory usage causing the Operating System to kill the worker. The exit codes of the workers are {EXIT(1), EXIT(1), EXIT(1), EXIT(1)}

Workaround 1: Use threads (this works)

from joblib import Parallel, delayed

# prefer='threads' asks joblib for a thread-based backend instead of worker processes
sum_list = Parallel(n_jobs=-1, prefer='threads')([delayed(calc_sum)(i, j) for j in range(length) for i in range(length)])

- Of course, depending on the processing, threads may be slower: in standard CPython, CPU-bound pure-Python code cannot run in parallel across threads because of the GIL.
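If you only need the thread-based variant, the standard library's `concurrent.futures.ThreadPoolExecutor` does the same job without joblib. A minimal sketch, assuming the same `calc_sum` as above (redefined so it runs standalone) and a smaller `length` than the article's so it finishes quickly. It has the same GIL limitation as `prefer='threads'`:

```python
from concurrent.futures import ThreadPoolExecutor

length = 100  # smaller than the article's 1000 so the sketch finishes quickly

def calc_sum(i, j):
    return i + j

# Flatten the (i, j) pairs in the same order as the article's comprehension.
i_values = [i for j in range(length) for i in range(length)]
j_values = [j for j in range(length) for i in range(length)]

# Threads run inside the current process, so no worker processes are launched.
with ThreadPoolExecutor() as executor:
    # map() yields results in input order, as joblib's Parallel does.
    sum_list = list(executor.map(calc_sum, i_values, j_values))
```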

Workaround 2: Use multiprocessing

import multiprocessing

# Pool() starts one worker process per CPU core by default
with multiprocessing.Pool() as pool:
    # Submit one asynchronous task per (i, j) pair, then collect the results in order
    results = [pool.apply_async(calc_sum, (i, j)) for j in range(length) for i in range(length)]
    sum_list = [r.get() for r in results]
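The `apply_async` version sends one task per `(i, j)` pair, a million tiny tasks in total, each pickled and sent to a worker separately. A variant that may be worth trying is `Pool.starmap` with a `chunksize`, which hands the arguments to the workers in batches. This is a sketch, not a benchmarked improvement: the helper name `parallel_sums` and the `chunksize` value are illustrative assumptions, and it pins the `fork` start method, which is available only on POSIX systems (the usual platform for uWSGI).

```python
import multiprocessing

length = 1000

def calc_sum(i, j):
    return i + j

def parallel_sums(length, chunksize=10_000):
    """Hypothetical helper: compute every calc_sum(i, j) with a process pool."""
    pairs = [(i, j) for j in range(length) for i in range(length)]
    # "fork" is POSIX-only; children inherit the parent's memory, so the
    # main module is not re-imported in the workers.
    ctx = multiprocessing.get_context("fork")
    with ctx.Pool() as pool:
        # starmap unpacks each (i, j) tuple into calc_sum(i, j), sends the
        # work in batches of `chunksize` items, and returns results in order.
        return pool.starmap(calc_sum, pairs, chunksize=chunksize)

if __name__ == "__main__":
    sum_list = parallel_sums(length)
    print(len(sum_list))
```

Larger chunks mean fewer round trips between processes but coarser load balancing, so the best `chunksize` depends on how uniform the per-item work is.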

Workaround 3: Use gunicorn instead of uWSGI

I didn't try this one. The service was already running on uWSGI, so switching the application server seemed too risky.

Conclusion

Speed comparison

Method                   Time
Normal                   32.9 µs
joblib (multiprocess)    11 µs
joblib (threads)         40.1 µs
multiprocessing          4.05 µs

joblib is more readable and easier to write, but multiprocessing is part of Python's standard library, and in this comparison it was the fastest. It also seems less likely to run into an error like this one.
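The article does not state how the timings above were taken. One way to produce comparable numbers is the standard library's `timeit`. A minimal sketch that times the sequential version against a thread pool; the function names, the `length` value and the `number`/`repeat` settings are illustrative assumptions, and the joblib and multiprocessing variants could be added to the same list:

```python
import timeit
from concurrent.futures import ThreadPoolExecutor

length = 100  # illustrative size; use your real workload size for real numbers

def calc_sum(i, j):
    return i + j

def run_sequential():
    return [calc_sum(i, j) for j in range(length) for i in range(length)]

def run_threads():
    i_values = [i for j in range(length) for i in range(length)]
    j_values = [j for j in range(length) for i in range(length)]
    with ThreadPoolExecutor() as executor:
        return list(executor.map(calc_sum, i_values, j_values))

def best_time(func, number=3, repeat=3):
    """Seconds per call of func(), using the fastest of `repeat` rounds."""
    # timeit.repeat() returns the total time of each round of `number` calls.
    return min(timeit.repeat(func, number=number, repeat=repeat)) / number

if __name__ == "__main__":
    for name, func in [("Normal", run_sequential), ("threads", run_threads)]:
        print(f"{name}: {best_time(func) * 1000:.3f} ms")
```

Taking the fastest of several rounds reduces noise from other activity on the machine. Measuring at the real workload size matters because, for small inputs, the fixed cost of starting threads or processes can dominate the result.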
