[Python] Parallel processing with scikit-learn's Parallel

2019/10/16 addendum: I wrote about Joblib's caching feature. [[Python] Use Joblib cache to omit the same calculation](https://tma15.github.io/blog/2019/10/06/python-Use joblib cache to omit the same calculation /)

Speaking of scikit-learn tutorials, the introduction by @Scaled_Wurm is very easy to understand. This time I happened to be reading the source code, so I am noting a niche feature that wasn't covered in that blog entry. The conclusion: unless you have a particular reason not to, code written with multiprocessing can be replaced with Parallel.
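As a rough illustration of that conclusion, here is a minimal sketch that runs the same computation once with multiprocessing.Pool and once with Parallel. The helper names `sqrt_with_pool` and `sqrt_with_parallel` and the `workers` parameter are made up for this example, and it assumes the standalone joblib package is installed.

```python
from math import sqrt
from multiprocessing import Pool

from joblib import Parallel, delayed


def sqrt_with_pool(values, workers=2):
    """Baseline: map sqrt over the inputs with multiprocessing.Pool."""
    with Pool(processes=workers) as pool:
        return pool.map(sqrt, values)


def sqrt_with_parallel(values, workers=2):
    """Same computation written with joblib's Parallel and delayed."""
    return Parallel(n_jobs=workers)(delayed(sqrt)(v) for v in values)


if __name__ == "__main__":
    squares = [i ** 2 for i in range(10)]
    print(sqrt_with_pool(squares))
    print(sqrt_with_parallel(squares))
```

Both helpers return the results in input order, so in simple cases like this one the Pool version can be swapped for the Parallel version without touching the calling code.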

Parallel

Is multiprocessing useless?

What Parallel is (based on the Notes in the source code)

Arguments

Examples

Simple example

>>> from math import sqrt
>>> # sklearn.externals.joblib was removed in scikit-learn 0.23; import joblib directly
>>> from joblib import Parallel, delayed
>>> Parallel(n_jobs=1)(delayed(sqrt)(i**2) for i in range(10))
[0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
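Parallel returns one result per task, in input order. When the function returns a tuple, the list of tuples can be regrouped with zip. The following is a small sketch of that pattern; the helper name `split_halves` is hypothetical and `n` is assumed to be at least 1.

```python
from math import modf

from joblib import Parallel, delayed


def split_halves(n, workers=1):
    """Apply modf to i / 2 for i in range(n) and regroup the outputs.

    modf returns a (fractional, integral) pair, so Parallel yields a
    list of pairs; zip(*pairs) turns it into two tuples.
    Assumes n >= 1, otherwise the unpacking below has nothing to unpack.
    """
    pairs = Parallel(n_jobs=workers)(delayed(modf)(i / 2.0) for i in range(n))
    fractional, integral = zip(*pairs)
    return fractional, integral


if __name__ == "__main__":
    frac, whole = split_halves(10)
    print(frac)
    print(whole)
```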

Example of seeing progress

>>> from time import sleep
>>> from joblib import Parallel, delayed
>>> r = Parallel(n_jobs=2, verbose=5)(delayed(sleep)(.1) for _ in range(10)) #doctest: +SKIP
[Parallel(n_jobs=2)]: Done   1 out of  10 | elapsed:    0.1s remaining:    0.9s
[Parallel(n_jobs=2)]: Done   3 out of  10 | elapsed:    0.2s remaining:    0.5s
[Parallel(n_jobs=2)]: Done   6 out of  10 | elapsed:    0.3s remaining:    0.2s
[Parallel(n_jobs=2)]: Done   9 out of  10 | elapsed:    0.5s remaining:    0.1s
[Parallel(n_jobs=2)]: Done  10 out of  10 | elapsed:    0.5s finished

Specify pre_dispatch

>>> from math import sqrt
>>> from joblib import Parallel, delayed

>>> def producer():
...     for i in range(6):
...         print('Produced %s' % i)
...         yield i

>>> out = Parallel(n_jobs=2, verbose=100, pre_dispatch='1.5*n_jobs')(
...                         delayed(sqrt)(i) for i in producer()) #doctest: +SKIP
Produced 0 ### first
Produced 1 ### second
Produced 2 ### third (1.5 * n_jobs = 3 tasks are dispatched up front)
[Parallel(n_jobs=2)]: Done   1 jobs       | elapsed:    0.0s
Produced 3
[Parallel(n_jobs=2)]: Done   2 jobs       | elapsed:    0.0s
Produced 4
[Parallel(n_jobs=2)]: Done   3 jobs       | elapsed:    0.0s
Produced 5
[Parallel(n_jobs=2)]: Done   4 jobs       | elapsed:    0.0s
[Parallel(n_jobs=2)]: Done   5 out of   6 | elapsed:    0.0s remaining:    0.0s
[Parallel(n_jobs=2)]: Done   6 out of   6 | elapsed:    0.0s finished
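In the log above, only three items are produced before the first job finishes, because pre_dispatch limits how many tasks are pulled from the generator ahead of time. The sketch below tries a few pre_dispatch values and records what the generator produced. The helper name `run_with_pre_dispatch` and the `produced` list are assumptions for illustration, not part of joblib.

```python
from math import sqrt

from joblib import Parallel, delayed


def run_with_pre_dispatch(pre_dispatch, n_items=6, n_jobs=2):
    """Run sqrt over a generator and record which items it produced."""
    produced = []  # filled in the parent process as the generator is consumed

    def producer():
        for i in range(n_items):
            produced.append(i)
            yield i

    results = Parallel(n_jobs=n_jobs, pre_dispatch=pre_dispatch)(
        delayed(sqrt)(i) for i in producer()
    )
    return results, produced


if __name__ == "__main__":
    # pre_dispatch accepts an integer, an expression in n_jobs, or 'all'.
    for setting in (3, "1.5*n_jobs", "all"):
        results, produced = run_with_pre_dispatch(setting)
        print(setting, results, produced)
```

Whatever the setting, the final results are the same; pre_dispatch only changes how eagerly the input generator is consumed, which matters when producing the inputs is expensive or memory-hungry.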
