[PYTHON] How to speed up scikit-learn, just like conda's NumPy

Recently there has been some buzz that NumPy installed with conda runs faster than NumPy installed with pip (although the original article is not that recent). In this article, I explain that scikit-learn also runs faster if you take care over how you install it.

Introduction

The article: "Anaconda's NumPy seems to be fast, so I tried it." https://tech.morikatron.ai/entry/2020/03/27/100000

I have seen this article many times on my Twitter timeline recently (although it was written back in March 2020).

It reports that NumPy installed with conda is faster than NumPy installed with pip.

Why is it so fast?

The article above uses an "Intel Core i7-9750H" CPU.

The specifications of this CPU are listed here: https://www.intel.co.jp/content/www/jp/ja/products/processors/core/i7-processors/i7-9750h.html

The **"Instruction Set Extensions"** field of the specification lists **"Intel® SSE4.1, Intel® SSE4.2, Intel® AVX2"**.

In other words, the instruction set extensions include **AVX2**.

In addition, relatively new high-performance Intel CPUs also support **AVX-512**.

AVX is the successor to the Streaming SIMD Extensions (SSE). The following page explains it in detail:

https://ja.wikipedia.org/wiki/%E3%82%B9%E3%83%88%E3%83%AA%E3%83%BC%E3%83%9F%E3%83%B3%E3%82%B0SIMD%E6%8B%A1%E5%BC%B5%E5%91%BD%E4%BB%A4

**"In short, it is a feature that can execute multiple operations with a single instruction."**

**Intel MKL (Math Kernel Library)** is an Intel library that accelerates mathematical operations on Intel CPUs by using AVX2 and later instruction set extensions. https://www.xlsoft.com/jp/products/intel/perflib/mkl/index.html

It speeds things up by carrying out various calculations with AVX2 or AVX-512 instructions.

NumPy installed with conda performs its calculations through Intel MKL, so on a CPU that supports these instructions the processing is faster.
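One way to see this effect for yourself is to run the same small benchmark once in a pip environment and once in a conda environment. The following is only a minimal sketch, not code from the original article: `best_time` and `matmul_benchmark` are hypothetical helper names, and the matrix size of 1000 is an arbitrary assumption. `numpy.show_config()` is called only to print the build information, so that you can look for MKL or OpenBLAS in its output.

```python
import time


def best_time(func, repeats=3):
    """Run func() `repeats` times and return the fastest wall-clock time in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        best = min(best, time.perf_counter() - start)
    return best


def matmul_benchmark(size=1000):
    """Time one size x size matrix product, which NumPy hands to its BLAS library."""
    import numpy as np  # imported here so best_time() can be used without NumPy

    rng = np.random.default_rng(0)
    a = rng.random((size, size))
    b = rng.random((size, size))
    return best_time(lambda: a @ b)


if __name__ == "__main__":
    try:
        import numpy as np
    except ImportError:
        print("NumPy is not installed in this environment")
    else:
        np.show_config()  # prints the BLAS/LAPACK libraries NumPy was built against
        print(f"best matmul time: {matmul_benchmark():.3f} s")
```

The absolute numbers depend on the machine, so only compare results measured on the same CPU.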

Intel's efforts

Even in the GPU era, Intel has been working hard: around 2017 it released **NumPy and scikit-learn packages** that support acceleration on Intel CPUs, including Intel MKL.

pip install intel-scikit-learn https://pypi.org/project/intel-scikit-learn/

pip install intel-numpy https://pypi.org/project/intel-numpy/

With these packages, you get the accelerated versions from the start even when installing with pip.

However, Intel released them a long time ago. Their versions have not kept up with upstream NumPy and scikit-learn, and they are not maintained, so I do not recommend using them.

Accelerate scikit-learn

The article I introduced at the beginning, "Anaconda's NumPy seems to be fast, so I tried it." https://tech.morikatron.ai/entry/2020/03/27/100000, also contains the following passage:

Various other modules have adopted Intel MKL. Great!
- NumPy
- NumExpr
- SciPy
- Scikit-Learn
- Tensorflow… On Windows, it is a separate package called tensorflow-mkl
- PyTorch… It seems that Intel MKL is also used via pip

As you can see, installing with conda will automatically install a faster version that uses Intel MKL.
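To check which BLAS library a given scikit-learn environment has actually loaded, you can ask threadpoolctl, which scikit-learn depends on. The sketch below is an illustration under assumptions rather than part of the original article: `blas_backends` is a hypothetical helper, and it relies on `threadpoolctl.threadpool_info()` returning a list of dictionaries with `user_api` and `internal_api` keys.

```python
def blas_backends(pool_info):
    """Return the names of the BLAS implementations found in threadpoolctl's report."""
    return sorted({
        entry.get("internal_api", "unknown")
        for entry in pool_info
        if entry.get("user_api") == "blas"
    })


if __name__ == "__main__":
    try:
        import sklearn  # noqa: F401  (importing it loads the native libraries to inspect)
        from threadpoolctl import threadpool_info
    except ImportError:
        print("scikit-learn / threadpoolctl is not installed in this environment")
    else:
        # An MKL-backed environment is expected to report "mkl" here,
        # while typical pip wheels report "openblas".
        print(blas_backends(threadpool_info()))
```

threadpoolctl only reports libraries that are already loaded in the process, which is why scikit-learn is imported before calling `threadpool_info()`.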

However, if you want it to run as fast as possible, follow the instructions on the scikit-learn installation page. https://scikit-learn.org/stable/install.html

On this page, the "Intel conda channel" entry under "Third party distributions of scikit-learn" says:

Intel conda channel

Intel maintains a dedicated conda channel that ships scikit-learn:

$ conda install -c intel scikit-learn

This version of scikit-learn comes with alternative solvers for some common estimators. Those solvers come from the DAAL C++ library and are optimized for multi-core Intel CPUs.

Note that those solvers are not enabled by default, please refer to the daal4py documentation for more details.

Compatibility with the standard scikit-learn solvers is checked by running the full scikit-learn test suite via automated continuous integration as reported on https://github.com/IntelPython/daal4py.


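If you install scikit-learn from the Intel conda channel, you get not only Intel MKL but also alternative implementations of some scikit-learn solvers that are optimized for multi-core Intel CPUs (although, as the note above says, these solvers are not enabled by default).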

So if you want to use the fastest version of scikit-learn on a CPU that supports Intel AVX2 or later, then instead of

$ conda install scikit-learn

I recommend installing with

$ conda install -c intel scikit-learn
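Because the alternative solvers are not enabled by default, they have to be switched on from Python after installation. The following is a minimal sketch under the assumption that the installed daal4py exposes `patch_sklearn` in `daal4py.sklearn`, as its documentation describes. `enable_intel_solvers` is a hypothetical wrapper name, and the interface may differ between versions, so please check the daal4py documentation for the version you actually install.

```python
def enable_intel_solvers():
    """Try to switch scikit-learn over to the DAAL-backed solvers.

    Returns True if the patch was applied, False if daal4py is not available.
    """
    try:
        # Assumption: this entry point exists in the installed daal4py version.
        from daal4py.sklearn import patch_sklearn
    except ImportError:
        return False
    patch_sklearn()
    return True


if __name__ == "__main__":
    if enable_intel_solvers():
        print("scikit-learn is now using the Intel-optimized solvers where available")
    else:
        print("daal4py is not installed; the standard scikit-learn solvers are used")
```

Call this before importing estimators such as `sklearn.cluster.KMeans`, so that the patched classes are the ones your code picks up.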

In closing

The AWS Deep Learning images and the Azure DSVM (Data Science Virtual Machines) manage their environments with conda.

scikit-learn is installed there through conda as well.

I don't know whether those are the Intel-optimized versions (if anyone knows, please let me know).

If you recreate the virtual environment with conda yourself, you can install the accelerated version described above ... (You need to check whether the CPU of the IaaS machine supports AVX2. A low-end machine may only support AVX.)
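On a Linux machine, the CPU check mentioned above can be done by reading the flags in `/proc/cpuinfo`. This is a minimal sketch that assumes a Linux x86 host; `simd_support` is a hypothetical helper name, and other operating systems need a different method.

```python
import platform
from pathlib import Path


def simd_support(cpuinfo_text):
    """Report which SIMD extensions appear in /proc/cpuinfo-style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags") and ":" in line:
            flags.update(line.split(":", 1)[1].split())
    return {
        "avx": "avx" in flags,
        "avx2": "avx2" in flags,
        # AVX-512 is reported as several flags such as avx512f, avx512dq, ...
        "avx512": any(flag.startswith("avx512") for flag in flags),
    }


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if platform.system() == "Linux" and cpuinfo.exists():
        print(simd_support(cpuinfo.read_text()))
    else:
        print("This sketch only reads /proc/cpuinfo on Linux")
```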

That was how to speed up scikit-learn, just like conda's NumPy. (CPUs are not my strong suit, so please leave a comment if I got something wrong.)

Remarks

[Updates] Recently, I have been posting on Twitter about AI, business, and management: articles and sites that I found interesting, and impressions of books I have read.

Yutaro Ogawa @ISID_AI_team https://twitter.com/ISID_AI_team

I share the information I come across that I found interesting or important.

[Others] The AI Technology Department development team, which I lead, is looking for members. If you are interested, please click here

[Disclaimer] The content of this article is the author's own opinion, not the official view of the company to which the author belongs.

