2017/08/28: The part pointed out in the comment has been corrected.
Recently I tried speeding up slow Python code by reimplementing the slow part in C and calling it through Cython, so I'm summarizing the steps here for when I forget how to do it. **Only very basic, surface-level material is covered.**
As I understand it, Cython acts as a mediator that lets Python call functions written in C. The idea is to get the best of both: implement the slow parts in C and write the rest easily in Python.
Windows
Go to this site and download the file for your Python version. (The `cp○○` part of the file name is the Python version; for Python 3.6, choose the one with `cp36`.)
Then start the installation by entering `pip install <path to the downloaded file>` at the command prompt.
If you later get an error such as `error: Unable to find vcvarsall.bat`, refer to [Settings to use another version of VC++](https://www.regentechlog.com/2014/04/13/build-python-package-on-windows/#VC-2) on the reference site below; that solved it for me.
Reference site: https://www.regentechlog.com/2014/04/13/build-python-package-on-windows/
On other platforms, you should be able to install it with `$ pip install cython`.
The files used are:
- .c file: the implementation written in C
- .h file: the C header file
- .pyx file: functions callable from Python, written using the C functions and the like
- setup.py: the file used for compiling
To see the speedup in practice, this time I'll use the following code with a double for loop.
python_code.py
# -*- coding: utf-8 -*-
import time

if __name__ == "__main__":
    start_t = time.time()
    arr_a = [i for i in range(1000)]
    arr_b = [i for i in range(1000)]
    res = 0
    # Double loop: add elem_a + elem_b for every pair of elements
    for elem_a in arr_a:
        for elem_b in arr_b:
            res = res + elem_a + elem_b
    print(res)
    all_time = time.time() - start_t
    print("Execution time:{0} [sec]".format(all_time))
All it does is add up `elem_a + elem_b` over every combination of elements from `arr_a` and `arr_b`.
Running this gives:
out[1]
999000000
Execution time:0.24517321586608887 [sec]
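The printed value can be checked by hand. Every element of `arr_a` is paired with all 1000 elements of `arr_b` (and vice versa), so the total is 1000 × (0 + 1 + ... + 999), counted once for each array. A minimal Python sketch comparing that closed form with the brute-force double loop (the function names are just for illustration):

```python
def closed_form(n):
    """Sum of (a + b) over all pairs a, b in range(n), without looping."""
    s = n * (n - 1) // 2      # 0 + 1 + ... + (n - 1)
    # each value appears n times as elem_a and n times as elem_b
    return 2 * n * s

def brute_force(n):
    """The same double loop as python_code.py, written as a generator."""
    return sum(a + b for a in range(n) for b in range(n))

print(closed_form(1000), brute_force(1000))  # 999000000 999000000
```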
Python for loops seem to be slow, so I'll implement this part in C:
for elem_a in arr_a:
    for elem_b in arr_b:
        res = res + elem_a + elem_b
The header file and source code are as follows.
cython_c_code.h
#ifndef CYTHON_CODE
#define CYTHON_CODE
int c_algo(int*, int*, int, int);
#endif
cython_c_code.c
#include "cython_c_code.h"

/* Sum arr_a[i] + arr_b[j] over every pair (i, j) */
int c_algo(int *arr_a, int *arr_b, int size_a, int size_b){
    int res = 0;
    for(int i = 0; i < size_a; i++){
        for(int j = 0; j < size_b; j++){
            res = res + arr_a[i] + arr_b[j];
        }
    }
    return res;
}
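One thing to keep in mind: `c_algo()` accumulates into a C `int` and returns an `int`. Assuming a 32-bit `int` (the usual case on Windows, Linux and macOS), the largest value is 2**31 - 1 = 2147483647, and signed overflow is undefined behavior in C. For inputs `arr_a = arr_b = [0, 1, ..., n-1]` as in this article, the exact result is n²(n - 1), so a small Python sketch can find the array length where it stops fitting (`INT_MAX` and the helper names are assumptions for illustration):

```python
INT_MAX = 2**31 - 1  # assumption: C int is 32 bits wide

def expected_result(n):
    """Exact value c_algo() should return for arr_a = arr_b = [0, ..., n-1]."""
    return n * n * (n - 1)   # = 2 * n * (0 + 1 + ... + (n - 1))

def smallest_overflowing_size():
    """Smallest array length n whose result exceeds INT_MAX."""
    n = 1
    while expected_result(n) <= INT_MAX:
        n += 1
    return n

print(expected_result(1000) <= INT_MAX)  # True: 999000000 fits
print(smallest_overflowing_size())       # 1291
```

So with 1291 or more elements per array, `res` and the return type would need a wider type such as `long long`.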
Next, wrap the C function `c_algo()` above in a .pyx file so that it can be called from Python.
cython_pyx_code.pyx
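cimport numpy as np

# Declare the C function from the header so Cython can call it
cdef extern from "cython_c_code.h":
    int c_algo(int *arr_a, int *arr_b, int size_a, int size_b)

# Python-callable wrapper taking 1-D NumPy arrays of C int
def cy_algo(np.ndarray[int, ndim=1] arr_a, np.ndarray[int, ndim=1] arr_b):
    # &arr[0] is a pointer to the first element
    return c_algo(&arr_a[0], &arr_b[0], len(arr_a), len(arr_b))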
The `cimport` that appears here is an instruction to read Cython's version of a header file (a .pxd file). In other words, writing `cimport numpy as np` makes it possible to declare types such as `np.ndarray`.
Separately, if you also need NumPy functions at runtime, you need `import numpy` as well.
Next, declare the functions you want to use from the C header file in this form:
cdef extern from "header file name":
    function declaration
Finally, define the function to be called from the Python side. It is written with `def` like ordinary Python, but the argument types are specified. To receive a NumPy array as an argument, use the type `np.ndarray[element type, ndim=number of dimensions]`.
To pass an array to a C function as a pointer, use `&array_name[0]`.
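Passing `&array_name[0]` hands C a pointer to the first element, and `c_algo()` then reads `size` consecutive `int`s from there. That only works if the elements really are C `int`s laid out contiguously. NumPy isn't needed to see this: the standard library's `array` module and `memoryview` expose the same buffer protocol that Cython's typed `np.ndarray` arguments use. A small sketch (the helper `describe_buffer` is hypothetical):

```python
import array
import ctypes

def describe_buffer(obj):
    """Return (format, itemsize, is_c_contiguous) for a buffer-protocol object."""
    mv = memoryview(obj)
    return mv.format, mv.itemsize, mv.c_contiguous

# Typecode 'i' is a C int, the element type c_algo() expects
ints = array.array('i', range(10))
print(describe_buffer(ints))
print(ctypes.sizeof(ctypes.c_int))  # matches the itemsize above

# Every second element: a strided view that is NOT contiguous.
# A pointer to its first element would make C read 0, 1, 2, ... instead of 0, 2, 4, ...
strided = memoryview(ints)[::2]
print(strided.tolist(), describe_buffer(strided))
```

On the NumPy side, `np.ascontiguousarray(x, dtype=np.intc)` gives an array that satisfies both conditions.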
setup.py
# setuptools replaces distutils, which was removed in Python 3.12
from setuptools import setup, Extension
from Cython.Distutils import build_ext
import numpy as np

# The .pyx wrapper and the C implementation are built into one module
sourcefiles = ['cython_pyx_code.pyx', 'cython_c_code.c']

setup(
    cmdclass={'build_ext': build_ext},
    # "cython_code" is the module name used in `import cython_code`
    ext_modules=[Extension("cython_code", sourcefiles, include_dirs=[np.get_include()])],
)
Import the required libraries and pass the following as the `ext_modules` argument of `setup()`:
[Extension("name of this library", [source files to use], include_dirs=[header directories of the libraries used])]
Since we are using NumPy this time, pass `np.get_include()` to `include_dirs`.
Compile with `$ python setup.py build_ext -i` (`-i` is short for `--inplace`, which puts the built module next to the source files).
If all goes well, you will get a build directory and a `library_name.so` file (`.pyd` on Windows). You can import and use it like a normal Python library.
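A common pattern (not from the original article) is to fall back to the pure-Python loop when the extension has not been built yet, so scripts still run. A sketch assuming the module name `cython_code` from setup.py above; the wrapper name `algo` is hypothetical:

```python
try:
    import numpy as np
    import cython_code  # built by `python setup.py build_ext -i`

    def algo(arr_a, arr_b):
        # Convert to contiguous C-int arrays to match np.ndarray[int, ndim=1]
        a = np.ascontiguousarray(arr_a, dtype=np.intc)
        b = np.ascontiguousarray(arr_b, dtype=np.intc)
        return cython_code.cy_algo(a, b)

except ImportError:
    def algo(arr_a, arr_b):
        # Pure-Python fallback: the same double loop as python_code.py
        res = 0
        for elem_a in arr_a:
            for elem_b in arr_b:
                res = res + elem_a + elem_b
        return res

print(algo(range(1000), range(1000)))  # 999000000
```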
cython_py_code.py
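# -*- coding: utf-8 -*-
import time
import numpy as np
import cython_code  # the extension module built by setup.py

if __name__ == "__main__":
    start_t = time.time()
    arr_a = [i for i in range(1000)]
    arr_b = [i for i in range(1000)]
    # dtype=np.intc matches the C int expected by cy_algo; without it,
    # NumPy's default integer is 64-bit on Linux/macOS and the call fails
    res = cython_code.cy_algo(np.array(arr_a, dtype=np.intc),
                              np.array(arr_b, dtype=np.intc))
    print(res)
    all_time = time.time() - start_t
    print("Execution time:{0} [sec]".format(all_time))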
out[2]
999000000
Execution time:0.0010039806365966797 [sec]
That is about 245 times faster.
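These numbers come from a single `time.time()` measurement, which for the Cython version also includes converting the lists with `np.array()`. For a fairer comparison, here is a sketch that uses `time.perf_counter()` and keeps the best of several runs (the helper `best_time` is hypothetical, and `py_algo` is just a stand-in for whatever you want to time):

```python
import time

def best_time(func, *args, repeat=5):
    """Run func(*args) `repeat` times; return (last result, fastest time in seconds)."""
    best = float("inf")
    result = None
    for _ in range(repeat):
        start = time.perf_counter()   # monotonic, high-resolution clock
        result = func(*args)
        best = min(best, time.perf_counter() - start)
    return result, best

def py_algo(arr_a, arr_b):
    """Pure-Python version of the double loop."""
    res = 0
    for elem_a in arr_a:
        for elem_b in arr_b:
            res = res + elem_a + elem_b
    return res

if __name__ == "__main__":
    arr = list(range(1000))
    res, sec = best_time(py_algo, arr, arr)
    print(res, "best of 5: {0:.6f} [sec]".format(sec))
```

To time the Cython version the same way, create the NumPy arrays beforehand and pass `cython_code.cy_algo` instead, so the conversion cost is excluded.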
You can also write the code directly in the .pyx file, without a separate C file, by specifying types.
cy_only.pyx
cimport numpy as np

def cy_algo(np.ndarray[int, ndim=1] arr_a, np.ndarray[int, ndim=1] arr_b):
    cdef int res
    cdef int elem_a
    cdef int elem_b
    res = 0
    for elem_a in arr_a:
        for elem_b in arr_b:
            res = res + elem_a + elem_b
    return res
As shown, every variable used is declared in the form `cdef type_name variable_name`.
Note that if you forget to declare `elem_a` and `elem_b`, execution becomes slower.
Compiling this and calling it from Python gives:
out[3]
999000000
Execution time:0.10053086280822754 [sec]
It's about twice as fast as pure Python, but still much slower than the C version, presumably because the loop is written in the easy Python style.
After this was pointed out in the comments, I changed the code as follows.
cy_only.pyx
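cimport numpy as np

def cy_algo(np.ndarray[int, ndim=1] arr_a, np.ndarray[int, ndim=1] arr_b):
    cdef int res = 0
    cdef size_t len_a = len(arr_a)
    cdef size_t len_b = len(arr_b)
    # Typed loop indices let Cython turn range() into a plain C loop
    cdef size_t i, j
    for i in range(len_a):
        for j in range(len_b):
            # Index the arrays directly instead of iterating over them
            res = res + arr_a[i] + arr_b[j]
    return res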
This gives:
out[4]
999000000
Execution time:0.0019919872283935547 [sec]
The speed is now close to the C version. So even code written directly in Cython becomes fast if it is written properly.
I've only just started with Cython and understand it only at an introductory level, so I may have written something inaccurate. I'd appreciate it if you could point out any mistakes.
It seems that speed can be improved further by disabling unneeded options in setup.py. A list can be found here.
Once I understand it a little better, I may rewrite this article to make it easier to read.