[PYTHON] Use Cython with Jupyter Notebook

This is a quick note: trying Cython from a Jupyter Notebook (IPython Notebook) turned out to be much easier than I expected. Cython speeds up processing by compiling ahead of execution and by static typing.

(The .ipynb file for this article is available [on GitHub](https://github.com/matsuken92/Qiita_Contents/blob/master/General/Cython_test.ipynb).)

Environment

The environment I used is as follows: macOS with Anaconda. If you have Anaconda installed, no special preparation is required.

Python 3.5.1 |Anaconda custom (x86_64)| (default, Jun 15 2016, 16:14:02) 
[GCC 4.2.1 Compatible Apple LLVM 4.2 (clang-425.0.28)] on darwin 

IPython 5.0.0 

Let's try it

Load the magic command that enables Cython compilation

# Enable compiling Cython cells in Jupyter Notebook
%load_ext Cython

Declare Cython functions

Write the Cython code in a cell that starts with %%cython. The example is taken from the Cython Tutorial Basics.

# Adding -n <file name> makes it easier to find the generated file later
%%cython -n test_cython_code
def fib(int n):
    cdef int i
    cdef double a=0.0, b=1.0

    for i in range(n):
        a, b = a+b, a
    return a

def primes(int kmax):
    cdef int n, k, i
    cdef int p[1000]
    result = []

    if kmax > 1000:
        kmax = 1000

    k = 0
    n = 2
    while k < kmax:
        i = 0
        while i < k and n % p[i] != 0:
            i += 1

        if i == k:
            p[k] = n
            k += 1
            result.append(n)
        n += 1
    return result

Try it out

print(fib(90))
print(primes(20))

out


2.880067194370816e+18
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71]

It worked!

Speed comparison with plain Python

Let's write the same routines in plain Python and compare execution times.

import numpy as np
# Python functions for performance comparison
def pyfib(n):
    a, b = 0.0, 1.0
    for i in range(n):
        a, b = a+b, a
    return a

def pyprimes(kmax):
    p = np.zeros(1000)
    result = []

    # Cap kmax at 1000
    if kmax > 1000:
        kmax = 1000

    k = 0
    n = 2
    while k < kmax:
        i = 0
        while i < k and n % p[i] != 0:
            i += 1

        if i == k:
            p[k] = n
            k += 1
            result.append(n)
        n += 1
    return result
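As a quick sanity check before timing anything, the pure-Python versions can be verified against known values. The sketch below is standalone (it redefines the same functions as above so it runs on its own):

```python
# Same definitions as pyfib/pyprimes above, minus the NumPy dependency
def pyfib(n):
    a, b = 0.0, 1.0
    for i in range(n):
        a, b = a + b, a
    return a

def pyprimes(kmax):
    p = [0] * 1000
    result = []
    if kmax > 1000:
        kmax = 1000
    k, n = 0, 2
    while k < kmax:
        i = 0
        while i < k and n % p[i] != 0:
            i += 1
        if i == k:          # no smaller prime divides n, so n is prime
            p[k] = n
            k += 1
            result.append(n)
        n += 1
    return result

print(pyfib(90))     # should match the Cython fib(90) output above
print(pyprimes(8))   # first 8 primes
```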

Fibonacci numbers

# Time computing the 1000th Fibonacci number
%timeit fib(1000)
%timeit pyfib(1000)

Cython is about 50 times faster!

out


1000000 loops, best of 3: 786 ns per loop
10000 loops, best of 3: 42.5 µs per loop

Extract prime numbers

%timeit primes(1000)
%timeit pyprimes(1000)

out


100 loops, best of 3: 2.12 ms per loop
1 loop, best of 3: 218 ms per loop

This calculation is about 100 times faster!

Try it with pandas apply

Integers from 1 to 9,999

import pandas as pd

df = pd.DataFrame(np.arange(1, 10**4), columns=['num'])

You can use it simply by passing the function to apply :blush:

%timeit df['fib'] = df.num.apply(fib)
%timeit df['pyfib'] = df.num.apply(pyfib)

out


10 loops, best of 3: 39.2 ms per loop
1 loop, best of 3: 2.02 s per loop
print(df.head())

out


   num  fib  pyfib
0    1  1.0    1.0
1    2  1.0    1.0
2    3  2.0    2.0
3    4  3.0    3.0
4    5  5.0    5.0

The compiled Cython files are stored in ~/.ipython/cython. If you specified a file name with %%cython -n <file name> at compile time, the file is stored there under that name.
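You can inspect that cache directory from Python itself. A small sketch, assuming the default IPython directory (it will differ if IPYTHONDIR is set):

```python
import os

# Default location where %%cython stores compiled cells
cython_cache = os.path.expanduser("~/.ipython/cython")

# List generated files if the directory exists on this machine
if os.path.isdir(cython_cache):
    print(sorted(os.listdir(cython_cache)))
else:
    print("no Cython cache at", cython_cache)
```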

Handling ndarrays

import numpy.random as rd

# Create data
rd.seed(71)
n_data = 10**5
X = pd.DataFrame(rd.normal(size=3*n_data).reshape((n_data,3)), columns=["a", "b", "c"])
print(X.shape)
print(X.head())

out


(100000, 3)
          a         b         c
0 -0.430603 -1.193928 -0.444299
1  0.489412 -0.451557  0.585696
2  1.177320 -0.965009  0.218278
3 -0.866144 -0.323006  1.412919
4 -0.712651 -1.362191 -1.705966

Write Cython code that takes an ndarray as an argument

%%cython -n sample_calc 
import numpy as np
cimport numpy as np

cpdef np.ndarray[double] sample_calc(np.ndarray col_a, np.ndarray col_b, np.ndarray col_c):
    #Type check for each column
    assert (col_a.dtype == np.float and col_b.dtype == np.float and col_c.dtype == np.float)
    
    #Check that the size of each column is the same
    cdef Py_ssize_t n = len(col_c)
    assert (len(col_a) == len(col_b) == n)
    cdef np.ndarray[double] res = np.empty(n)
    
    # Compute (a - b) / c
    for i in range(n):
        res[i] = (col_a[i] - col_b[i])/col_c[i]
    return res

Call from Python side

sample_calc(X.a.values, X.b.values, X.c.values)

out


array([-1.71804336,  1.60658332,  9.81468496, ..., -0.44683095,
        0.46970409, -0.28352272])
#For comparison
def pysample_calc(col_a, col_b, col_c):
    #Type check for each column
    assert (col_a.dtype == np.float and col_b.dtype == np.float and col_c.dtype == np.float)
    
    #Check that the size of each column is the same
    n = len(col_c)
    assert (len(col_a) == len(col_b) == n)
    res = np.empty(n)
    
    # Compute (a - b) / c
    for i in range(n):
        res[i] = (col_a[i] - col_b[i])/col_c[i]
    return res
%timeit sample_calc(X.a.values, X.b.values, X.c.values)
%timeit pysample_calc(X.a.values, X.b.values, X.c.values)

out


100 loops, best of 3: 16.7 ms per loop
10 loops, best of 3: 37.2 ms per loop
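Incidentally, this particular (a - b) / c calculation is simple enough that plain NumPy vectorization handles it without any explicit loop, and is often competitive with the Cython version. A standalone sketch using its own random columns (not the article's X):

```python
import numpy as np

# Hypothetical sample columns, matching the shapes used above
rng = np.random.RandomState(71)
col_a, col_b, col_c = rng.normal(size=(3, 10**5))

# Element-wise (a - b) / c in a single vectorized expression
res_vec = (col_a - col_b) / col_c

# Equivalent explicit loop, for comparison
res_loop = np.empty(len(col_c))
for i in range(len(col_c)):
    res_loop[i] = (col_a[i] - col_b[i]) / col_c[i]

print(np.allclose(res_vec, res_loop))  # → True
```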

Calculate pi by the Monte Carlo method

#Data generation
rd.seed(71)
n_data = 10**7
X2 = rd.random(size=(n_data,2)).astype(np.float)
X2.dtype

Define the Cython function

%%cython -n calc_pi
import numpy as np
cimport numpy as np

cpdef np.ndarray[long]  calc_pi(np.ndarray[double, ndim=2] data):
    cdef Py_ssize_t n = len(data)
    cdef np.ndarray[long] res = np.empty(n, dtype=np.int)
    
    for i in range(n):
        res[i] = 1 if (data[i,0]**2 + data[i,1]**2) < 1 else 0
    return res

Python function for comparison

#Python function for comparison
def pycalc_pi(data):
    n = len(data)
    res = [1 if (data[i,0]**2 + data[i,1]**2) < 1 else 0 for i in range(n)]
    return res

Let's measure it.

%time res = calc_pi(X2)
%time respy = pycalc_pi(X2)

out


CPU times: user 25.2 ms, sys: 5.98 ms, total: 31.2 ms
Wall time: 31.1 ms
CPU times: user 7.7 s, sys: 46.1 ms, total: 7.75 s
Wall time: 7.75 s

Cython is much faster!

# Check that the results are the same
np.all(res == respy)

The results match!

out


True
# Calculate pi
np.sum(res)/n_data*4

out


3.1413555999999998
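For reference, the same estimate can be reproduced with vectorized NumPy and no explicit loop. A standalone sketch that generates its own random points (not the X2 used above):

```python
import numpy as np

# Hypothetical sample: uniform points in the unit square
rng = np.random.RandomState(71)
pts = rng.random_sample((10**6, 2))

# Flag points falling inside the unit quarter-circle
inside = (pts[:, 0]**2 + pts[:, 1]**2) < 1

# Fraction inside, times 4, approximates pi
pi_est = inside.mean() * 4
print(pi_est)  # close to 3.14
```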

Let's plot it.

import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
from matplotlib.colors import LinearSegmentedColormap

sns.set(style="darkgrid", palette="muted", color_codes=True)
# Draw
n_plot = 10**4  # number of points to draw
plt.figure(figsize=(8,8))
plt.scatter(X2[:n_plot,0], X2[:n_plot,1], c=res[:n_plot], s=10)

(plot.png: scatter plot of the sampled points, colored by whether they fall inside the unit circle)

The points inside and outside the circle are classified correctly.

References

Cython Tutorial Basics http://omake.accense.com/static/doc-ja/cython/src/userguide/tutorial.html

O'Reilly "Cython" https://www.oreilly.co.jp/books/9784873117270/

pandas 0.18.1 documentation Enhancing Performance http://pandas.pydata.org/pandas-docs/stable/enhancingperf.html
