[PYTHON] See the power of speeding up with NumPy and SciPy

Summary

In previous articles, I wrote about speeding up numerical calculation with NumPy and SciPy:

- Speeding up numerical calculation using NumPy: Basics
- Speeding up numerical calculation using NumPy / SciPy: Application 1
- Speeding up numerical calculation using NumPy / SciPy: Application 2

Is it really fast? Let's find out.

Survey method

The comparison is against my own hand-rolled pure-Python implementations. It may not be an entirely fair evaluation, because those implementations emphasize simplicity rather than speed. Python is anaconda3, and IPython's `%timeit` is used for time measurement.

Execution environment:

- OS: Ubuntu 16.04 LTS 64bit
- Python: anaconda3-4.1.1
- CPU: Intel Core i5 3550 (4 cores / 4 threads)
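For reference, each measurement below was taken roughly like this in an IPython session (a minimal sketch of my own; the statement being timed is the one shown in each section):


import numpy as np

# %timeit runs the statement repeatedly and reports a representative execution time
N = 3000
%timeit [[i + j for i in range(N)] for j in range(N)]       # pure-Python version
%timeit np.arange(0, N) + np.arange(0, N).reshape(N, 1)     # NumPy version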

List initialization

For example, matrix initialization.

before


N = 3000
a = [[i + j for i in range(N)] for j in range(N)]

Let's say you wrote it with a list comprehension like this. The execution time is **661 ms**. On the other hand, with NumPy's `arange`,

after


a = np.arange(0, N) + np.arange(0, N).reshape(N, 1)

it takes **15.5 ms**. Not quite 100 times, but about 40 times faster.
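As a quick check (my own addition; the names `a_list` and `a_np` are mine), both constructions produce the same matrix, whose entry at row $j$, column $i$ is $i + j$; the NumPy version builds it by broadcasting a row vector against a column vector:


a_list = [[i + j for i in range(N)] for j in range(N)]    # pure-Python version
a_np = np.arange(0, N) + np.arange(0, N).reshape(N, 1)    # broadcasting version
print(np.array_equal(np.array(a_list), a_np))             # True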

Matrix product

There are many linear algebra operations, but let's try a simple matrix product. Prepare two 200x200 matrices and calculate the product:

setting


import numpy as np
from numpy.random import rand

N = 200
a = rand(N, N)          # rand already returns an ndarray
b = rand(N, N)
c = np.zeros((N, N))    # float result buffer for the hand-written version

If you implement the matrix product straight from its definition,

before


for i in range(N):
	for j in range(N):
		for k in range(N):
			c[i][j] += a[i][k] * b[k][j]

something like this. Execution time is **7.7 s**.

With NumPy's built-in `dot`, on the other hand,

after


c = np.dot(a, b)

simplicity comes even before speed, and this wins on both. Execution time is **202 µs**, so fast that asking "how many times faster" hardly makes sense. Even for 1000×1000 matrices it takes only **22.2 ms**; at that size the for-loop implementation becomes completely unmanageable.
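As a side note (my own addition, not from the original article; the name `c2` is mine): on Python 3.5+ the same product can also be written with the `@` operator, which for 2-D ndarrays gives the same result as `np.dot`:


c2 = a @ b                            # matrix-multiplication operator (PEP 465)
print(np.allclose(c2, np.dot(a, b)))  # True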

Differentiation

Let's differentiate $\sin x$. The number of spatial divisions is 100000:

setting


import math as m
import numpy as np

def f(x):
    return m.sin(x)

def g(x):
    return np.sin(x)

N, L = 100000, 2 * m.pi
x, dx = np.linspace(0, L, N), L / N

We stick to the definition of the derivative; don't worry about accuracy for now:

before


diff = []
for i in x:
	diff.append((f(i + dx) - f(i)) / dx)

Execution time is **91.9 ms**. The repeated `append` calls are probably part of the reason this is slow. On the other hand, with NumPy,

after


diff = np.gradient(g(x), dx)

it's simple, and it doesn't reduce the number of elements in the return value (how welcome that is for numerical work!). Execution time is **8.25 ms**, about 10 times faster. By the way, with

g = np.vectorize(f)

you can also convert a scalar function so that its arguments and return value are ndarrays.
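For example (a small sketch of my own; the names `g_vec` and `diff_vec` are mine), wrapping the scalar `f` this way lets it take the whole `x` array at once:


g_vec = np.vectorize(f)          # f works on scalars; g_vec accepts and returns ndarrays
diff_vec = np.gradient(g_vec(x), dx)
print(len(diff_vec) == len(x))   # True: no elements are lost
# Note: np.vectorize is a convenience wrapper (a Python-level loop), not a speed-up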

Integral

Let's set up a slightly more hardcore integral:

$$
\int_{-\infty}^{\infty} dx \int_{-\infty}^{x} dy\; e^{-(x^2 + y^2)}
$$

This is a double integral in which $x$ appears in the integration limits of $y$. The integrand and the number of spatial divisions are:

setting


def h(x, y):
    return np.exp(-(x**2 + y**2))

N, L = 2000, 100
x, dx = np.linspace(-L/2, L/2, N), L / N
y, dy = np.linspace(-L/2, L/2, N), L / N

If you implement the double integral as defined,

before


ans = 0
for index, X in enumerate(x):
	for Y in y[:index+1]:
		ans += dx * dy * h(X, Y)

Something like that. The execution time is **5.89 s**, but this is far too inaccurate. If you write it yourself you should really use Simpson's rule, but then the code gets rather complicated. And even with Simpson's rule, improper integrals can only be handled by "taking a large enough region"; if you then make the grid finer, the execution time grows as $O(N^2)$. SciPy's dblquad, on the other hand, solves all of those problems:

after


from scipy.integrate import dblquad

# x runs over [a, b]; for each x, y runs over [gfun(x), hfun(x)]
# dblquad calls the integrand as func(y, x); h is symmetric, so the order does not matter here
ans = dblquad(h, a=-np.inf, b=np.inf,
              gfun=lambda x: -np.inf, hfun=lambda x: x)[0]

The integration interval of $x$ is $[a, b]$, and for each $x$ the integration interval of $y$ is $[\mathrm{gfun}(x), \mathrm{hfun}(x)]$. The integration is adaptive, and the error tolerance can be specified as well. `dblquad` returns a tuple whose second element is an estimate of the absolute error. Execution time is **51.1 ms**. There is nothing more to say. Great.
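As a sanity check (my own addition, not in the original article): since the integrand is symmetric under swapping $x$ and $y$, the region $y \le x$ covers exactly half of the plane, so the exact value is $\pi/2$:


print(ans, np.pi / 2)   # both approximately 1.5707963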

Eigenvalue equation

(This one is a bonus; numerically solving eigenvalue equations is a bit of a niche topic...)

Let's solve the Schrödinger equation for a harmonic oscillator. If we implement it ourselves, the Jacobi eigenvalue method is probably the way to go; look it up if you are interested:

before


# System settings (same as in the NumPy version below)
L, N = 10, 80
x, dx = np.linspace(-L/2, L/2, N), L / N

# Hamiltonian: second-difference kinetic term plus harmonic potential
I = np.eye(N)
H = [[0 for i in range(N)] for j in range(N)]
for i in range(N):
    H[i][i] = 2.0/(2*dx**2) + 0.5*(-L/2+dx*i)**2
    if(0 <= i+1 < N):
        H[i][i+1] = -1.0/(2*dx**2)
    if(0 <= i-1 < N):
        H[i][i-1] = -1.0/(2*dx**2)
H = np.array(H)

# Jacobi method
flag = True
while(flag):
    # Find the largest off-diagonal component and its indices
    maxValue = 0
    cI, rI = None, None
    for j in range(N):
        for i in range(j):
            if(maxValue < abs(H[i][j])):
                maxValue = abs(H[i][j])
                rI, cI = i, j

    # Convergence test
    if(maxValue < 1e-4):
        flag = False

    # Preparation of the rotation matrix
    if(H[cI][cI] == H[rI][rI]):
        theta = m.pi/4
    else:
        theta = 0.5*m.atan(2.0*H[rI][cI]/(H[cI][cI]-H[rI][rI]))
    J = np.eye(N)
    J[rI][rI] = m.cos(theta)
    J[cI][cI] = m.cos(theta)
    J[rI][cI] = m.sin(theta)
    J[cI][rI] = -m.sin(theta)

    # Rotate H and accumulate the eigenvector matrix
    H = np.dot(np.dot(J.T, H), J)
    I = np.dot(I, J)

# Store eigenvectors and (eigenvalue, index) pairs, sorted in ascending order
v, w = I.transpose(), []
for i in range(N):
    w.append([H[i][i], i])
w.sort()

Iteration stops when the largest off-diagonal element becomes sufficiently small. It is inconvenient that the eigenvalues do not come out in ascending order, so they have to be sorted afterwards. The execution time is **15.6 s**, and increasing the number of divisions much beyond this is hopeless. With NumPy, on the other hand,

after


#System settings
L, N = 10, 80
x, dx = np.linspace(-L/2, L/2, N), L / N

# Kinetic term K (finite-difference -d^2/dx^2)
K = np.eye(N, N)
K_sub = np.vstack((K[1:], np.array([0] * N)))
K = dx**-2 * (2 * K - K_sub - K_sub.T)

#Potential term
V = np.diag(np.linspace(-L/2, L/2, N)**2)

# Eigenvalue equation for a Hermitian (symmetric) matrix
# w holds the eigenvalues (ascending), v the eigenvectors
H = (K + V) / 2
w, v = np.linalg.eigh(H)

The execution time is **1.03 ms**. For an algorithm as involved as an eigenvalue solver, the hand-rolled version has no chance in any respect. In C you would have to rely on LAPACK or GSL, and those are not exactly easy to work with either.
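As a quick sanity check (again my own addition): the matrix built above discretizes $H = \tfrac{1}{2}(-d^2/dx^2 + x^2)$, so the lowest eigenvalues returned by `eigh` should be close to the harmonic-oscillator energies $n + 1/2$:


print(w[:4])   # expected to be close to 0.5, 1.5, 2.5, 3.5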

In conclusion

This should be enough to make the point. **You can see how fast NumPy/SciPy is and how simply the code can be written.** If you tried to do the calculations above in C/C++ without relying on external libraries, the code would look like the "before" version at the start of each section. And in my experience, many C/C++ external libraries are hard to get started with anyway.

Just for reference, the speed differences are summarized below:

- List initialization: **661 ms** → **15.5 ms**
- Matrix product: **7.7 s** → **202 µs**
- Differentiation: **91.9 ms** → **8.25 ms**
- (Double) integral: **5.89 s** → **51.1 ms**
- Eigenvalue equation: **15.6 s** → **1.03 ms**

**Overwhelming.** I think Python is well worth using for anyone doing numerical computation.
