Languages that seem to be slow when turning a for loop

Matlab & Python (with numpy), which is not a compilation language but is widely accepted in numerical calculations due to the ease of handling arrays, is slow to calculate with a for statement, so avoid it as much as possible! I think you often hear the discourse. What is the impact? I tried to verify it using the explicit method of the two-dimensional wave equation.

Central solution of wave equation

Let's take a look at the wave equation, which is the governing equation, and its numerical solution.

The two-dimensional wave equation

\frac{\partial^2 u}{\partial t^2} - c^2\left(\frac{\partial^2 u}{\partial x^2}+\frac{\partial^2 u}{\partial y^2}\right)=0

And the discretization of this central system is

\frac{u^{n+1}_{i,j}-2u^n_{i,j}+u^{n-1}_{i,j}}{\Delta t^2} - c^2\left(\frac{u^n_{i+1,j}-2u^n_{i,j}+u^n_{i-1,j}}{\Delta x^2}+\frac{u^n_{i,j+1}-2u^n_{i,j}+u^n_{i,j-1}}{\Delta y^2}\right)=0

It will be. If you organize this so that it is easy to calculate,

u^{n+1}_{i,j}=2u^n_{i,j}-u^{n-1}_{i,j} +  c^2 \Delta t^2 \left(\frac{u^n_{i+1,j}-2u^n_{i,j}+u^n_{i-1,j}}{\Delta x^2}+\frac{u^n_{i,j+1}-2u^n_{i,j}+u^n_{i,j-1}}{\Delta y^2}\right)

It will be.

If the boundary condition is a Dirichlet condition, it is easy because you only need to substitute the boundary value.

Matlab First, let's write using a loop as appropriate.

c = 1.0;
xmin = 0.;
ymin = 0.;
xmax = 1.;
ymax = 1.; %The calculation area is[0,1],[0,1]
xmesh = 400; 
ymesh = 400; %The number of divisions is x,400 on both y-axis
dx = (xmax-xmin)/xmesh;
dy = (ymax-ymin)/ymesh;
dt = 0.2*dx/c;
u0 = zeros(xmesh,ymesh); % u^{n-1}
u1 = zeros(xmesh,ymesh); % u^n
u2 = zeros(xmesh,ymesh); % u^{n+1}
u1(100:130,100:130)=1e-6;%The initial velocity is given to a certain area.
x = xmin+dx/2:dx:xmax-dx/2;
y = ymin+dy/2:dy:ymax-dy/2;
t=0.;
tic %A convenient guy who can measure time with tic toc
while t<1.0
    for j = 2:ymesh-1
        for i = 2:xmesh-1
            u2(i,j) = 2*u1(i,j)-u0(i,j) + c*c*dt*dt*((u1(i+1,j)-2*u1(i,j)+u1(i-1,j))/(dx*dx) +(u1(i,j+1)-2*u1(i,j)+u1(i,j-1))/(dy*dy) );
        end
    end
    u0=u1;
    u1=u2;
    t = t+dt;
    %Give Dirichlet conditions
    for i=1:xmesh
        u1(i,1)=0.;
        u1(i,ymesh)=0.;
    end
    for j=1:ymesh
        u1(1,j)=0.;
        u1(xmesh,j)=0.;
    end
    
end
toc
[X,Y] = meshgrid(x,y);
mesh(X,Y,u1); %Finally draw

In my environment the elapsed time was 5.249230 seconds. Let's write this without a loop except for the time part.

c = 1.0;
xmin = 0.;
ymin = 0.;
xmax = 1.;
ymax = 1.; %The calculation area is[0,1],[0,1]
xmesh = 400; 
ymesh = 400; %The number of divisions is x,400 on both y-axis
dx = (xmax-xmin)/xmesh;
dy = (ymax-ymin)/ymesh;
dt = 0.2*dx/c;
u0 = zeros(xmesh,ymesh); % u^{n-1}
u1 = zeros(xmesh,ymesh); % u^n
u2 = zeros(xmesh,ymesh); % u^{n+1}
u1(100:130,100:130)=1e-6;%The initial velocity is given to a certain area.
x = xmin+dx/2:dx:xmax-dx/2;
y = ymin+dy/2:dy:ymax-dy/2;
t=0.;
tic %A convenient guy who can measure time with tic toc
while t<1.0
    u2(2:xmesh-1,2:ymesh-1) = 2*u1(2:xmesh-1,2:ymesh-1)-u0(2:xmesh-1,2:ymesh-1)+c*c*dt*dt*(diff(u1(:,2:ymesh-1),2,1)/(dx*dx)+diff(u1(2:xmesh-1,:),2,2)/(dy*dy));
    u0=u1;
    u1=u2;
    t = t+dt;
    %Give Dirichlet conditions
    u1(:,1)=0.;
    u1(:,ymesh)=0.;
    u1(1,:)=0.;
    u1(xmesh,:)=0.;
end
toc
[X,Y] = meshgrid(x,y);
mesh(X,Y,u1)

Use a little head when writing code that looks neat. This is because it is necessary to consider the index deviation when using diff. The execution time was 2.734515 seconds. The difference in execution time is only about twice as large, and the impression is that the difference is smaller than expected.

Python I often use numpy because it is convenient. Let's write using a loop.

import numpy as np
import matplotlib.pyplot as plt
from copy import deepcopy
import time
c = 1.0
xmin = 0.
ymin = 0.
xmax = 1.
ymax = 1.  #The calculation area is[0,1],[0,1]
xmesh = 400
ymesh = 400  #The number of divisions is x,400 on both y-axis
dx = (xmax-xmin)/xmesh
dy = (ymax-ymin)/ymesh
dt = 0.2*dx/c
u0 = np.zeros((xmesh, ymesh))  # u^{n-1}
u1 = np.zeros((xmesh, ymesh))  # u^n
u2 = np.zeros((xmesh, ymesh))  # u^{n+1}
u1[100:130, 100:130] = 1e-6  #The initial velocity is given to a certain area.
x = np.linspace(xmin+dx/2, xmax-dx/2, xmesh)
y = np.linspace(ymin+dy/2, ymax-dy/2, ymesh)
t = 0.
before = time.time()
while t < 1.:
    for j in range(1, xmesh-1):
        for i in range(1, ymesh-1):
            u2[i, j] = 2*u1[i, j]-u0[i, j]+c*c*dt*dt * \
                ((u1[i+1, j]-2.*u1[i, j]+u1[i-1, j])/(dx*dx) +
                 (u1[i, j+1]-2.*u1[i, j]+u1[i, j-1])/(dy*dy))

    u0 = deepcopy(u1)
    u1 = deepcopy(u2)
    t = t+dt
    #Give Dirichlet conditions
    for i in range(0,xmesh):
        u1[i, 0] = 0.
        u1[i, ymesh-1] = 0.
    for j in range(0,ymesh):
        u1[0, j] = 0.
        u1[xmesh-1, j] = 0.

print(time.time()-before)

X, Y = np.meshgrid(x, y)
fig = plt.figure()  #Creating a plot area
ax = fig.gca(projection='3d')
ax.plot_surface(X, Y, u1)
plt.show()

The calculation was completed in 880.909 seconds. Too slow.

Next is the code without loops.

import numpy as np
import matplotlib.pyplot as plt
from copy import deepcopy
import time
c = 1.0
xmin = 0.
ymin = 0.
xmax = 1.
ymax = 1.  #The calculation area is[0,1],[0,1]
xmesh = 400
ymesh = 400  #The number of divisions is x,400 on both y-axis
dx = (xmax-xmin)/xmesh
dy = (ymax-ymin)/ymesh
dt = 0.2*dx/c
u0 = np.zeros((xmesh, ymesh))  # u^{n-1}
u1 = np.zeros((xmesh, ymesh))  # u^n
u2 = np.zeros((xmesh, ymesh))  # u^{n+1}
u1[100:130, 100:130] = 1e-6  #The initial velocity is given to a certain area.
x = np.linspace(xmin+dx/2, xmax-dx/2, xmesh)
y = np.linspace(ymin+dy/2, ymax-dy/2, ymesh)
t = 0.
before = time.time()
while t < 1.0:
    u2[1:xmesh-1, 1:ymesh-1] = 2*u1[1:xmesh-1, 1:ymesh-1]-u0[1:xmesh-1, 1:ymesh-1]+c*c*dt*dt * \
        (np.diff(u1[:, 1:ymesh-1], n=2, axis=0)/(dx*dx) +
         np.diff(u1[1: xmesh-1, :], n=2, axis=1)/(dy*dy))
    u0 = deepcopy(u1)
    u1 = deepcopy(u2)
    t = t+dt
    #Give Dirichlet conditions
    u1[:, 0] = 0.
    u1[:, ymesh-1] = 0.
    u1[0, :] = 0.
    u1[xmesh-1, :] = 0.

print(time.time()-before)

X, Y = np.meshgrid(x, y)
fig = plt.figure()  #Creating a plot area
ax = fig.gca(projection='3d')
ax.plot_surface(X, Y, u1)
plt.show()

The result is 5.539646625518799 seconds. It's more than 100 times faster than the python loop version, but it takes a lot longer than Matlab. Is it badly written?

Conclusion

Avoid swinging double loops in Python as much as possible. It takes an unusual amount of calculation time. With Matlab, it was a little slower.

2020/10/14 Addendum: A person familiar with Matlab taught me. Matlab seems to be JIT compiled automatically. Wonderful. https://jp.mathworks.com/products/matlab/matlab-execution-engine.html

bonus

The reflection of the waves is beautiful

c = 1.0;
xmin = 0.;
ymin = 0.;
xmax = 1.;
ymax = 1.; %The calculation area is[0,1],[0,1]
xmesh = 100; 
ymesh = 100; %The number of divisions is x,400 on both y-axis
dx = (xmax-xmin)/xmesh;
dy = (ymax-ymin)/ymesh;
dt = 0.1*dx/c;
u0 = zeros(xmesh,ymesh); % u^{n-1}
u1 = zeros(xmesh,ymesh); % u^n
u2 = zeros(xmesh,ymesh); % u^{n+1}

x = xmin+dx/2:dx:xmax-dx/2;
y = ymin+dy/2:dy:ymax-dy/2;
t=0.;
[X,Y] = meshgrid(x,y);
[smallX,smallY] = meshgrid(linspace(0,pi,21),linspace(0,pi,21));
u1(20:40,30:50)=2e-7*sin(smallX).*sin(smallY);%The initial velocity is given to a certain area.
u1(60:80,40:60)=1e-7*sin(smallX).*sin(smallY);
mesh(X,Y,u1);
zlim([-1e-5 1e-5]);
tic %A convenient guy who can measure time with tic toc
while t<10.0
    u2(2:xmesh-1,2:ymesh-1) = 2*u1(2:xmesh-1,2:ymesh-1)-u0(2:xmesh-1,2:ymesh-1)+c*c*dt*dt*(diff(u1(:,2:ymesh-1),2,1)/(dx*dx)+diff(u1(2:xmesh-1,:),2,2)/(dy*dy));
    u0=u1;
    u1=u2;
    t = t+dt;
    u1(:,1)=0.;
    u1(:,ymesh)=0.;
    u1(1,:)=0.;
    u1(xmesh,:)=0.;
    
    mesh(X,Y,u1);
    zlim([-1e-5 1e-5]);
    pause(0.01);
end
toc
[X,Y] = meshgrid(x,y);
mesh(X,Y,u1)

When executed, the reflection of the waves can be seen beautifully like an animation.

I hear that Matlab and Python loops are slow, but is that true?

Languages that seem to be slow when turning a for loop

Central solution of wave equation

Conclusion

bonus