When you plot meteorological data on several pressure levels over a long period, the nesting of for statements and the number of netCDF reads inevitably grow.
So I measured how the time needed to get the data into an array changes with the nesting structure of the for statements.
For the verification code, I referred to here. The data were obtained from NCAR's RDA. The variable is geopotential, a four-dimensional array of [time, pressure level, latitude, longitude]. There are 24 time steps (hours 0 to 23) and 37 pressure levels.
func_1 reads the data once and stores the 3D array in a. It then assigns the geopotential at each pressure level to b.
func_2 reads the data separately for each pressure level and obtains a two-dimensional array each time.
check1.py
import timeit
from netCDF4 import Dataset

def func_1():
    nc = Dataset('../data/era5/Z_2020070100_2020070123.nc', 'r')
    # Read the 3D array [level, lat, lon] for the first time step in one go
    a = nc.variables['Z'][0, :]
    for i in range(len(a)):
        # Slice each pressure level from the array already in memory
        b = a[i, :]
    return 0

def func_2():
    nc = Dataset('../data/era5/Z_2020070100_2020070123.nc', 'r')
    lev = nc.variables['level'][:]
    for i in range(len(lev)):
        # Read one 2D array [lat, lon] from the file per pressure level
        b = nc.variables['Z'][0, i, :]
    return 0

loop = 20
# Average time per call over `loop` calls
result_1 = timeit.timeit(lambda: func_1(), number=loop)
result_2 = timeit.timeit(lambda: func_2(), number=loop)
print('1: ', result_1 / loop)
print('2: ', result_2 / loop)
The result was as follows.
1: 0.009774951753206551
2: 0.018051519710570573
func_1 is nearly twice as fast. In other words, for a 3D array, reading the whole array at once is faster than reading a 2D array for each pressure level.
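Each figure above is a single average over `loop` calls, so it says nothing about run-to-run variation. As a rough sketch of one way to check this, the standard library's `timeit.repeat` can run the same measurement several times and report the spread. The helper `compare` and the two stand-in functions below are hypothetical names introduced for illustration; in practice you would pass the real func_1 and func_2.

```python
import statistics
import timeit


def compare(funcs, number=5, repeat=5):
    """Time each zero-argument callable and return per-call statistics.

    `funcs` maps a label to a callable. Each of the `repeat` measurements
    calls the function `number` times, as timeit.repeat does.
    """
    results = {}
    for label, func in funcs.items():
        totals = timeit.repeat(func, number=number, repeat=repeat)
        per_call = [t / number for t in totals]
        results[label] = {
            "min": min(per_call),
            "mean": statistics.mean(per_call),
            "max": max(per_call),
        }
    return results


# Hypothetical stand-ins for func_1 and func_2 (no file access here).
def read_once():
    return sum(range(1000))


def read_per_level():
    return sum(sum(range(100)) for _ in range(10))


stats = compare({"1": read_once, "2": read_per_level})
for label, s in stats.items():
    print(f"{label}: min={s['min']:.6f}  mean={s['mean']:.6f}  max={s['max']:.6f}")
```

If the min-to-max ranges of two functions overlap, the difference between their averages is probably within measurement noise.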
Since the data I actually want to handle is a 4-dimensional array, I checked that case as well. As in verification code 1, func_1 reads everything at once and func_2 reads one 2D array at a time.
check2.py
import timeit
from netCDF4 import Dataset

def func_1():
    nc = Dataset('../data/era5/Z_2020070100_2020070123.nc', 'r')
    # Read the whole 4D array [time, level, lat, lon] in one go
    a = nc.variables['Z'][:]
    for i in range(len(a)):
        b = a[i, :]
        for j in range(len(b)):
            c = b[j, :]
    return 0

def func_2():
    nc = Dataset('../data/era5/Z_2020070100_2020070123.nc', 'r')
    time = nc.variables['time'][:]
    lev = nc.variables['level'][:]
    for j in range(len(time)):
        for i in range(len(lev)):
            # Read one 2D array [lat, lon] per time step and pressure level
            b = nc.variables['Z'][j, i, :]
    return 0

loop = 10
result_1 = timeit.timeit(lambda: func_1(), number=loop)
result_2 = timeit.timeit(lambda: func_2(), number=loop)
print('1: ', result_1 / loop)
print('2: ', result_2 / loop)
Then, the result was as follows.
1: 1.4546271565370261
2: 1.3412013622000813
Reading the array all at once, which was faster in the 3D case, turned out to be slightly slower (by about 8%) than reading each 2D array separately in the 4D case. Does this mean that once an array has enough dimensions, a single large read becomes slower than looping with a for statement? It's an interesting result. It also made me think that combining the strengths of both approaches might give the fastest way to read the data, so I compared the speed of the following code.
I created a new func_3 and compared the speed. func_3 loops over time with a for statement and reads a three-dimensional array [pressure level, latitude, longitude] for each time step.
check3.py
from netCDF4 import Dataset
import timeit

def func_1():
    nc = Dataset('../data/era5/Z_2020070100_2020070123.nc', 'r')
    # Read the whole 4D array [time, level, lat, lon] in one go
    a = nc.variables['Z'][:]
    for i in range(len(a)):
        b = a[i, :]
        for j in range(len(b)):
            c = b[j, :]
    return 0

def func_2():
    nc = Dataset('../data/era5/Z_2020070100_2020070123.nc', 'r')
    time = nc.variables['time'][:]
    lev = nc.variables['level'][:]
    for j in range(len(time)):
        for i in range(len(lev)):
            # Read one 2D array [lat, lon] per time step and pressure level
            b = nc.variables['Z'][j, i, :]
    return 0

def func_3():
    nc = Dataset('../data/era5/Z_2020070100_2020070123.nc', 'r')
    time = nc.variables['time'][:]
    for j in range(len(time)):
        # Read one 3D array [level, lat, lon] per time step
        a = nc.variables['Z'][j, :]
        for i in range(len(a)):
            # Slice each pressure level from the array already in memory
            b = a[i, :]
    return 0

loop = 10
result_1 = timeit.timeit(lambda: func_1(), number=loop)
result_2 = timeit.timeit(lambda: func_2(), number=loop)
result_3 = timeit.timeit(lambda: func_3(), number=loop)
print('1: ', result_1 / loop)
print('2: ', result_2 / loop)
print('3: ', result_3 / loop)
The result is as follows.
1: 1.4101094176992774
2: 1.344068780587986
3: 1.0753227178938687
As you might expect, func_3 was the fastest.
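One way to see how the three strategies differ structurally is to count how many read requests each one issues. The sketch below uses a hypothetical stand-in class, `CountingVariable`, instead of a real netCDF variable, so it measures no I/O time and does not explain the timings; it only shows the number of requests for 24 time steps and 37 pressure levels. The function names are also illustrative.

```python
class CountingVariable:
    """Hypothetical stand-in for a netCDF variable that counts read requests."""

    def __init__(self, n_time, n_level):
        # Each "2D field" is just a (time, level) label, not real data.
        self._data = [[(t, k) for k in range(n_level)] for t in range(n_time)]
        self.reads = 0

    def __getitem__(self, key):
        self.reads += 1
        if isinstance(key, slice):   # var[:]    -> whole 4D array
            return self._data
        if isinstance(key, tuple):   # var[t, k] -> one 2D field
            t, k = key
            return self._data[t][k]
        return self._data[key]       # var[t]    -> one 3D block


N_TIME, N_LEVEL = 24, 37  # sizes of the data set described in this article


def read_all_at_once(var):
    """func_1 style: one request for the whole 4D array."""
    a = var[:]
    return [field for block in a for field in block]


def read_each_field(var):
    """func_2 style: one request per (time, level) pair."""
    return [var[t, k] for t in range(N_TIME) for k in range(N_LEVEL)]


def read_each_time(var):
    """func_3 style: one request per time step, then slice in memory."""
    fields = []
    for t in range(N_TIME):
        block = var[t]
        fields.extend(block)
    return fields


counts = {}
outputs = {}
strategies = [
    ("func_1", read_all_at_once),
    ("func_2", read_each_field),
    ("func_3", read_each_time),
]
for name, strategy in strategies:
    var = CountingVariable(N_TIME, N_LEVEL)
    outputs[name] = strategy(var)
    counts[name] = var.reads

print(counts)  # {'func_1': 1, 'func_2': 888, 'func_3': 24}
```

All three strategies return the same 888 fields; they differ only in how the work is split between file requests and in-memory slicing.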
What we learned from the three verifications:
- Up to a 3D array [pressure level, latitude, longitude], it is faster to read everything at once.
- For a 4-dimensional array [time, pressure level, latitude, longitude], it is faster to loop over time with a for statement and read it as a series of 3-dimensional arrays.
Visualizing meteorological data is a time-consuming task. I hope this verification helps you save time.