When you plot meteorological data on many pressure levels over a long period, you inevitably end up with nested for loops and many reads from netCDF files.

I therefore measured how the time needed to obtain the data arrays changes with the nesting structure of the for loops.

Verification code 1

I based the verification code on the example here. The data, ERA5 geopotential (Z), were obtained from NCAR's Research Data Archive (RDA). Z is a four-dimensional array of [time, pressure level, latitude, longitude] with 24 hourly time steps (hours 0 to 23) and 37 pressure levels.

Both functions work on the first time step only. func_1 reads the 3D array [pressure level, latitude, longitude] in a single call and stores it in a, then assigns the geopotential on each pressure level to b by slicing a in memory. func_2 instead reads from the file once per pressure level, obtaining a 2D array [latitude, longitude] each time.

`check1.py`

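```python
import timeit
from netCDF4 import Dataset

def func_1():
    # One read: the 3D array [level, lat, lon] for the first time step
    nc = Dataset('../data/era5/Z_2020070100_2020070123.nc', 'r')
    a = nc.variables['Z'][0, :]
    # Slice each pressure level out of the in-memory array
    for i in range(len(a)):
        b = a[i, :]
    nc.close()
    return 0

def func_2():
    nc = Dataset('../data/era5/Z_2020070100_2020070123.nc', 'r')
    lev = nc.variables['level'][:]
    # One read per pressure level: a 2D array [lat, lon] each time
    for i in range(len(lev)):
        b = nc.variables['Z'][0, i, :]
    nc.close()
    return 0

loop = 20

# timeit returns the total time for `loop` calls; divide to get seconds per call
result_1 = timeit.timeit(lambda: func_1(), number=loop)
result_2 = timeit.timeit(lambda: func_2(), number=loop)

print('1: ', result_1 / loop)
print('2: ', result_2 / loop)
```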

The results were as follows.

1:  0.009774951753206551
2:  0.018051519710570573

func_2 took about twice as long as func_1 (about 0.018 s versus 0.0098 s per call). In other words, for a 3D array, reading the whole array in one call is faster than reading a 2D slice from the file on every iteration.
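
A likely reason the loop inside func_1 costs so little: once `nc.variables['Z'][0, :]` has returned, a is an in-memory NumPy (masked) array, and `a[i, :]` is basic slicing, which NumPy implements as a view that copies no data and does not touch the file. The sketch below uses a plain NumPy array with a hypothetical grid size as a stand-in for a.

```python
import numpy as np

# Stand-in for the 3D array [pressure level, latitude, longitude]
# returned by nc.variables['Z'][0, :]; the grid size is hypothetical.
a = np.zeros((37, 9, 18), dtype=np.float32)

# Basic slicing returns a view: no data is copied.
b = a[5, :]
print(b.shape)                 # (9, 18)
print(np.shares_memory(a, b))  # True

# Writing through the view changes the parent array, so they share memory.
b[0, 0] = 1.0
print(a[5, 0, 0])              # 1.0
```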

Verification code 2

Since the data I actually want to handle is a 4-dimensional array, I also checked the 4D case. As in verification code 1, func_1 reads the data once, and func_2 reads from the file on every iteration.

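`check2.py`

```python
import timeit
from netCDF4 import Dataset

def func_1():
    # One read: the whole 4D array [time, level, lat, lon]
    nc = Dataset('../data/era5/Z_2020070100_2020070123.nc', 'r')
    a = nc.variables['Z'][:]
    # Slice each time step and pressure level out of memory
    for i in range(len(a)):
        b = a[i, :]
        for j in range(len(b)):
            c = b[j, :]
    nc.close()
    return 0

def func_2():
    nc = Dataset('../data/era5/Z_2020070100_2020070123.nc', 'r')
    time = nc.variables['time'][:]
    lev = nc.variables['level'][:]
    # One read per (time, level) pair: a 2D array [lat, lon] each time
    for j in range(len(time)):
        for i in range(len(lev)):
            b = nc.variables['Z'][j, i, :]
    nc.close()
    return 0

loop = 10

result_1 = timeit.timeit(lambda: func_1(), number=loop)
result_2 = timeit.timeit(lambda: func_2(), number=loop)

print('1: ', result_1 / loop)
print('2: ', result_2 / loop)
```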

Then, the result was as follows.

1:  1.4546271565370261
2:  1.3412013622000813

So the approach that was faster in 3D, reading the whole array at once, was slightly slower than reading on every iteration once the array became 4D (about 1.45 s versus 1.34 s per call). Does this mean that, as the number of dimensions grows, reading everything at once loses to a for loop? It's an interesting result. It also suggested that combining the strengths of both approaches might give the fastest reader, so I compared the speed of the following code.
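
Dimensionality by itself may not be the deciding factor; one thing worth checking is how much memory a single read has to allocate. The helper `nbytes` below is hypothetical, and the 721 × 1440 grid (ERA5's regular 0.25-degree grid) and float32 values are assumptions, since the article does not state the grid or data type of its file. Whatever the grid, reading all time steps at once allocates 24 times as much as reading one time step.

```python
import math

import numpy as np

def nbytes(shape, dtype="float32"):
    """Bytes needed to hold an array of the given shape and dtype in memory."""
    return math.prod(shape) * np.dtype(dtype).itemsize

# Assumed horizontal grid; not stated in the article.
NLAT, NLON = 721, 1440

one_step = nbytes((37, NLAT, NLON))        # one time step: [level, lat, lon]
all_steps = nbytes((24, 37, NLAT, NLON))   # whole array: [time, level, lat, lon]

print(f"{one_step / 1e6:.1f} MB per time step")      # 153.7 MB per time step
print(f"{all_steps / 1e9:.2f} GB for all 24 steps")  # 3.69 GB for all 24 steps
print(all_steps // one_step)                         # 24
```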

Verification code 3

I added a new function, func_3, and compared all three. func_3 loops over time with a for statement, reads the 3D array [pressure level, latitude, longitude] for each time step in one call, and then slices out each pressure level in memory.

`check3.py`

```python
from netCDF4 import Dataset
import timeit

def func_1():
    # One read: the whole 4D array [time, level, lat, lon]
    nc = Dataset('../data/era5/Z_2020070100_2020070123.nc', 'r')
    a = nc.variables['Z'][:]
    for i in range(len(a)):
        b = a[i, :]
        for j in range(len(b)):
            c = b[j, :]
    nc.close()
    return 0

def func_2():
    # One read per (time, level) pair: a 2D array [lat, lon] each time
    nc = Dataset('../data/era5/Z_2020070100_2020070123.nc', 'r')
    time = nc.variables['time'][:]
    lev = nc.variables['level'][:]
    for j in range(len(time)):
        for i in range(len(lev)):
            b = nc.variables['Z'][j, i, :]
    nc.close()
    return 0

def func_3():
    # One read per time step: a 3D array [level, lat, lon],
    # then the pressure levels are sliced out of memory
    nc = Dataset('../data/era5/Z_2020070100_2020070123.nc', 'r')
    time = nc.variables['time'][:]
    for j in range(len(time)):
        a = nc.variables['Z'][j, :]
        for i in range(len(a)):
            b = a[i, :]
    nc.close()
    return 0


loop = 10

result_1 = timeit.timeit(lambda: func_1(), number=loop)
result_2 = timeit.timeit(lambda: func_2(), number=loop)
result_3 = timeit.timeit(lambda: func_3(), number=loop)

print('1: ', result_1 / loop)
print('2: ', result_2 / loop)
print('3: ', result_3 / loop)
```

The result is as follows.

1:  1.4101094176992774
2:  1.344068780587986
3:  1.0753227178938687

As expected, func_3 was the fastest: about 1.08 s per call, roughly 20% less time than func_2 and 24% less than func_1.

Conclusion

The three verifications showed the following:

- For a 3D array [pressure level, latitude, longitude], reading the whole array at once is faster than reading one 2D slice from the file per iteration.
- For a 4D array [time, pressure level, latitude, longitude], it is fastest to loop over time with a for statement and read one 3D array per time step.

Visualizing meteorological data takes a lot of time. We hope these measurements help you save some of it.

Python netCDF4 read speed and nesting of for statements