Python netCDF4 read speed and nesting of for statements

When you plot meteorological data on several pressure levels over a long period of time, both the nesting depth of your for statements and the number of netCDF reads naturally increase.

Therefore, I investigated how the time required to obtain an array of data changes depending on how the for statements are nested.

Verification code 1

I adapted the verification code from an existing example. The data were obtained from NCAR's RDA. The variable is geopotential, stored as a four-dimensional array of [time, pressure level, latitude, longitude]. There are 24 time steps (hours 0 to 23) and 37 pressure levels.

func_1 reads the data once and stores the 3D array [pressure level, latitude, longitude] for the first time step in a. After that, the geopotential at each pressure level is assigned to b. func_2 reads from the file once per pressure level and obtains a two-dimensional array each time.

check1.py


import timeit
from netCDF4 import Dataset

def func_1():
    nc = Dataset('../data/era5/Z_2020070100_2020070123.nc', 'r')
    a = nc.variables['Z'][0, :]
    for i in range(len(a)):
        b = a[i, :]
    return 0

def func_2():
    nc = Dataset('../data/era5/Z_2020070100_2020070123.nc', 'r')
    lev = nc.variables['level'][:]
    for i in range(len(lev)):
        b = nc.variables['Z'][0, i, :]
    return 0

loop = 20

result_1 = timeit.timeit(lambda: func_1(), number=loop)
result_2 = timeit.timeit(lambda: func_2(), number=loop)

print('1: ', result_1 / loop)
print('2: ', result_2 / loop)

The result is as follows.

1:  0.009774951753206551
2:  0.018051519710570573

func_1 is roughly twice as fast (about 1.8 times). In other words, reading the 3D array in one go is faster than reading a 2D array from the file for each pressure level.
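Timings taken from a single timeit run can be affected by noise such as OS file caching or other activity on the machine. As a minimal sketch of a more robust measurement, the standard library's `timeit.repeat` can run the whole measurement several times so that the smallest value can be reported. The helper name `best_time` and the `dummy_read` workload are hypothetical placeholders that do not appear in the original code; in practice you would pass func_1 or func_2 instead.

```python
import timeit


def best_time(func, number=20, repeat=5):
    """Return the best average time per call, in seconds.

    best_time is a hypothetical helper, not part of the original code.
    """
    # timeit.repeat returns one total per repeat; each total covers `number` calls.
    totals = timeit.repeat(func, number=number, repeat=repeat)
    # The minimum is the run least disturbed by other activity on the machine.
    return min(totals) / number


def dummy_read():
    # Placeholder workload standing in for func_1 or func_2.
    return sum(range(1000))


print('dummy: ', best_time(dummy_read))
```

Taking the minimum rather than the mean is a common choice for this kind of benchmark, because slower runs usually reflect interference rather than the code itself.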

Verification code 2

Since the data I actually want to handle is a 4-dimensional array, I checked the 4-dimensional case as well. As in verification code 1, func_1 reads the whole array at once and func_2 reads a 2D array from the file each time.

check2.py


import timeit
from netCDF4 import Dataset

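def func_1():
    nc = Dataset('../data/era5/Z_2020070100_2020070123.nc', 'r')
    # Read the whole 4D array [time, level, lat, lon] in one go
    a = nc.variables['Z'][:]
    for i in range(len(a)):
        b = a[i, :]
        for j in range(len(b)):
            c = b[j, :]
    return 0

def func_2():
    nc = Dataset('../data/era5/Z_2020070100_2020070123.nc', 'r')
    time = nc.variables['time'][:]
    lev = nc.variables['level'][:]
    for j in range(len(time)):
        for i in range(len(lev)):
            # Read one 2D array [lat, lon] per time step and pressure level
            b = nc.variables['Z'][j, i, :]
    return 0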

loop = 10

result_1 = timeit.timeit(lambda: func_1(), number=loop)
result_2 = timeit.timeit(lambda: func_2(), number=loop)

print('1: ', result_1 / loop)
print('2: ', result_2 / loop)

Then, the result was as follows.

1:  1.4546271565370261
2:  1.3412013622000813

Reading the whole array at once, which was faster in the 3D case, turned out to be slightly slower in the 4D case than reading a 2D array each time. Does this mean that once an array has enough dimensions, reading it in one go becomes slower than looping with for statements? It's an interesting result. It also made me think that combining the best of both approaches might give the fastest read, so I compared the speed of the following code.

Verification code 3

I created a new func_3 and compared the speed. func_3 loops over time with a for statement and reads a three-dimensional array [pressure level, latitude, longitude] for each time step.

check3.py


from netCDF4 import Dataset
import timeit

def func_1():
    nc = Dataset('../data/era5/Z_2020070100_2020070123.nc', 'r')
    a = nc.variables['Z'][:]
    for i in range(len(a)):
        b = a[i, :]
        for j in range(len(b)):
            c = b[j, :]
    return 0

def func_2():
    nc = Dataset('../data/era5/Z_2020070100_2020070123.nc', 'r')
    time = nc.variables['time'][:]
    lev = nc.variables['level'][:]
    for j in range(len(time)):
        for i in range(len(lev)):
            b = nc.variables['Z'][j, i, :]
    return 0

def func_3():
    nc = Dataset('../data/era5/Z_2020070100_2020070123.nc', 'r')
    time = nc.variables['time'][:]
    for j in range(len(time)):
        a = nc.variables['Z'][j, :]
        for i in range(len(a)):
            b = a[i, :]
    return 0


loop = 10

result_1 = timeit.timeit(lambda: func_1(), number=loop)
result_2 = timeit.timeit(lambda: func_2(), number=loop)
result_3 = timeit.timeit(lambda: func_3(), number=loop)

print('1: ', result_1 / loop)
print('2: ', result_2 / loop)
print('3: ', result_3 / loop)

The result is as follows.

1:  1.4101094176992774
2:  1.344068780587986
3:  1.0753227178938687

As you might expect, func_3 was the fastest, taking roughly 20% less time than func_2 and roughly 24% less time than func_1.
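One way to picture the trade-off between the three functions is to count how many read requests each access pattern issues. The sketch below uses a hypothetical stand-in class, `CountingVariable`, in place of `nc.variables['Z']`; it reads no data and only counts indexing calls, assuming that each indexing call on a real netCDF4 variable requests data from the file. The strategy function names are also made up for illustration.

```python
class CountingVariable:
    """Hypothetical stand-in for nc.variables['Z'] that only counts read requests."""

    def __init__(self):
        self.reads = 0

    def __getitem__(self, key):
        # Assumption: with a real netCDF4 variable, each indexing call
        # requests data from the file. Here we only count the calls.
        self.reads += 1
        return None


def read_all_at_once(var, n_time, n_level):
    var[:]  # same access pattern as func_1


def read_per_level(var, n_time, n_level):
    for j in range(n_time):
        for i in range(n_level):
            var[j, i, :]  # same access pattern as func_2


def read_per_time(var, n_time, n_level):
    for j in range(n_time):
        var[j, :]  # same access pattern as func_3


def count_reads(strategy, n_time=24, n_level=37):
    """Return how many read requests a strategy issues."""
    var = CountingVariable()
    strategy(var, n_time, n_level)
    return var.reads


for strategy in (read_all_at_once, read_per_level, read_per_time):
    reads = count_reads(strategy)
    # 24 * 37 = 888 [lat, lon] planes in total, spread over `reads` requests
    print(strategy.__name__, reads, 'requests,', 24 * 37 // reads, 'planes per request')
```

With 24 time steps and 37 pressure levels, this gives 1, 888, and 24 requests respectively. The counts alone do not explain the measured times, but they show that func_3 sits between the two extremes: far fewer requests than func_2, and a much smaller array per request than func_1.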

Conclusion

What I learned from the three verifications:

- Up to a 3D array [pressure level, latitude, longitude], it is faster to read everything at once.
- For a 4-dimensional array [time, pressure level, latitude, longitude], it is faster to loop over time with a for statement and read it as a series of 3-dimensional arrays.

Visualizing meteorological data is a time-consuming task. I hope this verification helps you save time.
