[Python] Summary of array generation (initialization) time

Summary

Let's start with the results, for those who don't have time to read the whole article. I think they are organized so that they are fairly easy to read.

1d array

| Code | Mean | Stdev. | Runs | Loops |
|---|---|---|---|---|
| np.empty(N) | 1.76 µs | 32 ns | 7 runs | 1000000 |
| np.zeros(N) | 11.3 ms | 453 µs | 7 runs | 100 |
| np.ones(N) | 14.9 ms | 182 µs | 7 runs | 100 |
| np.zeros_like(np.empty(N)) | 19.9 ms | 320 µs | 7 runs | 100 |
| np.ones_like(np.empty(N)) | 20.8 ms | 931 µs | 7 runs | 10 |
| [None] * N | 35.9 ms | 418 µs | 7 runs | 10 |
| [0] * N | 35.2 ms | 489 µs | 7 runs | 10 |
| [None for i in range(N)] | 417 ms | 77.7 ms | 7 runs | 1 |
| [0 for i in range(N)] | 375 ms | 15.1 ms | 7 runs | 1 |

2d array

| Code | Mean | Stdev. | Runs | Loops |
|---|---|---|---|---|
| np.empty([M,M]) | 3.58 µs | 157 ns | 7 runs | 100000 |
| np.zeros([M,M]) | 3.66 µs | 51.5 ns | 7 runs | 100000 |
| [[None] * M] * M (→ Addendum 1) | 37.4 µs | 1.08 µs | 7 runs | 10000 |
| [[0] * M] * M (→ Addendum 1) | 37.4 µs | 388 ns | 7 runs | 10000 |
| np.ones([M,M]) | 378 ms | 5.46 ms | 7 runs | 1 |
| np.zeros_like(np.empty([M,M])) | 375 ms | 3.22 ms | 7 runs | 1 |
| np.ones_like(np.empty([M,M])) | 384 ms | 7.62 ms | 7 runs | 1 |
| [[None for j in range(M)] for i in range(M)] | 3.83 s | 37.5 ms | 7 runs | 1 |
| [[0 for j in range(M)] for i in range(M)] | 3.86 s | 61.6 ms | 7 runs | 1 |

Now let's look at the details.

0 Preparation

The version is Python 3.8.6.

Check the types.

type(None), type(0)
(NoneType, int)

Set the appropriate parameters.

N = int(1e7)
M = int(1e4)

For reference, the import is timed as well.

%%timeit
import numpy as np
#112 ns ± 0.579 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

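This is on the order of nanoseconds, so it can be ignored. Note that the repeated loops only measure re-importing a module that is already cached, not the first load. To actually use np afterwards, run the import without %%timeit.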
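The nanosecond figure is plausible because Python caches imported modules in sys.modules, so a repeated import statement only performs a lookup. The following is a minimal sketch of that check using the standard timeit module instead of the IPython magic; the variable names are mine, and the absolute numbers will differ by machine.

```python
import sys
import timeit

import numpy as np  # the first import does the real loading work

# After the first import, the module object is cached here.
print("numpy" in sys.modules)

# Re-running the import statement only hits the cache.
loops = 100_000
per_import = timeit.timeit("import numpy as np", number=loops) / loops
print(f"cached import: {per_import * 1e9:.0f} ns per loop")
```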

1 Run

1d array

%timeit np.empty(N)
%timeit np.zeros(N)
%timeit np.ones(N)
%timeit np.zeros_like(np.empty(N))
%timeit np.ones_like(np.empty(N))
%timeit [None] * N
%timeit [0] * N
%timeit [None for i in range(N)]
%timeit [0 for i in range(N)]
'''
1.76 µs ± 32 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
11.3 ms ± 453 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
14.9 ms ± 182 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
19.9 ms ± 320 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
20.8 ms ± 931 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
35.9 ms ± 418 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
35.2 ms ± 489 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
417 ms ± 77.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
375 ms ± 15.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
'''

np.empty(N) is extremely fast. Note, however, that its timing is not constant from run to run, just as the contents of the array it returns vary from call to call. It can take as long as 100 µs, but that is still fast. Also, the ranking of the 0 and None variants can flip if they are run in a different order or in separate cells with %%timeit, so the two can be regarded as practically the same speed.
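Because np.empty skips initialization, whatever happens to be in the allocated memory shows up as the array's values. Here is a small sketch of the safe pattern, assuming every element is overwritten before any is read; the names n and buf are only for illustration.

```python
import numpy as np

n = 5
buf = np.empty(n)  # shape and dtype are set, but the values are arbitrary
print(buf.shape, buf.dtype)

# Do not rely on the contents yet; write every element first.
buf[:] = np.arange(n) ** 2
print(buf)
```

If the array has to be read before it is fully written, np.zeros is the safer choice.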

2d array

%timeit np.empty([M,M])
%timeit np.zeros([M,M])
%timeit np.ones([M,M])
%timeit np.zeros_like(np.empty([M,M]))
%timeit np.ones_like(np.empty([M,M]))
%timeit [[None] * M] * M
%timeit [[0] * M] * M
%timeit [[None for j in range(M)] for i in range(M)]
%timeit [[0 for j in range(M)] for i in range(M)]
'''
3.58 µs ± 157 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
3.66 µs ± 51.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
378 ms ± 5.46 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
375 ms ± 3.22 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
384 ms ± 7.62 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
37.4 µs ± 1.08 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
37.4 µs ± 388 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
3.83 s ± 37.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
3.86 s ± 61.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
'''

As expected, np.empty([M, M]) is fast, but np.zeros([M, M]) keeps up with it. Surprisingly, list multiplication also does well (→ Addendum 1). Needless to say, the comprehensions built on for are slow. There may be better ways to write them, but as they stand they are not practical for initialization.
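The %timeit lines in this section only work in IPython or Jupyter. As a rough sketch, the same kind of comparison can be run in a plain script with the standard timeit module. The sizes here are deliberately much smaller than the N and M used in the article so that it finishes quickly, the helper best_time is my own, and it reports the best of a few repeats rather than the mean ± std. dev. that %timeit prints.

```python
import timeit

import numpy as np

N = 10**5  # smaller than the article's 1e7 (assumption, for a quick run)
M = 300    # smaller than the article's 1e4 (assumption, for a quick run)


def best_time(fn, number=5, repeat=3):
    """Best observed time per call, in seconds, for a zero-argument callable."""
    return min(timeit.repeat(fn, number=number, repeat=repeat)) / number


candidates = {
    "np.empty(N)": lambda: np.empty(N),
    "np.zeros(N)": lambda: np.zeros(N),
    "np.ones(N)": lambda: np.ones(N),
    "[0] * N": lambda: [0] * N,
    "[0 for i in range(N)]": lambda: [0 for i in range(N)],
    "np.empty([M,M])": lambda: np.empty([M, M]),
    "np.zeros([M,M])": lambda: np.zeros([M, M]),
    "np.ones([M,M])": lambda: np.ones([M, M]),
    "[[0 for j in range(M)] for i in range(M)]":
        lambda: [[0 for j in range(M)] for i in range(M)],
}

results = {name: best_time(fn) for name, fn in candidates.items()}

# Print fastest first, in microseconds.
for name, seconds in sorted(results.items(), key=lambda kv: kv[1]):
    print(f"{name:<44} {seconds * 1e6:12.2f} µs")
```

With sizes this small the ranking may not match the tables above, so treat it as a template for rerunning the measurement rather than as a result.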

2 Results (summarized in a table)

1d array

| Code | Mean | Stdev. | Runs | Loops |
|---|---|---|---|---|
| np.empty(N) | 1.76 µs | 32 ns | 7 runs | 1000000 |
| np.zeros(N) | 11.3 ms | 453 µs | 7 runs | 100 |
| np.ones(N) | 14.9 ms | 182 µs | 7 runs | 100 |
| np.zeros_like(np.empty(N)) | 19.9 ms | 320 µs | 7 runs | 100 |
| np.ones_like(np.empty(N)) | 20.8 ms | 931 µs | 7 runs | 10 |
| [None] * N | 35.9 ms | 418 µs | 7 runs | 10 |
| [0] * N | 35.2 ms | 489 µs | 7 runs | 10 |
| [None for i in range(N)] | 417 ms | 77.7 ms | 7 runs | 1 |
| [0 for i in range(N)] | 375 ms | 15.1 ms | 7 runs | 1 |

2d array

| Code | Mean | Stdev. | Runs | Loops |
|---|---|---|---|---|
| np.empty([M,M]) | 3.58 µs | 157 ns | 7 runs | 100000 |
| np.zeros([M,M]) | 3.66 µs | 51.5 ns | 7 runs | 100000 |
| [[None] * M] * M (→ Addendum 1) | 37.4 µs | 1.08 µs | 7 runs | 10000 |
| [[0] * M] * M (→ Addendum 1) | 37.4 µs | 388 ns | 7 runs | 10000 |
| np.ones([M,M]) | 378 ms | 5.46 ms | 7 runs | 1 |
| np.zeros_like(np.empty([M,M])) | 375 ms | 3.22 ms | 7 runs | 1 |
| np.ones_like(np.empty([M,M])) | 384 ms | 7.62 ms | 7 runs | 1 |
| [[None for j in range(M)] for i in range(M)] | 3.83 s | 37.5 ms | 7 runs | 1 |
| [[0 for j in range(M)] for i in range(M)] | 3.86 s | 61.6 ms | 7 runs | 1 |

3 Conclusion

np.empty() is the fastest when you only need to allocate an array, and np.zeros() is the fastest when you also need it initialized.

4 Bonus

np.empty_like is also fast.

%timeit np.empty_like(np.empty(N))
%timeit np.empty_like(np.empty([M,M]))
'''
3.24 µs ± 59.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
7.1 µs ± 108 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
'''
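As a small illustration of what np.empty_like does, the sketch below checks that the result takes its shape and dtype from the array passed in, while its values are left uninitialized. The names template and out are hypothetical.

```python
import numpy as np

template = np.zeros((3, 4), dtype=np.int32)

# Same shape and dtype as template; the contents are arbitrary until written.
out = np.empty_like(template)
print(out.shape, out.dtype)

# Typical use: fill the buffer with a computed result.
np.add(template, 1, out=out)
print(out)
```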

Addendum 1

Added in response to a comment about [[None] * M] * M and [[0] * M] * M. As written, they are not suitable as 2D arrays, because every row refers to the same inner list. I therefore rewrote them as list comprehensions.

M = int(1e4)
%timeit [None] * M 
%timeit [0] * M
%timeit [[None] * M] * M
%timeit [[0] * M] * M
%timeit [None] * M * M   # flat list of M*M elements, for comparison
%timeit [0] * M * M    # flat list of M*M elements, for comparison
%timeit [[None] * M for i in range(M)]  # list comprehension
%timeit [[0] * M for i in range(M)]  # list comprehension

'''
18.4 µs ± 217 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
18.4 µs ± 171 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
37.6 µs ± 1.01 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
37.3 µs ± 661 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
725 ms ± 13.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
709 ms ± 1.47 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
672 ms ± 3.39 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
671 ms ± 7.46 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
'''
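The reason [[0] * M] * M is both fast and unsuitable as a 2D array is that the outer multiplication only repeats a reference to one inner list. A tiny sketch, with M reduced to 3 for readability:

```python
M = 3  # tiny size, just for illustration

shared = [[0] * M] * M                     # one inner list, referenced M times
independent = [[0] * M for _ in range(M)]  # a new inner list for every row

shared[0][0] = 1
independent[0][0] = 1

print(shared)       # [[1, 0, 0], [1, 0, 0], [1, 0, 0]]
print(independent)  # [[1, 0, 0], [0, 0, 0], [0, 0, 0]]

# Identity check: the "rows" of shared are the same object.
print(shared[0] is shared[1])            # True
print(independent[0] is independent[1])  # False
```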

List comprehensions generate a proper two-dimensional array, but they are not fast, so if you only want to do numerical computation you probably would not use them. NumPy seems to be the better choice (at least on Python 3.8.6). I don't expect to use this again, but as an experiment I passed the multiplied lists to np.array.

x = np.array([[0] * 3] * 3)
print(x)
x[0][0] = 1
x
'''
[[0 0 0]
 [0 0 0]
 [0 0 0]]
output:
array([[1, 0, 0],
       [0, 0, 0],
       [0, 0, 0]])
'''

(This appears to give a proper 2D array: assigning to x[0][0] changes only that one element. It did not work with np.matrix. sympy.Matrix also produced a 2D matrix, but it takes too long.)
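The experiment works because np.array copies the values out of the nested list into its own buffer, so the shared inner list no longer matters. The sketch below checks this at a small size and also shows that the None variant ends up as an object array; the variable names are mine.

```python
import numpy as np

rows = [[0] * 3] * 3      # all three rows are the same list object
x = np.array(rows)        # values are copied into a new 2D array
x[0, 0] = 1

print(x)                  # only one element of the array changed
print(rows)               # the original nested list is untouched

# With None the elements cannot be stored as numbers,
# so NumPy falls back to dtype=object.
y = np.array([[None] * 3] * 3)
print(x.dtype, y.dtype)
```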

%timeit [[None] * M for i in range(M)]
%timeit np.array([[None] * M] * M)

%timeit [[0] * M for i in range(M)]
%timeit np.array([[0] * M] * M)

'''
716 ms ± 21 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
4.06 s ± 65 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
682 ms ± 13 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
6.29 s ± 67.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
'''

This was even slower, so it strays from the purpose of this article, but it is an interesting result as a comparison.

Addendum 2

The measurements above assumed large sizes, such as a 10000 x 10000 matrix. What about generating an array of length 2 that does not need to be initialized? I wondered about that, so I added this section. For short arrays, multiplying [None] is faster.

value = np.empty(2)
%timeit -r 3 -n 10000 [None]*len(value)
%timeit -r 3 -n 10000 np.empty(len(value))
'''
221 ns ± 18.3 ns per loop (mean ± std. dev. of 3 runs, 10000 loops each)
1.04 µs ± 132 ns per loop (mean ± std. dev. of 3 runs, 10000 loops each)
'''

I wondered at what length the two trade places, so I checked.

value = np.empty(300)
%timeit -r 3 -n 10000 [None]*len(value)
%timeit -r 3 -n 10000 np.empty(len(value))
'''
1.01 µs ± 66.3 ns per loop (mean ± std. dev. of 3 runs, 10000 loops each)
1.06 µs ± 118 ns per loop (mean ± std. dev. of 3 runs, 10000 loops each)
'''
value = np.empty(400)
%timeit -r 3 -n 10000 [None]*len(value)
%timeit -r 3 -n 10000 np.empty(len(value))
'''
1.27 µs ± 94.3 ns per loop (mean ± std. dev. of 3 runs, 10000 loops each)
956 ns ± 26.6 ns per loop (mean ± std. dev. of 3 runs, 10000 loops each)
'''

The crossover seems to be at a length of about 300. It would be interesting to graph the dependence on N, including for the other cases.
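A scan over several lengths could be sketched as follows with the standard timeit module. The list of sizes and the helper per_call are my own choices, and the crossover point will depend on the machine and on the Python and NumPy versions.

```python
import timeit

import numpy as np


def per_call(fn, number=2000, repeat=3):
    """Best observed time per call, in seconds, for a zero-argument callable."""
    return min(timeit.repeat(fn, number=number, repeat=repeat)) / number


sizes = [2, 100, 300, 400, 1000]  # lengths to compare (assumption)
rows = []
for n in sizes:
    t_list = per_call(lambda: [None] * n)
    t_np = per_call(lambda: np.empty(n))
    rows.append((n, t_list, t_np))

print(f"{'n':>6} {'[None]*n':>12} {'np.empty(n)':>12}")
for n, t_list, t_np in rows:
    faster = "list" if t_list < t_np else "numpy"
    print(f"{n:>6} {t_list * 1e9:>9.0f} ns {t_np * 1e9:>9.0f} ns  {faster}")
```

The rows list can be passed to a plotting library to draw the graph of the N dependence.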

Reference article

numpy.empty — NumPy v1.19 Manual. Notes: "empty, unlike zeros, does not set the array values to zero, and may therefore be marginally faster. On the other hand, it requires the user to manually set all the values in the array, and should be used with caution."
[Introduction to NumPy np.empty] Create a new array without initializing elements | Samurai Blog - Site for programming beginners
[Introduction to NumPy] Array generation method (1D, 2D, speedup, etc.) | Nishizumi Kobo
Generate an array ndarray with all elements initialized with the same value with NumPy | note.nkmk.me
Empty and empty_like to generate an empty array ndarray with NumPy | note.nkmk.me
How to use the numpy.empty function to generate an uninitialized array - DeepAge
