[Python] I checked the processing speed of flattening a NumPy array

Motivation

I wanted to find out how much the processing speed differs depending on the method used to flatten a NumPy array. The code was run on Jupyter.

reshape

First up is the basic reshape. Let's look at the processing speed when creating a 10000 x 10000 array and flattening it.

import numpy as np

np.random.seed(0)
a = np.random.randn(10**4, 10**4)
print(a)

output

[[ 1.76405235e+00  4.00157208e-01  9.78737984e-01 ...  5.16872179e-01
  -3.29206931e-02  1.29811143e+00]
 [-2.02117027e-01 -8.33231001e-01  1.73360025e+00 ... -5.72592527e-02
  -1.05893126e+00 -3.26528442e-01]
 [ 3.30045889e-01 -4.79979434e-04  8.18115898e-01 ... -1.51127631e+00
   9.76843833e-01  1.25550065e+00]
 ...

Execute reshape

%%time

b = a.reshape(-1)
b[0] = 100.
print(b[:5], '\n')
print(a[0, :5], '\n')

output

[100.           0.40015721   0.97873798   2.2408932    1.86755799] 

[100.           0.40015721   0.97873798   2.2408932    1.86755799] 

CPU times: user 2 ms, sys: 0 ns, total: 2 ms
Wall time: 2.15 ms

As shown above, the `reshape` method returns a view of the original array `a`. It is fast because no new memory area needs to be allocated. However, be aware that modifying `b` also changes the values of `a`.
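As a quick sanity check (a minimal sketch on a small array, not the 10000 x 10000 one above), `np.shares_memory` confirms that `reshape` returns a view:

```python
import numpy as np

a = np.arange(6).reshape(2, 3)
b = a.reshape(-1)              # view: no data is copied

print(np.shares_memory(a, b))  # True: b shares a's buffer
b[0] = 100
print(a[0, 0])                 # 100: the change is visible in a
```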

flatten

%%time

b = a.flatten()
b[0] = 100.
print(b[:5], '\n')
print(a[0, :5], '\n')

output

[100.           0.40015721   0.97873798   2.2408932    1.86755799] 

[1.76405235 0.40015721 0.97873798 2.2408932  1.86755799] 

CPU times: user 152 ms, sys: 378 ms, total: 530 ms
Wall time: 532 ms

As shown above, `flatten` returns a copy of the original array, so changing a value in `b` does not change the original value in `a`. However, allocating a new memory area makes a big difference in processing time. It depends on the environment, but looking at the results above alone, `flatten` is nearly 250 times slower than `reshape`.
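The copy behavior can be confirmed the same way with `np.shares_memory` on a small array:

```python
import numpy as np

a = np.arange(6).reshape(2, 3)
b = a.flatten()                # always a copy

print(np.shares_memory(a, b))  # False: b has its own buffer
b[0] = 100
print(a[0, 0])                 # 0: a is untouched
```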

ravel

%%time

b = a.ravel()
b[0] = 100.
print(b[:5], '\n')
print(a[0, :5], '\n')

output

[100.           0.40015721   0.97873798   2.2408932    1.86755799] 

[100.           0.40015721   0.97873798   2.2408932    1.86755799] 

CPU times: user 2.75 ms, sys: 0 ns, total: 2.75 ms
Wall time: 3.36 ms

Like `reshape`, `ravel` returns a view, so it is fast.
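One caveat worth knowing: `ravel` returns a view only when it can. For non-contiguous data (for example, a transposed array) it silently falls back to a copy, as this small check shows:

```python
import numpy as np

a = np.arange(6).reshape(2, 3)

# C-contiguous array: ravel can return a view
is_view = np.shares_memory(a, a.ravel())
# Transposed (non-contiguous) array: ravel must copy
is_copy = not np.shares_memory(a, a.T.ravel())

print(is_view, is_copy)  # True True
```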

resize

The `resize` method is destructive: it modifies the object `a` itself. Moreover, unlike the `reshape` method, it does not raise an error even when the new shape does not match the number of elements; it truncates or zero-pads as needed. In the example below, you can see the values of `a` being changed step by step.

a.resize(1, 10000*10000)
print(a[:5], '\n')
a.resize(2, 2)
print(a, '\n')
a.resize(3, 3)
print(a, '\n')
a.resize(1)
print(a)

output

[[ 1.76405235  0.40015721  0.97873798 ...  0.32191089  0.25199669
  -1.22612391]] 

[[1.76405235 0.40015721]
 [0.97873798 2.2408932 ]] 

[[1.76405235 0.40015721 0.97873798]
 [2.2408932  0.         0.        ]
 [0.         0.         0.        ]] 

[1.76405235]

If you do not want to modify the original array, use the function form `np.resize` as follows.

%%time

b = np.resize(a, (-1, ))
b[0] = 100.
print(b[:5], '\n')
print(a[0, :5], '\n')
print(a.shape, '\n')
print(b.shape,  '\n')

output

[100.           0.40015721   0.97873798   2.2408932    1.86755799] 

[1.76405235 0.40015721 0.97873798 2.2408932  1.86755799] 

(10000, 10000) 

(99999999,) 

CPU times: user 235 ms, sys: 1.95 ms, total: 237 ms
Wall time: 239 ms

!?

`a` hasn't changed, but look closely: `b` is one element short. Apparently, specifying `(-1,)` in `np.resize` reduces the number of elements by one. So when you want to change the size, it seems you need to specify it explicitly, as in `a.resize(1, 10000 * 10000)` in the cell above.

(Addendum) As pointed out in the comments on this blog, `np.resize` with `(-1,)` appears to return everything except the last element.

c = np.arange(6).reshape(2, 3)
d = np.resize(c, (-1,))
print(c, '\n')
print(d)

output

[[0 1 2]
 [3 4 5]] 

[0 1 2 3 4]
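If you want `np.resize` to keep every element, it seems safest to pass the full size explicitly instead of `-1` (a small sketch):

```python
import numpy as np

c = np.arange(6).reshape(2, 3)
d = np.resize(c, (c.size,))  # explicit total size: all 6 values kept
print(d)                     # [0 1 2 3 4 5]
```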

Afterword

If you want speed, use `reshape` (or `ravel`). If you want a copy, I personally think `flatten` is fine for now, but I would be grateful if someone familiar with this area could comment.
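For anyone who wants to reproduce the comparison quickly, here is a minimal `timeit` sketch. Note that it uses a smaller 1000 x 1000 array than the post, so the absolute times will differ, but the relative gap between the view-based methods and `flatten` should still show up:

```python
import numpy as np
import timeit

# Smaller array than in the post, so the comparison runs quickly.
a = np.random.randn(1000, 1000)

results = {}
for name, stmt in [("reshape", "a.reshape(-1)"),
                   ("ravel",   "a.ravel()"),
                   ("flatten", "a.flatten()")]:
    results[name] = timeit.timeit(stmt, globals={"a": a}, number=100)
    print(f"{name:8s} {results[name]:.6f} s / 100 runs")
```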
