Flattening a NumPy array: I compared how much the processing speed differs depending on the method used. The code was run on Jupyter.
reshape
First up is the basic `reshape`. Let's look at the processing speed when creating a 10000 x 10000 array and flattening it.
import numpy as np
np.random.seed(0)
a = np.random.randn(10**4, 10**4)
print(a)
output
[[ 1.76405235e+00 4.00157208e-01 9.78737984e-01 ... 5.16872179e-01
-3.29206931e-02 1.29811143e+00]
[-2.02117027e-01 -8.33231001e-01 1.73360025e+00 ... -5.72592527e-02
-1.05893126e+00 -3.26528442e-01]
[ 3.30045889e-01 -4.79979434e-04 8.18115898e-01 ... -1.51127631e+00
9.76843833e-01 1.25550065e+00]
...
Execute reshape
%%time
b = a.reshape(-1)
b[0] = 100.
print(b[:5], '\n')
print(a[0, :5], '\n')
output
[100. 0.40015721 0.97873798 2.2408932 1.86755799]
[100. 0.40015721 0.97873798 2.2408932 1.86755799]
CPU times: user 2 ms, sys: 0 ns, total: 2 ms
Wall time: 2.15 ms
If you use the `reshape` method as is, the result is a view that references the original array `a`, as shown above. It is fast because no new memory area needs to be allocated. However, be aware that modifying `b` also changes the values in `a`.
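To see this view relationship concretely, a small sketch on a toy array (my own addition, using `np.shares_memory` and the `.base` attribute to check whether two arrays share a buffer):

```python
import numpy as np

a = np.zeros((2, 3))
b = a.reshape(-1)

print(np.shares_memory(a, b))  # True: b is a view of a's buffer
print(b.base is a)             # True: b does not own its data

b[0] = 100.
print(a[0, 0])                 # 100.0: the original changed too
```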
flatten
%%time
b = a.flatten()
b[0] = 100.
print(b[:5], '\n')
print(a[0, :5], '\n')
output
[100. 0.40015721 0.97873798 2.2408932 1.86755799]
[1.76405235 0.40015721 0.97873798 2.2408932 1.86755799]
CPU times: user 152 ms, sys: 378 ms, total: 530 ms
Wall time: 532 ms
`flatten` returns a copy of the original array, as shown above. So changing a value in `b` does not change the original value in `a`. However, because a new memory area must be allocated, the processing speed is quite different: it depends on the situation, but from the results above alone, `reshape` is nearly 250 times faster than `flatten`.
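The copy behavior can be checked the same way as the view case (my own sketch on a toy array): `flatten` always returns a new array that owns its own buffer, so the original is never affected.

```python
import numpy as np

a = np.zeros((2, 3))
b = a.flatten()

print(np.shares_memory(a, b))  # False: b is an independent copy
print(b.base is None)          # True: b owns its own buffer

b[0] = 100.
print(a[0, 0])                 # 0.0: the original is untouched
```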
ravel
%%time
b = a.ravel()
b[0] = 100.
print(b[:5], '\n')
print(a[0, :5], '\n')
output
[100. 0.40015721 0.97873798 2.2408932 1.86755799]
[100. 0.40015721 0.97873798 2.2408932 1.86755799]
CPU times: user 2.75 ms, sys: 0 ns, total: 2.75 ms
Wall time: 3.36 ms
Like `reshape`, `ravel` returns a view of the original array, so it is fast.
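One caveat worth adding (not covered by the timing above): `ravel` returns a view only when the memory layout allows it; for a non-contiguous array such as a transpose, it silently falls back to making a copy. A small sketch:

```python
import numpy as np

a = np.arange(6).reshape(2, 3)

# C-contiguous input: ravel can return a view
print(np.shares_memory(a, a.ravel()))    # True

# transposed (non-contiguous in C order): ravel must copy
print(np.shares_memory(a, a.T.ravel()))  # False
```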
resize
The `resize` method is destructive: it modifies the array object `a` itself. Moreover, unlike `reshape`, it does not raise an error even when the new shape does not match the number of elements; it simply truncates or zero-pads to fit. In the example below, you can see the value of `a` being changed step by step.
a.resize(1, 10000*10000)
print(a[:5], '\n')
a.resize(2, 2)
print(a, '\n')
a.resize(3, 3)
print(a, '\n')
a.resize(1)
print(a)
output
[[ 1.76405235 0.40015721 0.97873798 ... 0.32191089 0.25199669
-1.22612391]]
[[1.76405235 0.40015721]
[0.97873798 2.2408932 ]]
[[1.76405235 0.40015721 0.97873798]
[2.2408932 0. 0. ]
[0. 0. 0. ]]
[1.76405235]
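A further caveat about the in-place `resize` (my own addition, not from the runs above): it refuses to resize when another array still references the same buffer, which is also why it can fail in Jupyter sessions where extra references linger.

```python
import numpy as np

a = np.arange(6)
b = a.reshape(2, 3)   # a view that shares a's buffer

try:
    a.resize(9)       # refcheck sees the view and refuses
except ValueError:
    print("resize refused: another array references a")
# a.resize(9, refcheck=False) would force it,
# at the risk of b pointing at stale memory
```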
If you do not want to modify the original array, use the `np.resize` function instead, as follows.
%%time
b = np.resize(a, (-1, ))
b[0] = 100.
print(b[:5], '\n')
print(a[0, :5], '\n')
print(a.shape, '\n')
print(b.shape, '\n')
output
[100. 0.40015721 0.97873798 2.2408932 1.86755799]
[1.76405235 0.40015721 0.97873798 2.2408932 1.86755799]
(10000, 10000)
(99999999,)
CPU times: user 235 ms, sys: 1.95 ms, total: 237 ms
Wall time: 239 ms
!?
`a` hasn't changed, but look closely: `b` has one element fewer than expected (99999999 instead of 100000000). Apparently, specifying `(-1,)` with `np.resize` reduces the element count by one? So when you want to change the size, it seems you need to specify it explicitly, as in `a.resize(1, 10000 * 10000)` from the cell four above.
(Addendum)
**According to the comments on this post, `resize(-1)` appears to be specified to drop the last element.**
c = np.arange(6).reshape(2, 3)
d = np.resize(c, (-1,))
print(c, '\n')
print(d)
output
[[0 1 2]
[3 4 5]]
[0 1 2 3 4]
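Note that with an explicit non-negative shape, the `np.resize` function behaves differently again from the in-place method (my own addition): instead of zero-padding, it repeats the input cyclically to fill the requested size.

```python
import numpy as np

c = np.arange(4)

# the np.resize *function* tiles the data to fill the new shape
print(np.resize(c, (2, 3)))
# [[0 1 2]
#  [3 0 1]]

# the ndarray.resize *method* zero-pads instead
c.resize(2, 3)
print(c)
# [[0 1 2]
#  [3 0 0]]
```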
If you want speed, use `reshape`. If you want a copy, I personally think `flatten` is fine for now, but I would be grateful if someone familiar with this area could comment.