Flattening a NumPy array: I compared how much the processing speed differs depending on the method used. The code was run on Jupyter.
reshape
First up is the basic `reshape`. Let's look at the processing speed when creating a 10000 x 10000 array and flattening it.
import numpy as np
np.random.seed(0)
a = np.random.randn(10**4, 10**4)
print(a)
output
[[ 1.76405235e+00 4.00157208e-01 9.78737984e-01 ... 5.16872179e-01
-3.29206931e-02 1.29811143e+00]
[-2.02117027e-01 -8.33231001e-01 1.73360025e+00 ... -5.72592527e-02
-1.05893126e+00 -3.26528442e-01]
[ 3.30045889e-01 -4.79979434e-04 8.18115898e-01 ... -1.51127631e+00
9.76843833e-01 1.25550065e+00]
...
Execute reshape
%%time
b = a.reshape(-1)
b[0] = 100.
print(b[:5], '\n')
print(a[0, :5], '\n')
output
[100. 0.40015721 0.97873798 2.2408932 1.86755799]
[100. 0.40015721 0.97873798 2.2408932 1.86755799]
CPU times: user 2 ms, sys: 0 ns, total: 2 ms
Wall time: 2.15 ms
If you use the `reshape` method as is, the result is a view that references the original array `a`, as shown above. It is fast because no new memory area needs to be allocated. However, be aware that modifying `b` also changes the values in `a`.
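To see this view relationship concretely, a small sketch on a toy array (my own addition, using `np.shares_memory` and the `.base` attribute to check whether two arrays share a buffer):

```python
import numpy as np

a = np.zeros((2, 3))
b = a.reshape(-1)

print(np.shares_memory(a, b))  # True: b is a view of a's buffer
print(b.base is a)             # True: b does not own its data

b[0] = 100.
print(a[0, 0])                 # 100.0: the original changed too
```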
flatten
%%time
b = a.flatten()
b[0] = 100.
print(b[:5], '\n')
print(a[0, :5], '\n')
output
[100. 0.40015721 0.97873798 2.2408932 1.86755799]
[1.76405235 0.40015721 0.97873798 2.2408932 1.86755799]
CPU times: user 152 ms, sys: 378 ms, total: 530 ms
Wall time: 532 ms
`flatten` returns a copy of the original array, as shown above. So changing a value in `b` does not change the original value in `a`. However, because a new memory area must be allocated, the processing speed is quite different: it depends on the situation, but from the results above alone, `reshape` is nearly 250 times faster than `flatten`.
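The copy behavior can be checked the same way as the view case (my own sketch on a toy array): `flatten` always returns a new array that owns its own buffer, so the original is never affected.

```python
import numpy as np

a = np.zeros((2, 3))
b = a.flatten()

print(np.shares_memory(a, b))  # False: b is an independent copy
print(b.base is None)          # True: b owns its own buffer

b[0] = 100.
print(a[0, 0])                 # 0.0: the original is untouched
```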
ravel
%%time
b = a.ravel()
b[0] = 100.
print(b[:5], '\n')
print(a[0, :5], '\n')
output
[100. 0.40015721 0.97873798 2.2408932 1.86755799]
[100. 0.40015721 0.97873798 2.2408932 1.86755799]
CPU times: user 2.75 ms, sys: 0 ns, total: 2.75 ms
Wall time: 3.36 ms
Like `reshape`, `ravel` returns a view of the original array, so it is fast.
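One caveat worth adding (not covered by the timing above): `ravel` returns a view only when the memory layout allows it; for a non-contiguous array such as a transpose, it silently falls back to making a copy. A small sketch:

```python
import numpy as np

a = np.arange(6).reshape(2, 3)

# C-contiguous input: ravel can return a view
print(np.shares_memory(a, a.ravel()))    # True

# transposed (non-contiguous in C order): ravel must copy
print(np.shares_memory(a, a.T.ravel()))  # False
```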
resize
The `resize` method is destructive: it modifies the array object `a` itself. Moreover, unlike `reshape`, it does not raise an error even when the new shape does not match the number of elements; it simply truncates or zero-pads to fit. In the example below, you can see the value of `a` being changed step by step.
a.resize(1, 10000*10000)
print(a[:5], '\n')
a.resize(2, 2)
print(a, '\n')
a.resize(3, 3)
print(a, '\n')
a.resize(1)
print(a)
output
[[ 1.76405235 0.40015721 0.97873798 ... 0.32191089 0.25199669
-1.22612391]]
[[1.76405235 0.40015721]
[0.97873798 2.2408932 ]]
[[1.76405235 0.40015721 0.97873798]
[2.2408932 0. 0. ]
[0. 0. 0. ]]
[1.76405235]
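A further caveat about the in-place `resize` (my own addition, not from the runs above): it refuses to resize when another array still references the same buffer, which is also why it can fail in Jupyter sessions where extra references linger.

```python
import numpy as np

a = np.arange(6)
b = a.reshape(2, 3)   # a view that shares a's buffer

try:
    a.resize(9)       # refcheck sees the view and refuses
except ValueError:
    print("resize refused: another array references a")
# a.resize(9, refcheck=False) would force it,
# at the risk of b pointing at stale memory
```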
If you do not want to modify the original array, use the `np.resize` function instead, as follows.
%%time
b = np.resize(a, (-1, ))
b[0] = 100.
print(b[:5], '\n')
print(a[0, :5], '\n')
print(a.shape, '\n')
print(b.shape, '\n')
output
[100. 0.40015721 0.97873798 2.2408932 1.86755799]
[1.76405235 0.40015721 0.97873798 2.2408932 1.86755799]
(10000, 10000)
(99999999,)
CPU times: user 235 ms, sys: 1.95 ms, total: 237 ms
Wall time: 239 ms
!?
`a` hasn't changed, but look closely: `b` has one element fewer than expected (99999999 instead of 100000000). Apparently, specifying `(-1,)` with `np.resize` reduces the element count by one? So when you want to change the size, it seems you need to specify it explicitly, as in `a.resize(1, 10000 * 10000)` from the cell four above.
(Addendum)
**According to the comments on this post, `resize(-1)` appears to be specified to drop the last element.**
c = np.arange(6).reshape(2, 3)
d = np.resize(c, (-1,))
print(c, '\n')
print(d)
output
[[0 1 2]
[3 4 5]]
[0 1 2 3 4]
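Note that with an explicit non-negative shape, the `np.resize` function behaves differently again from the in-place method (my own addition): instead of zero-padding, it repeats the input cyclically to fill the requested size.

```python
import numpy as np

c = np.arange(4)

# the np.resize *function* tiles the data to fill the new shape
print(np.resize(c, (2, 3)))
# [[0 1 2]
#  [3 0 1]]

# the ndarray.resize *method* zero-pads instead
c.resize(2, 3)
print(c)
# [[0 1 2]
#  [3 0 0]]
```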
If you want speed, use `reshape`. If you want a copy, I personally think `flatten` is fine for now, but I would be grateful if someone familiar with this area could comment.