Suppose you have a two-dimensional NumPy array and want to pick out specific rows and columns as follows (in the figure below, we are extracting [[9, 12], [21, 24]] from a 6x6 two-dimensional array).
It turns out that the speed differs considerably depending on how the indexing is written!
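Since the figure is not reproduced here, the following is a minimal sketch of the extraction it describes. It assumes the 6x6 array holds the values 1 to 36 in row-major order and that rows 1 and 3 and columns 2 and 5 (0-based) are selected; these choices are inferred from the values [[9, 12], [21, 24]] and are not stated in the text.

```python
import numpy as np

# Assumption: the figure's 6x6 array holds 1..36 in row-major order
X_small = np.arange(1, 37).reshape(6, 6)

# Assumption: 0-based rows 1 and 3, columns 2 and 5 are the ones highlighted
rows = np.array([1, 3])
cols = np.array([2, 5])

# Turn the row indices into a column vector so they broadcast against cols
block = X_small[rows[:, np.newaxis], cols]
print(block.tolist())  # [[9, 12], [21, 24]]
```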
# Extract randomly chosen rows and columns from a huge 2D array
import numpy as np
N = 10000
X = np.arange(N ** 2).reshape(N, N)
M = 100
a = np.random.choice(N, M)
b = np.random.choice(N, M)
# %timeit is an IPython/Jupyter magic command
%timeit Y1 = X[a][:, b]
# Example output: 1.09 ms ± 243 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit Y2 = X[a[:, np.newaxis], b]
# Example output: 66.8 µs ± 1.56 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
X[a[:, np.newaxis], b] is far faster than X[a][:, b] (roughly 16 times faster in the run above).
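For readers who are not in IPython, here is a rough sketch of the same comparison using the standard timeit module. The smaller N, the seeded generator, and the number of repetitions are my own assumptions to keep memory use and run time modest; the absolute timings will differ from the ones quoted above and depend on the machine.

```python
import timeit
import numpy as np

N = 2000   # assumption: smaller than the article's 10000 to save memory
M = 100
rng = np.random.default_rng(0)   # seeded only to make the run repeatable
X = np.arange(N ** 2).reshape(N, N)
a = rng.choice(N, M)
b = rng.choice(N, M)

# Both expressions select the same M x M block
Y1 = X[a][:, b]
Y2 = X[a[:, np.newaxis], b]
same = np.array_equal(Y1, Y2)
print("same result:", same)

# Time each expression; number=200 is an arbitrary choice
repeats = 200
t_chained = timeit.timeit(lambda: X[a][:, b], number=repeats)
t_broadcast = timeit.timeit(lambda: X[a[:, np.newaxis], b], number=repeats)
print(f"X[a][:, b]:             {t_chained / repeats * 1e6:.1f} µs per call")
print(f"X[a[:, np.newaxis], b]: {t_broadcast / repeats * 1e6:.1f} µs per call")
```

The equality check matters as much as the timing: the two forms are only interchangeable because they return the same array.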
What follows is a digression, but since indexing a 2D array as in the figure above turned out to be tricky, I will record the twists and turns that led me here.
If contiguous slices are all you need, it is easy:
import numpy as np
N = 10000
X = np.arange(N ** 2).reshape(N, N)
a = np.s_[0:100]    # slice for the rows
b = np.s_[100:300]  # slice for the columns
# Sliced result
Y = X[a, b]  # Y.shape == (100, 200)
That is all it takes. However, slicing reaches its limits when you want to extract rows and columns more flexibly.
If you try the same thing with index arrays instead of slices, it does not work.
import numpy as np
N = 10000
X = np.arange(N ** 2).reshape(N, N)
a = np.arange(0, 100)    # index array for the rows
b = np.arange(100, 300)  # index array for the columns
# Indexing result
Y = X[a, b]  # raises IndexError (shape mismatch)
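When a two-dimensional array is indexed with two index arrays like this, a and b must be broadcastable to a common shape; for one-dimensional arrays that means they must have the same length (or one of them must have length 1). And even when the lengths match, what comes back is the following one-dimensional array rather than a block.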
import numpy as np
N = 10000
X = np.arange(N ** 2).reshape(N, N)
a = np.arange(3)  # index array for the rows
b = np.arange(3)  # index array for the columns
# Indexing result
Y = X[a, b]  # result: [0, 10001, 20002]
In other words, Y[i] = X[a[i], b[i]].
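Both behaviours are easy to check on a small array. The sketch below uses a 6x6 array and index values that I picked for illustration; they do not come from the article.

```python
import numpy as np

X = np.arange(36).reshape(6, 6)   # small stand-in array (assumption)

# Index arrays of different lengths, shapes (2,) and (3,), cannot be broadcast
try:
    X[np.array([1, 3]), np.array([0, 2, 5])]
    raised = False
except IndexError as e:
    raised = True
    print("IndexError:", e)

# Index arrays of the same length are paired element by element
a = np.array([1, 3])
b = np.array([2, 5])
Y = X[a, b]
print(Y)  # [ 8 23], i.e. X[1, 2] and X[3, 5]

# Confirm Y[i] == X[a[i], b[i]] for every i
paired = all(Y[i] == X[a[i], b[i]] for i in range(len(a)))
print(paired)  # True
```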
Since that approach does not return a block, I rewrote the indexing as follows, which gave the expected result.
import numpy as np
N = 10000
X = np.arange(N ** 2).reshape(N, N)
a = np.arange(0, 100)    # index array for the rows
b = np.arange(100, 300)  # index array for the columns
# Indexing result
Y = X[a][:, b]  # Y.shape == (100, 200)
However, when I read the NumPy documentation, I found the following passage:
So note that x[0,2] = x[0][2] though the second case is more inefficient as a new temporary array is created after the first index that is subsequently indexed by 2.
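In other words, evaluating X[a] first creates a temporary array, which is inefficient. With an index array, X[a] copies all N columns of every selected row before the column selection is applied.
Indeed, as the timing at the beginning shows, this has a noticeable effect on run time, and the effect grows with the size of the 2D array.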
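To make the temporary array visible, the sketch below splits X[a][:, b] into its two steps. N is reduced to 1000 here as an assumption to keep the example light; the index ranges are the ones used above.

```python
import numpy as np

N = 1000   # assumption: smaller than the article's 10000
X = np.arange(N ** 2).reshape(N, N)
a = np.arange(0, 100)
b = np.arange(100, 300)

# Step 1: indexing with an array copies the full selected rows
tmp = X[a]
print(tmp.shape)                  # (100, 1000)
print(np.shares_memory(tmp, X))   # False: tmp is a copy, not a view

# Step 2: only now are the wanted columns taken from the temporary
Y = tmp[:, b]
print(Y.shape)                    # (100, 200)

# The temporary holds five times as many elements as the final result
print(tmp.size, Y.size)           # 100000 20000

# The scalar case from the documentation quote: same value either way
print(X[0, 2] == X[0][2])         # True
```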
To recap, the final indexing method is as follows.
import numpy as np
N = 10000
X = np.arange(N ** 2).reshape(N, N)
a = np.arange(0, 100)    # index array for the rows
b = np.arange(100, 300)  # index array for the columns
# Indexing result
Y = X[a[:, np.newaxis], b]  # Y.shape == (100, 200)
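The reason this works is broadcasting: a[:, np.newaxis] has shape (100, 1) and b has shape (200,), so together they broadcast to (100, 200) and every row index is paired with every column index. The sketch below checks this and compares the result with np.ix_, a NumPy helper that builds the same kind of broadcastable index arrays. np.ix_ is not used in the article; I mention it only as an equivalent spelling, and N is reduced to 1000 as an assumption.

```python
import numpy as np

N = 1000   # assumption: smaller than the article's 10000
X = np.arange(N ** 2).reshape(N, N)
a = np.arange(0, 100)
b = np.arange(100, 300)

rows = a[:, np.newaxis]            # column vector of row indices
print(rows.shape, b.shape)         # (100, 1) (200,)
print(np.broadcast(rows, b).shape) # (100, 200)

Y = X[rows, b]             # the article's final method
Y_ix = X[np.ix_(a, b)]     # same selection via np.ix_
Y_chain = X[a][:, b]       # the slower two-step version
Y_slice = X[0:100, 100:300]  # plain slicing, possible here because a and b are contiguous

print(np.array_equal(Y, Y_ix))     # True
print(np.array_equal(Y, Y_chain))  # True
print(np.array_equal(Y, Y_slice))  # True
```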
That's all from me. This is the result of trial and error, so please let me know if there is a more efficient method.