[PYTHON] Indexing speed for NumPy two-dimensional arrays

Suppose you have a two-dimensional NumPy array and want to select particular rows and columns from it. (In the figure below, we extract [[9, 12], [21, 24]] from a 6x6 two-dimensional array.)

image.png
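
As a concrete version of the figure, here is a minimal sketch. The figure itself is not reproduced here, so the layout is an assumption: the 6x6 array is taken to hold the values 1 to 36 in row-major order, which puts the target block at rows 1 and 3, columns 2 and 5. The names rows and cols are only for illustration.

```python
import numpy as np

# Assumed layout: 6x6 array holding 1..36 in row-major order
X = np.arange(1, 37).reshape(6, 6)

rows = np.array([1, 3])   # rows to keep
cols = np.array([2, 5])   # columns to keep

# Two ways of writing the same selection
Y_chained = X[rows][:, cols]
Y_broadcast = X[rows[:, np.newaxis], cols]

print(Y_chained)
print(Y_broadcast)
```

Both expressions give [[9, 12], [21, 24]] under this assumed layout.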

It turned out that the speed differs considerably depending on how the indexing is written!

# Extract randomly chosen rows and columns from a huge two-dimensional array
import numpy as np

N = 10000
X = np.arange(N ** 2).reshape(N, N)

M = 100
a = np.random.choice(N, M)
b = np.random.choice(N, M)

%timeit Y1 = X[a][:, b]
# Example output: 1.09 ms ± 243 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit Y2 = X[a[:, np.newaxis], b]
# Example output: 66.8 µs ± 1.56 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

X[a[:, np.newaxis], b] is far faster than X[a][:, b]: roughly 16 times faster in the run above (66.8 µs versus 1.09 ms).
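
Outside IPython, where %timeit is not available, the same comparison can be sketched with the standard timeit module. This is only a sketch: it uses a smaller N than the article (an assumption, to keep memory use modest) and a seeded random generator, and it also checks that both expressions return the same array. Absolute timings will differ from machine to machine.

```python
import timeit
import numpy as np

N = 2000   # smaller than the article's 10000 to keep memory use modest
M = 100
rng = np.random.default_rng(0)
X = np.arange(N ** 2).reshape(N, N)
a = rng.choice(N, M)   # random row indices
b = rng.choice(N, M)   # random column indices

# Both expressions should select the same (M, M) block
Y1 = X[a][:, b]
Y2 = X[a[:, np.newaxis], b]
same = np.array_equal(Y1, Y2)

repeats = 200
t1 = timeit.timeit(lambda: X[a][:, b], number=repeats)
t2 = timeit.timeit(lambda: X[a[:, np.newaxis], b], number=repeats)

print(f"same result: {same}")
print(f"X[a][:, b]            : {t1 / repeats * 1e6:.1f} us per call")
print(f"X[a[:, np.newaxis], b]: {t2 / repeats * 1e6:.1f} us per call")
```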

What follows is a digression, but since working out how to index a two-dimensional array as in the figure above took some effort, I will record the twists and turns that led here.

Using slices

This case is easy.

import numpy as np
N = 10000
X = np.arange(N ** 2).reshape(N, N)

a = np.s_[0:100]      # Slice over rows
b = np.s_[100:300]    # Slice over columns
# Sliced result
Y = X[a, b]           # Y.shape = (100, 200)

This gives the expected (100, 200) block. However, a slice can only describe a start, a stop, and a constant step, so slicing is too limited when you want to pick rows and columns more flexibly.
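
To illustrate that limit, the sketch below contrasts a regularly spaced selection, which slices can express, with an irregular one, which needs index arrays. The small 6x6 array and the chosen indices are illustrative assumptions.

```python
import numpy as np

X = np.arange(36).reshape(6, 6)

# Regularly spaced: rows 0, 2, 4 and columns 1, 4 can be written as slices
Y_regular = X[np.s_[0:6:2], np.s_[1:6:3]]

# Irregularly spaced: rows 0, 1, 4 have no constant step, so use index arrays
rows = np.array([0, 1, 4])
cols = np.array([1, 4])
Y_irregular = X[rows[:, np.newaxis], cols]

print(Y_regular)
print(Y_irregular)
```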

Using index arrays (failure example)

If you try the same thing with index arrays, it does not actually work.

import numpy as np
N = 10000
X = np.arange(N ** 2).reshape(N, N)

a = np.arange(0, 100)      # Row indices
b = np.arange(100, 300)    # Column indices
# Indexing result
Y = X[a, b]                # Raises IndexError: shapes (100,) and (200,) cannot be broadcast together
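
The failure can be reproduced on a small array. This is a minimal sketch with illustrative sizes; it catches the exception and then shows that the shapes become compatible once the row indices are reshaped into a column.

```python
import numpy as np

X = np.arange(36).reshape(6, 6)
a = np.arange(0, 2)   # row indices, shape (2,)
b = np.arange(1, 4)   # column indices, shape (3,)

try:
    X[a, b]
    failed = False
except IndexError as e:
    failed = True
    print("IndexError:", e)

# Shapes (2, 1) and (3,) broadcast to (2, 3), so this form is accepted
broadcast_shape = np.broadcast(a[:, np.newaxis], b).shape
print(broadcast_shape)
```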

With integer index arrays like this, a and b must be broadcastable to a common shape; for one-dimensional arrays that effectively means they must have the same length. Moreover, even when the lengths match, the result is not the sub-block you might expect:

import numpy as np
N = 10000
X = np.arange(N ** 2).reshape(N, N)

a = np.arange(3)      # Row indices
b = np.arange(3)      # Column indices
# Indexing result
Y = X[a, b]           # Result: [0, 10001, 20002]

That is, Y[i] = X[a[i], b[i]]: the index arrays are paired element by element.
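
The element-by-element pairing can be checked against an explicit loop. This is a small sketch; the array and index values are arbitrary choices for illustration.

```python
import numpy as np

X = np.arange(36).reshape(6, 6)
a = np.array([0, 2, 5])   # row indices
b = np.array([1, 1, 4])   # column indices

Y = X[a, b]
# The same selection written as a loop over (row, column) pairs
Y_loop = np.array([X[i, j] for i, j in zip(a, b)])

print(Y)
print(np.array_equal(Y, Y_loop))
```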

Using index arrays (success example)

Since the approach above does not give the desired block, I rewrote it as follows, which indexes as expected.

import numpy as np
N = 10000
X = np.arange(N ** 2).reshape(N, N)

a = np.arange(0, 100)      # Row indices
b = np.arange(100, 300)    # Column indices
# Indexing result
Y = X[a][:, b]             # Y.shape = (100, 200)

However, when I read the NumPy documentation, I found the following description.

So note that x[0,2] = x[0][2] though the second case is more inefficient as a new temporary array is created after the first index that is subsequently indexed by 2.

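In other words, X[a] first creates a temporary array, which is inefficient. Here the temporary holds len(a) complete rows of X, so its size grows with the width of the array even when only a few columns are kept in the end. Indeed, as the code at the beginning shows, the larger the two-dimensional array, the greater the effect on computation time.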
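
The size of that temporary can be inspected directly. The sketch below uses a smaller N than the article (an assumption, to keep memory use modest) and compares the intermediate X[a] with the final result.

```python
import numpy as np

N = 2000
M = 100
rng = np.random.default_rng(0)
X = np.arange(N ** 2).reshape(N, N)
a = rng.choice(N, M)   # random row indices
b = rng.choice(N, M)   # random column indices

tmp = X[a]       # intermediate copy of M full rows: shape (M, N)
Y = tmp[:, b]    # final result: shape (M, M)

print(tmp.shape, tmp.nbytes)
print(Y.shape, Y.nbytes)
print(np.shares_memory(tmp, X))   # False: X[a] is a copy, not a view of X
```

With these sizes the temporary is N / M = 20 times larger than the final result.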

Again, the final indexing method is as follows.

import numpy as np
N = 10000
X = np.arange(N ** 2).reshape(N, N)

a = np.arange(0, 100)      # Row indices
b = np.arange(100, 300)    # Column indices
# Indexing result
Y = X[a[:, np.newaxis], b] # a[:, np.newaxis] has shape (100, 1) and broadcasts against b: Y.shape = (100, 200)
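
For reference, NumPy also provides np.ix_, which builds the broadcastable index arrays for you. The sketch below only checks that it selects the same block as the newaxis form; it is offered as a possibly more readable spelling, not as a faster one (no timing is claimed here), and it uses a smaller N than the article to keep memory use modest.

```python
import numpy as np

N = 1000
X = np.arange(N ** 2).reshape(N, N)

a = np.arange(0, 100)      # row indices
b = np.arange(100, 300)    # column indices

Y_newaxis = X[a[:, np.newaxis], b]
# np.ix_(a, b) returns index arrays of shape (100, 1) and (1, 200)
Y_ix = X[np.ix_(a, b)]

print(Y_ix.shape)
print(np.array_equal(Y_newaxis, Y_ix))
```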

That's all from me. This is the result of trial and error, so please let me know if there is a more efficient method.
