[PYTHON] Find the smallest index that meets the cumulative sum threshold with numpy

I use this a lot but keep forgetting it, so here is a note to myself.

Conclusion first

Using `argmax()` does the job.

What I want to do

The title may be hard to follow, so here is what I want to do in detail. Given a `numpy.array`, I want to compute the cumulative sum (`cumsum`) along each row and find the smallest index at which the cumulative sum exceeds a threshold. For example:

In [143]: score
Out[143]: 
array([[  1.05262936e-01,   1.05262936e-01,   1.05262936e-01,
          1.05262936e-01,   1.05262936e-01,   1.57894404e-01,
          1.05262936e-01,   1.05262936e-01,   2.10525873e-06,
          1.05262936e-01],
       [  1.66664722e-01,   1.66664722e-06,   1.66664722e-01,
          2.49997083e-01,   1.66664722e-06,   2.49997083e-01,
          1.66664722e-01,   3.33329444e-06,   3.33329444e-06,
          1.66664722e-06],
       [  9.09090909e-02,   9.09090909e-02,   9.09090909e-02,
          9.09090909e-02,   9.09090909e-02,   1.21212121e-01,
          1.21212121e-01,   9.09090909e-02,   9.09090909e-02,
          1.21212121e-01]])

In [144]: score.shape
Out[144]: (3, 10)

Given an `array` like this, its cumulative sum along axis 1 is

In [145]: score.cumsum(1)
Out[145]: 
array([[ 0.10526294,  0.21052587,  0.31578881,  0.42105175,  0.52631468,
         0.68420909,  0.78947202,  0.89473496,  0.89473706,  1.        ],
       [ 0.16666472,  0.16666639,  0.33333111,  0.58332819,  0.58332986,
         0.83332694,  0.99999167,  0.999995  ,  0.99999833,  1.        ],
       [ 0.09090909,  0.18181818,  0.27272727,  0.36363636,  0.45454545,
         0.57575758,  0.6969697 ,  0.78787879,  0.87878788,  1.        ]])

and the thresholds are

In [149]: threshold = np.random.random((3, 1))

In [150]: threshold
Out[150]: 
array([[ 0.62732896],
       [ 0.46494853],
       [ 0.54341381]])

I want the smallest index among the elements that exceed the threshold. In this example those are score[0, 5], score[1, 3], and score[2, 5], so the result should be [5, 3, 5].

So, coming back to the conclusion at the top: with numpy's `argmax` I got exactly the output I wanted.

In [155]: score.cumsum(1) > threshold
Out[155]: 
array([[False, False, False, False, False,  True,  True,  True,  True,
         True],
       [False, False, False,  True,  True,  True,  True,  True,  True,
         True],
       [False, False, False, False, False,  True,  True,  True,  True,
         True]], dtype=bool)

In [156]: np.argmax(score.cumsum(1) > threshold, 1)
Out[156]: array([5, 3, 5], dtype=int64)

`argmax` returns the smallest index among those where the maximum occurs. In a boolean array the maximum is True, so this is the index of the first True in each row.
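One caveat worth sketching: when a row contains no True at all, `argmax` still returns 0, which looks the same as a match at index 0. The helper below is only an illustration, not something from the session above. The name `first_index_above`, the `missing` sentinel, and the small sample data are all assumptions made for this example.

```python
import numpy as np

def first_index_above(cumulative, threshold, missing=-1):
    """Per row, return the first index whose value exceeds the threshold.

    `missing` is a hypothetical sentinel returned for rows where no
    value exceeds the threshold.
    """
    mask = cumulative > threshold        # True where the threshold is exceeded
    first = np.argmax(mask, axis=1)      # first True per row (0 if the row has no True)
    return np.where(mask.any(axis=1), first, missing)

# Made-up sample data, not the scores from the session above.
score = np.array([[0.2, 0.3, 0.5],
                  [0.6, 0.1, 0.3]])
threshold = np.array([[0.4], [2.0]])     # the second threshold can never be exceeded

result = first_index_above(score.cumsum(axis=1), threshold)
print(result)  # [ 1 -1]
```

If every threshold is guaranteed to be smaller than the final cumulative sum, as with thresholds drawn from `np.random.random` against rows that sum to 1, the plain `argmax` call is enough.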

However...

Conversely, I could not find the largest index among the elements below the threshold in the same way. I want [4, 2, 4].

In [176]: score.cumsum(1) < threshold
Out[176]: 
array([[ True,  True,  True,  True,  True, False, False, False, False,
        False],
       [ True,  True,  True, False, False, False, False, False, False,
        False],
       [ True,  True,  True,  True,  True, False, False, False, False,
        False]], dtype=bool)

In [178]: np.argmin(score.cumsum(1) < threshold, 1)
Out[178]: array([5, 3, 5], dtype=int64)

In [179]: np.argmax(score.cumsum(1) < threshold, 1)
Out[179]: array([0, 0, 0], dtype=int64)

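This is what happens: `argmin` returns the first False, which is the same [5, 3, 5] as before, and `argmax` returns the first True, which is always index 0 here.

Is the right answer to reach for something like `np.where`? If anyone knows a neat way to do this, please let me know.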
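One possible approach, offered as a sketch rather than a definitive answer: reverse each row of the boolean mask, apply `argmax` to find the first True in the reversed row, and convert that back to an index in the original order. The function name `last_index_below`, the `missing` sentinel, and the sample data are assumptions made for this example.

```python
import numpy as np

def last_index_below(cumulative, threshold, missing=-1):
    """Per row, return the last index whose value is below the threshold.

    `missing` is a hypothetical sentinel returned for rows where no
    value is below the threshold.
    """
    mask = cumulative < threshold
    n = mask.shape[1]
    # argmax on the column-reversed mask finds the last True of each row
    last = n - 1 - np.argmax(mask[:, ::-1], axis=1)
    return np.where(mask.any(axis=1), last, missing)

# Made-up sample data, not the scores from the session above.
score = np.array([[0.2, 0.3, 0.5],
                  [0.6, 0.1, 0.3],
                  [0.1, 0.1, 0.8]])
threshold = np.array([[0.6], [0.5], [0.25]])
cumulative = score.cumsum(axis=1)

result = last_index_below(cumulative, threshold)
print(result)  # [ 1 -1  1]

# Alternative, assuming non-negative scores so each cumulative row never
# decreases: the True values form a prefix, so counting them is enough.
counted = (cumulative < threshold).sum(axis=1) - 1
print(counted)  # [ 1 -1  1]
```

The counting variant is shorter, but it relies on the mask being a run of True followed by False, which holds for cumulative sums of non-negative values. The reversed `argmax` works for any boolean mask.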
