This article is based on @shiracamus's comment on a previously written article.
numpy's `all` is a function that returns True if all the elements of a numpy array are True, and False otherwise. The documentation is [here](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.all.html#numpy.ndarray.all); here I am referring to `ndarray.all()`. As the documentation says, `np.all()` has the same meaning, and there is also a function called `np.alltrue()`, but looking at the [GitHub source code](https://github.com/numpy/numpy/blob/master/numpy/core/fromnumeric.py), both of these just call `ndarray.all()` in the end, so I won't use them this time.
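As a quick sanity check of this equivalence, here is a minimal sketch (I only call `np.all()` and `ndarray.all()`, since `np.alltrue()` is deprecated in recent numpy versions):

```python
import numpy as np

mixed = np.array([True, True, False])
all_true = np.ones(5, dtype=bool)

# The free function np.all() and the method ndarray.all()
# return the same result (np.all dispatches to ndarray.all).
print(np.all(mixed), mixed.all())        # both False
print(np.all(all_true), all_true.all())  # both True
```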
Computation with numpy is very fast, so it is usually faster to compute with numpy than to write the loop directly in Python. However, I found that this can be overturned under limited conditions, so I would like to introduce it.
Also, numba is a library that performs Just-In-Time (JIT) compilation to achieve speedups. Because of this, the first call to a function may take some time while it compiles. Specifically, if you give a type hint (`@numba.jit(numba.b1(numba.b1[:]))`), compilation takes almost no time, but without one (plain `@numba.jit`), the first call may take a few seconds. There is no big difference in execution time after compilation.
First, compare the following three without using numba. (If you don't use numba, just comment out the `@numba.jit` lines.)
test.py

```python
import numpy as np
import numba
import time
import matplotlib.pyplot as plt

# Use the built-in function
@numba.jit(numba.b1(numba.b1[:]))
def builtin(arr):
    return all(arr)

# Write it with a for statement
@numba.jit(numba.b1(numba.b1[:]))
def use_for(arr):
    for element in arr:
        if not element:
            return False
    else:
        return True

# Use numpy's all
@numba.jit(numba.b1(numba.b1[:]))
def np_all(arr):
    return arr.all()

# Time the function passed as an argument
def test(func):
    elapsed_time = []
    for arr_size in range(1000):
        arr = np.ones(arr_size).astype(bool)
        start = time.time()
        func(arr)
        end = time.time()
        elapsed_time.append((end - start) * 1e6)
    return elapsed_time[1:]

if __name__ == '__main__':
    plt.plot(test(use_for), 'g', label='for')
    plt.plot(test(builtin), 'r', label='built-in')
    plt.plot(test(np_all), 'b', label='numpy')
    plt.legend()
    plt.xlabel('array size')
    plt.ylabel('elapsed time[us]')
    plt.show()
```
```
python test.py
```
The figure shows the execution time for array sizes up to 1000. From this, you can see that numpy becomes more advantageous as the array size grows. You can also see that the built-in function is faster than the for statement, which I think is characteristic of Python. For sizes of about 200 or less, the other two methods turned out to be faster than numpy. Note that this threshold of 200 may depend on the environment.
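The crossover can also be probed without matplotlib; this is a rough sketch using `timeit` (the numbers, and the exact crossover point, will vary by environment):

```python
import timeit
import numpy as np

# Compare the built-in all() against ndarray.all() for a small
# and a large boolean array: numpy's fixed per-call overhead
# dominates for small arrays, while its vectorized scan wins
# once the array is large enough.
for size in (10, 1000):
    arr = np.ones(size, dtype=bool)
    t_builtin = timeit.timeit(lambda: all(arr), number=1000)
    t_numpy = timeit.timeit(lambda: arr.all(), number=1000)
    print(f"size={size:4d}: built-in {t_builtin:.4f} s, numpy {t_numpy:.4f} s")
```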
Next, the results with numba. The plot excludes the elapsed time of the very first call, which includes compilation. The result is shown in the figure. It seems that the built-in function cannot be accelerated by numba, while numpy's `all` and the for statement are. It is also interesting that *the built-in function is slower than when numba was not used*. On the other hand, the for-statement version becomes faster, even faster than numpy.
For larger arrays, the result looks like the following figure (the built-in function is excluded). The for statement still seems to be the fastest.
After writing this far, I realized I had not paid attention to CPU usage, so I ran it again pinned to a single core:

```
taskset -c 0 python test.py
```

I got a similar graph, so there was no problem.
Thanks to @shiracamus.