This article is based on @shiracamus's comment on a previously written article.
numpy's `all` is a function that returns True if all the elements of a numpy array are True, and False otherwise. The documentation is [here](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.all.html#numpy.ndarray.all); here I am referring to `ndarray.all()`. As the documentation says, `np.all()` has the same meaning, and there is also a function called `np.alltrue()`, but looking at the [GitHub source code](https://github.com/numpy/numpy/blob/master/numpy/core/fromnumeric.py), both of these just call `ndarray.all()` in the end, so I won't use them this time.
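As a quick sanity check of this equivalence, here is a minimal sketch (I only call `np.all()` and `ndarray.all()`, since `np.alltrue()` is deprecated in recent numpy versions):

```python
import numpy as np

mixed = np.array([True, True, False])
all_true = np.ones(5, dtype=bool)

# The free function np.all() and the method ndarray.all()
# return the same result (np.all dispatches to ndarray.all).
print(np.all(mixed), mixed.all())        # both False
print(np.all(all_true), all_true.all())  # both True
```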
Computation with numpy is very fast, so it is usually faster to compute with numpy than to write the loop directly in Python. However, I found that this can be overturned under limited conditions, so I would like to introduce it.
Also, numba is a library that performs Just-In-Time (JIT) compilation to achieve speedups. Because of this, the first call to a function may take some time while it compiles. Specifically, if you give a type hint (`@numba.jit(numba.b1(numba.b1[:]))`), compilation takes almost no time, but without one (plain `@numba.jit`), the first call may take a few seconds. There is no big difference in execution time after compilation.
First, compare the following three without using numba. (If you don't use numba, just comment out the `@numba.jit` lines.)
test.py

```python
import numpy as np
import numba
import time
import matplotlib.pyplot as plt

# Use the built-in function
@numba.jit(numba.b1(numba.b1[:]))
def builtin(arr):
    return all(arr)

# Write it with a for statement
@numba.jit(numba.b1(numba.b1[:]))
def use_for(arr):
    for element in arr:
        if not element:
            return False
    else:
        return True

# Use numpy's all
@numba.jit(numba.b1(numba.b1[:]))
def np_all(arr):
    return arr.all()

# Time the function passed as an argument
def test(func):
    elapsed_time = []
    for arr_size in range(1000):
        arr = np.ones(arr_size).astype(bool)
        start = time.time()
        func(arr)
        end = time.time()
        elapsed_time.append((end - start) * 1e6)
    return elapsed_time[1:]

if __name__ == '__main__':
    plt.plot(test(use_for), 'g', label='for')
    plt.plot(test(builtin), 'r', label='built-in')
    plt.plot(test(np_all), 'b', label='numpy')
    plt.legend()
    plt.xlabel('array size')
    plt.ylabel('elapsed time[us]')
    plt.show()
```
```
python test.py
```
The figure shows the execution time for array sizes up to 1000. From this, you can see that numpy becomes more advantageous as the array size grows. You can also see that the built-in function is faster than the for statement, which I think is characteristic of Python. For sizes of about 200 or less, the other two methods turned out to be faster than numpy. Note that this threshold of 200 may depend on the environment.
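The crossover can also be probed without matplotlib; this is a rough sketch using `timeit` (the numbers, and the exact crossover point, will vary by environment):

```python
import timeit
import numpy as np

# Compare the built-in all() against ndarray.all() for a small
# and a large boolean array: numpy's fixed per-call overhead
# dominates for small arrays, while its vectorized scan wins
# once the array is large enough.
for size in (10, 1000):
    arr = np.ones(size, dtype=bool)
    t_builtin = timeit.timeit(lambda: all(arr), number=1000)
    t_numpy = timeit.timeit(lambda: arr.all(), number=1000)
    print(f"size={size:4d}: built-in {t_builtin:.4f} s, numpy {t_numpy:.4f} s")
```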
Next, the results with numba. The plot excludes the elapsed time of the very first call, which includes compilation. The result is shown in the figure. It seems that the built-in function cannot be accelerated by numba, while numpy's `all` and the for statement are. It is also interesting that *the built-in function is slower than when numba was not used*. On the other hand, the for-statement version becomes faster, even faster than numpy.
For larger arrays, the result looks like the following figure (the built-in function is excluded). The for statement still seems to be the fastest.
After writing this far, I realized I had not paid attention to CPU usage, so I ran it again pinned to a single core:

```
taskset -c 0 python test.py
```

I got a similar graph, so there was no problem.
Thanks to @shiracamus.