[PYTHON] About all of numpy

Please refer to the new article **here**, which was written with reference to @shiracamus's comment.

Introduction

all in numpy is a method that returns True if all the elements of a numpy array are True, and False otherwise. The documentation is here (https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.all.html#numpy.ndarray.all).
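As a quick illustration of this behavior, here is a minimal sketch. The array names are made up for the example and do not appear in the benchmark below.

```python
import numpy as np

# Hypothetical example arrays
all_true = np.array([True, True, True])
one_false = np.array([True, False, True])
empty = np.array([], dtype=bool)

result_all_true = all_true.all()    # every element is True
result_one_false = one_false.all()  # one element is False
result_empty = empty.all()          # an empty array has no False element

print(result_all_true, result_one_false, result_empty)  # prints: True False True
```

Note that all returns True for an empty array, which matches the for loop version used later in this article, since that loop starts from True.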

Calculations with numpy are very fast, so as a rule it is faster to use numpy than to write the same thing directly in Python. Still, I really wanted to speed up this part of my code and tried various things. Under limited conditions I was able to overturn that rule, so I would like to introduce the result.

Method

The method is to access every array element in a for statement and combine the elements in order with and. I compare the execution time of this with numpy's all. I also measure the time when using numba.

Source code

import numpy as np
import time
import matplotlib.pyplot as plt
import sys

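# Use numpy's all
def func1(arr):
    return arr.all()

# Combine the elements with `and` in a for statement
def func2(arr):
    tf = True
    for i in range(arr.size):
        tf = tf and arr[i]
    # The loop never breaks, so a plain return is equivalent to for/else
    return tf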

if __name__ == '__main__':
    if len(sys.argv) == 3:
        testsize, arr_size = map(int, sys.argv[1:])
    else:
        testsize = 10
        arr_size = 10
    # Number of tests, array size
    print(testsize, arr_size)

    elapsed_time = []
    for i in range(testsize):
        # Random array of True and False
        # (builtin bool is used because np.bool is not available in every numpy version)
        arr = np.random.randint(2, size=arr_size).astype(bool)
        start = time.time()

        func1(arr)

        end = time.time()
        elapsed_time.append((end - start) * 1e6)

    plt.plot(elapsed_time[1:], 'b', label='numpy all')

    elapsed_time = []
    for i in range(testsize):
        arr = np.random.randint(2, size=arr_size).astype(bool)
        start = time.time()

        func2(arr)

        end = time.time()
        elapsed_time.append((end - start) * 1e6)

    plt.plot(elapsed_time[1:], 'r', label='for')
    plt.xlabel('test number')
    plt.ylabel('elapsed time[us]')
    plt.legend()
    plt.show()

Results

Without numba

With an array size of 10 and 10 tests, the result is as shown in the following figure. The for statement with and is faster. [Figure: fig10_10.png]

With an array size of 200 and 10 tests, the result is as shown in the following figure. all is faster. [Figure: fig10_200.png]

The for statement becomes slower as the size of the array increases, as the following figure shows. I do not know what causes the pulse-like spikes. I think the threshold depends on the environment, but in my case, for an array size of 100 or less, writing the loop directly in Python turned out to be faster. [Figure: fig_.png]
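Single calls timed with time.time() can be noisy at the microsecond scale. As a rough cross-check of the size dependence, the following sketch repeats each call many times with timeit from the standard library and keeps the best per-call time. The function names all_numpy, all_loop, and best_time_us, the chosen sizes, and the repeat counts are assumptions for illustration and were not part of the measurements above. The numbers it prints will differ from machine to machine.

```python
import timeit
import numpy as np

def all_numpy(arr):
    return arr.all()

def all_loop(arr):
    tf = True
    for i in range(arr.size):
        tf = tf and arr[i]
    return tf

def best_time_us(func, arr, number=200, repeat=5):
    """Best per-call time in microseconds over `repeat` runs of `number` calls."""
    timer = timeit.Timer(lambda: func(arr))
    return min(timer.repeat(repeat=repeat, number=number)) / number * 1e6

rng = np.random.default_rng(0)  # fixed seed so the arrays are reproducible
timings = {}
for size in (10, 100, 1000):    # hypothetical array sizes
    arr = rng.integers(2, size=size).astype(bool)
    timings[size] = (best_time_us(all_numpy, arr), best_time_us(all_loop, arr))
    print(size, timings[size])  # (numpy all, for loop) in microseconds
```

Taking the minimum over several repeats is a common way to reduce the influence of other processes on the measurement, although it does not by itself explain the spikes in the figure.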

With numba

Since numba compiles just in time (JIT), the very first call to the function takes a long time, so I plotted the results excluding the elapsed time of the first call. The second call also seems to take some time. Apart from that, it can be said that there is no difference in execution time between the two. [Figure: fig.png] If the array is larger, the result looks like the following figure. numpy is faster. [Figure: fig__.png]
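The numba version of the code is not shown in this article. The following is a minimal sketch of how such a measurement might look, assuming the loop function is decorated with numba's njit. The name all_loop, the array size, and the fallback decorator (which only lets the sketch run when numba is not installed, without any JIT) are assumptions for illustration.

```python
import time
import numpy as np

try:
    from numba import njit  # assumption: numba is installed
except ImportError:
    # Fallback so the sketch still runs without numba (no JIT in that case)
    def njit(func):
        return func

@njit
def all_loop(arr):
    tf = True
    for i in range(arr.size):
        tf = tf and arr[i]
    return tf

arr = np.random.randint(2, size=1000).astype(bool)

start = time.perf_counter()
all_loop(arr)            # first call: includes compilation when numba is present
first_call = time.perf_counter() - start

start = time.perf_counter()
result = all_loop(arr)   # later call: the compiled function is reused
later_call = time.perf_counter() - start

print(f"first: {first_call * 1e6:.1f} us, later: {later_call * 1e6:.1f} us")
```

Calling the function once before the timed loop, as done here, is one way to keep the compilation cost out of the plotted values.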

Conclusion

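I have pasted a lot of graphs, but I would like to say the following two things. First, without numba, the for statement written directly in Python was faster than all when the array size was about 100 or less in my environment. Second, with numba, there was no real difference for small arrays, and numpy was faster for larger arrays.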

Finally

What was the pulse that appeared in that graph?
