[PYTHON] Checklist: how to avoid looping over the elements of a NumPy array with for

If you loop over the elements of a numpy.array with a for statement, execution slows down considerably.

I learned the following workarounds that make this a little faster, so here is a memo:

- list comprehension
- use np.where if the conditions are complex
- use np.frompyfunc (but be careful when using it)

list comprehension

This is already covered everywhere, but instead of writing something like

import numpy as np
a = np.array(range(10))
a2 = []
for x in a:
    a2.append(x*2)

you should write it as a list comprehension:

a2 = [x*2 for x in a]

In my experience this runs noticeably faster.
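To compare the approaches on your own machine, a timeit sketch like the one below may help. It also includes plain array arithmetic (a * 2), which is the style the closing section argues for. The array size and repeat count are arbitrary assumptions, and timings vary by machine and NumPy version, so no numbers are claimed here.

```python
import timeit

import numpy as np

a = np.arange(100_000)  # assumed test size

def with_for_loop(arr):
    # Explicit loop with append: one Python-level iteration per element
    out = []
    for x in arr:
        out.append(x * 2)
    return out

def with_list_comprehension(arr):
    # Same per-element work, but the list is built by the comprehension itself
    return [x * 2 for x in arr]

def with_array_arithmetic(arr):
    # No Python-level loop at all: NumPy doubles the whole array internally
    return arr * 2

for func in (with_for_loop, with_list_comprehension, with_array_arithmetic):
    seconds = timeit.timeit(lambda: func(a), number=10)
    print(f"{func.__name__}: {seconds:.4f} s for 10 runs")
```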

Use np.where if the conditions are complex

For example, suppose you want to modify array a, but the condition depends on the elements of array b.

As a concrete case, double each element of a when the corresponding element of b is even, and triple it otherwise.

np.where lets you avoid the C++-style indexing that would otherwise tempt you to write a for loop.

import numpy as np
a = np.array(range(10))
print("a = ", a)
b = a + 100
print("b = ", b)

# With an explicit index, it looks like this
result1 = []
for i in range(len(a)):
    answer = a[i]*2 if b[i]%2 == 0 else a[i]*3
    result1.append(answer)
print(np.array(result1))

# With np.where and ordinary functions, it fits in one line and is fast

def func_double(x):
    return x*2

def func_triple(x):
    return x*3

result2 = np.where(b%2 == 0, func_double(a), func_triple(a))
print(result2)

As someone pointed out in the comments, for functions this simple you can also inline them directly in a list comprehension.
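For reference, the inlined version might look like the sketch below. It pairs up the elements with zip instead of indexing; this is one possible way to write it, not necessarily the commenter's exact code.

```python
import numpy as np

a = np.array(range(10))
b = a + 100

# Conditional expression inlined in the comprehension; zip walks a and b together
result3 = np.array([x * 2 if y % 2 == 0 else x * 3 for x, y in zip(a, b)])
print(result3)

# The np.where version, for comparison
result2 = np.where(b % 2 == 0, a * 2, a * 3)
print(np.array_equal(result2, result3))
```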

(Personally, though, I don't like embedding functions directly in list comprehensions. It makes later changes harder and I tend to forget to update them. Also, if you cram long code into a list comprehension, younger students whose first programming language was Python will bluntly tell you, "This is really hard to read!!!" (laughs))

Use np.frompyfunc (but be careful when using it)

While looking for something a little faster, I found the following page, which I gratefully relied on:

"The fastest way to apply an arbitrary function to all elements of a Python list (Python speed-up experiment: the map function)"

So, let's use frompyfunc.

import numpy as np
# prepare input arrays
a = np.array(range(10))
print("array a is", a)
b = a + 100
print("array b is", b)

def addition(x, y):
    return x + y

# wrap addition as a universal function: 2 inputs, 1 output
np_addition = np.frompyfunc(addition, 2, 1)

print("print a + b using frompyfunc")
print(np_addition(a, b))

print("print a + 1 using frompyfunc")
print(np_addition(a, 1))

print("print 1 + b using frompyfunc")
print(np_addition(1, b))

def subtruction(x, y):
    return x - y

np_subtruction = np.frompyfunc(subtruction, 2, 1)

print("using np.where and frompyfuncs")
result2 = np.where(b%2 == 0, np_addition(a, b), np_subtruction(a, b))
print(result2)

The arguments to frompyfunc are, in order, the function object, the number of input arguments, and the number of return values. If you pass arrays to the resulting universal function, it returns an array containing the result of applying the function to each element.
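To make the second and third arguments concrete, here is a small sketch using the built-in divmod, which takes two inputs and returns two values, so it is wrapped with nin=2 and nout=2. With more than one output, the resulting universal function returns a tuple of arrays. divmod is chosen only for illustration.

```python
import numpy as np

a = np.array(range(10))

# divmod(x, y) takes 2 arguments and returns 2 values -> nin=2, nout=2
np_divmod = np.frompyfunc(divmod, 2, 2)

quotients, remainders = np_divmod(a, 3)  # the scalar 3 is broadcast against a
print(quotients)
print(remainders)
```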

Wow, if something this convenient existed, I should have used it much sooner. In my experience it feels about 30% faster than the list comprehension.

The following description was incorrect, so I have folded it (keeping it rather than deleting it, so that I don't make the same mistake again).

What I wanted to check was this: **can you pass an array to one argument of a universal function created with frompyfunc and just a float to the rest?**

The result: no problem at all! ~~In other words, it returns the computed result, varying element by element only the argument that was passed as an array.~~ ~~Moreover, whichever argument is the array, it is handled automatically.~~

~~Great!!! :grinning:~~

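(Correction) It was nothing remarkable after all (tears). A universal function created with frompyfunc calls addition or subtruction on one pair of elements at a time and follows NumPy's ordinary broadcasting rules, which is what lets an array and a scalar be mixed; no special per-argument decision is involved. Also, if you use plain lists instead of np.array for a and b, the example above fails, because b = a + 100 and b%2 are NumPy array operations that lists do not support.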

That's why I struck out what I wrote before. (End of correction)
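One way to see what a frompyfunc universal function actually does is to record what the wrapped function receives. In the sketch below, the list `seen` is a hypothetical helper added only for this check: each call receives one pair of scalars rather than whole arrays, and a scalar argument or a plain Python list is handled by the usual ufunc broadcasting and conversion.

```python
import numpy as np

seen = []  # hypothetical helper: records the argument types of every call

def addition(x, y):
    seen.append((type(x), type(y)))
    return x + y

np_addition = np.frompyfunc(addition, 2, 1)

print(np_addition(np.array([0, 1, 2]), 1))      # array and scalar
print(np_addition([0, 1, 2], [100, 101, 102]))  # plain lists are converted too

# The function was called once per element and never with an ndarray
print(len(seen), any(np.ndarray in pair for pair in seen))
```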

However, frompyfunc has a few pitfalls, which I describe below.

The dtype of the array returned by frompyfunc is object!

The numpy.array returned by a universal function created with frompyfunc has dtype object. (The elements themselves keep their own types.)
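A minimal sketch of this behavior, reusing the addition function from above: the result has dtype object even though every element is an integer, and astype produces a separate float64 array (checked here with np.shares_memory).

```python
import numpy as np

a = np.array(range(10))
b = a + 100

def addition(x, y):
    return x + y

np_addition = np.frompyfunc(addition, 2, 1)

result = np_addition(a, b)
print(result.dtype)     # object, even though every element is an integer
print(type(result[0]))  # each element keeps its own Python type

# astype converts to a numeric dtype by allocating and filling a new array
result_f = result.astype(np.float64)
print(result_f.dtype)
print(np.shares_memory(result, result_f))
```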

Simply reading the contents of the array works fine, but when I tried to use it with numpy.histogramdd, for example, it raised an error because the data cannot be cast from object to another type, as shown below.

...numpy/lib/function_base.py", line 1014, in histogramdd
    flatcount = bincount(xy, weights)
TypeError: Cannot cast array data from dtype('O') to dtype('float64') according to the rule 'safe'

To avoid this, you can cast the returned numpy.array with astype, but...

result2 = np.where(b%2 == 0, np_addition(a, b).astype(np.float64), np_subtruction(a, b).astype(np.float64))

The problem is that astype creates and returns a new array, so the array is copied at this point. Depending on the situation, the time spent on this copy can make the code no faster than a list comprehension.

Using np.where together with universal functions created by frompyfunc

The last three lines of the frompyfunc example confirm that np.where and universal functions created with frompyfunc can be used together. Apparently, though, this combination is quite slow. At one point I thought it was 10 times slower, but that turned out to be a bug in my program; it seems to be about twice as slow. Part of the cost is that np.where receives both branch arrays fully computed, so np_addition and np_subtruction each run their Python function on every element. That is why it feels hard to use where speed matters.
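A rough way to reproduce this kind of comparison is sketched below: it times np.where over the two frompyfunc universal functions against np.where over NumPy's own + and - on the same data. The array size and repeat count are arbitrary assumptions, and no particular ratio is claimed, since results depend on the machine and NumPy version.

```python
import timeit

import numpy as np

a = np.arange(100_000)  # assumed test size
b = a + 100

def addition(x, y):
    return x + y

def subtruction(x, y):
    return x - y

np_addition = np.frompyfunc(addition, 2, 1)
np_subtruction = np.frompyfunc(subtruction, 2, 1)

def where_with_frompyfunc():
    # Both branches are fully evaluated element by element in Python first
    return np.where(b % 2 == 0, np_addition(a, b), np_subtruction(a, b))

def where_with_array_ops():
    # Same selection, but both branches are computed by NumPy's + and -
    return np.where(b % 2 == 0, a + b, a - b)

for func in (where_with_frompyfunc, where_with_array_ops):
    seconds = timeit.timeit(func, number=10)
    print(f"{func.__name__}: {seconds:.4f} s for 10 runs")
```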

So, what is fastest in the end?

For simple cases like the examples here, I think a list comprehension is good enough.

However, sometimes you don't want readability to suffer because the comprehension gets cluttered with functions and conditions, or the work is too complicated to write in one line in the first place. In such cases, I realized that code written with the habits of the language I learned long ago takes unexpectedly long to run.

I tried various ways to speed up code I had already written, but in the end I feel the best approach is to write the underlying functions from the start so that their arguments accept both scalars and arrays and run fast.

If you can do that, combining np.where with those existing functions gives the cleanest and fastest code, and you don't need to create a universal function with frompyfunc at all.
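As a minimal sketch of that idea, assuming the doubling/tripling rule from the np.where section: the hypothetical function `scale` below uses only NumPy operations, so the same code accepts a single number or a whole array and needs neither a for loop nor frompyfunc.

```python
import numpy as np

def scale(a, b):
    """Hypothetical example: double a where b is even, triple it otherwise.

    Works for Python scalars and NumPy arrays alike, because %, * and
    np.where all broadcast.
    """
    a = np.asarray(a)
    b = np.asarray(b)
    return np.where(b % 2 == 0, a * 2, a * 3)

print(scale(5, 100))                               # scalars in, 0-d array out
print(scale(np.arange(10), np.arange(10) + 100))   # arrays in, array out
print(scale(np.arange(10), 101))                   # array and scalar mixed
```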

Because I first learned programming in Fortran and C++, I wrote functions that process data one item at a time, and my programs never fully escaped the idea of looping over everything with for. If you know from the start that you will use numpy, which has many functions that process arrays quickly, I think the right answer is to design your code to process multiple records as a matrix from the beginning.
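To illustrate the difference in design, here is a small sketch under an assumed setup: each record is a row of three values, and we want each record's distance from the origin. The per-record version is the style one might bring from Fortran or C++; the matrix version handles all records at once using an axis argument. The data and the computation are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
records = rng.random((1000, 3))  # hypothetical data: 1000 records x 3 values

def distance_one(record):
    # Per-record style: process one row at a time with a scalar loop
    total = 0.0
    for v in record:
        total += v * v
    return total ** 0.5

# Loop over records, calling the per-record function each time
distances_loop = np.array([distance_one(r) for r in records])

# Matrix style: square, sum along each row (axis=1), and take the root, all at once
distances_matrix = np.sqrt((records ** 2).sum(axis=1))

print(np.allclose(distances_loop, distances_matrix))
```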
