I had the opportunity to work on improving the efficiency of some numpy calculations, and in the process I learned the importance of dtype.

This article is first of all a personal memorandum, but of course I hope it will be useful to someone else as well.

[Addition]
In this article, "numpy" refers to a numpy.ndarray created by applying `.values` to a DataFrame loaded with pandas `read_csv`.

Concretely, I had to:

- perform several matrix multiplications of size (10,000, 400) @ (400, 550,000)
- combine the results with a logical OR

Just by changing the dtype of the numpy arrays, this improved as follows. (How terrible it was at first...)

item | before | after |
---|---|---|
processing time | 70 minutes | 10 minutes |
memory used | over 100 GB | over 30 GB |

- "Memory used" is the "Memory" value shown in the Mac Activity Monitor. (At first it was at a level where the Jupyter kernel would die; now it survives with room to spare.)

Below, I describe the changes I made.

**Set the dtype of numpy.array to `float32` or `float64`**

numpy apparently uses BLAS for matrix multiplication, and BLAS does its best work on these floating-point types. (See the first article in the reference links at the end.)

In my case the data was int at first, but just converting it with `.astype(np.float32)` cut the processing time from 70 minutes to 10 minutes!
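As a minimal sketch of this effect (the shapes are shrunk and the variable names are my own, not the actual code), the difference is easy to reproduce:

```
import time

import numpy as np

rng = np.random.default_rng(0)

# int64 matrices: numpy cannot hand these to BLAS
a = rng.integers(0, 2, size=(2000, 400))
b = rng.integers(0, 2, size=(400, 5000))

t0 = time.perf_counter()
_ = a @ b
print(f"int64:   {time.perf_counter() - t0:.3f} s")

# the same data as float32: the multiplication goes through BLAS (sgemm)
a32, b32 = a.astype(np.float32), b.astype(np.float32)

t0 = time.perf_counter()
_ = a32 @ b32
print(f"float32: {time.perf_counter() - t0:.3f} s")
```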

[Addition] After sleeping on it and rereading, I felt the explanation of the situation was insufficient, so let me supplement it a little.

Originally, I one-hot encoded the data with `pd.get_dummies` and then extracted a numpy.ndarray with `.values` for the calculation.

With this method the data type comes out as uint8; changing it to float32 is what produced the speedup.

The code looks like this (`series` stands in for your pandas Series):

```
pd.get_dummies(series).astype(np.float32).values
```
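To check that the conversion actually took effect, here is a small made-up example. (Note: in pandas >= 2.0 `get_dummies` returns bool columns rather than uint8, but the float32 conversion works the same way.)

```
import numpy as np
import pandas as pd

s = pd.Series(["a", "b", "a", "c"])  # made-up example data

raw = pd.get_dummies(s).values                     # uint8 (bool in pandas >= 2.0)
f32 = pd.get_dummies(s).astype(np.float32).values  # float32, BLAS-friendly

print(raw.dtype, f32.dtype)
```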

**Set the matrix values to bool if possible**

The float conversion described above made the processing blazingly fast, but it ate up memory. I resisted by frequently `del`-ing objects that were no longer needed, but that did not lead to much improvement.

However, by re-storing the calculation result as bool, I was able to save a great deal of memory. (The memory reserved per element differs by dtype: a numpy bool takes 1 byte, float32 takes 4, and int64/float64 take 8.)
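The difference is easy to confirm with `nbytes` (made-up shape for illustration):

```
import numpy as np

arr_f32 = np.ones((10_000, 1_000), dtype=np.float32)
arr_bool = arr_f32 >= 1  # same 0/1 content, held as bool

print(arr_f32.nbytes / 1e6, "MB")   # 40.0 MB (4 bytes per element)
print(arr_bool.nbytes / 1e6, "MB")  # 10.0 MB (1 byte per element)
```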

[Supplement]
In Python, the relations `1 == True` and `0 == False` hold. So a matrix that can be represented with 1/0 (for example, a one-hot encoded matrix) can also be represented with bool values.
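A quick demonstration (my own toy example):

```
import numpy as np

print(True == 1, False == 0)  # True True: bool is a subclass of int

# a 0/1 matrix keeps its meaning when held as bool
onehot = np.array([[1, 0], [0, 1]], dtype=np.float32)
print(np.array_equal(onehot, onehot.astype(bool)))  # True
```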

As mentioned above, I was able to improve efficiency greatly just by combining these two points. In the end, the process flow looked like this:

- Perform the matrix multiplication in float32: `arrayA = float32_matrix @ float32_matrix`
- Convert the result to bool: `arrayA = (arrayA >= 1)`
- Combine the results with a logical OR: `result = arrayA | arrayB | arrayC`
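A minimal end-to-end sketch of this flow (the array names follow the list above; the shapes and data are made up):

```
import numpy as np

rng = np.random.default_rng(0)

# stand-ins for the real one-hot matrices, shrunk for illustration
left = rng.integers(0, 2, size=(1_000, 400)).astype(np.float32)
right = rng.integers(0, 2, size=(400, 5_000)).astype(np.float32)

# 1. matrix multiplication in float32 (fast BLAS path)
arrayA = left @ right

# 2. convert the result to bool right away to release the float memory
arrayA = arrayA >= 1

# 3. combine several such bool results with a logical OR
arrayB, arrayC = arrayA.copy(), arrayA.copy()  # placeholders for the other results
result = arrayA | arrayB | arrayC
print(result.dtype, result.shape)  # bool (1000, 5000)
```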

The float trick seems applicable in many situations, but the bool trick is more limited: it only works when the result can be represented as 0/1.

- I also saw an article saying that specifying numpy types in Cython makes the code super fast. I didn't actually try it this time, but I'd like to if I get the chance.
- Intel MKL reportedly performs even better, but the execution environment is not always an Intel CPU, so I ran with OpenBLAS.
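Incidentally, you can check which BLAS implementation your numpy build links against like this:

```
import numpy as np

# prints the build configuration, including the BLAS/LAPACK backend
# (e.g. OpenBLAS or MKL)
np.show_config()
```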

- https://stackoverflow.com/questions/19839539/how-to-get-faster-code-than-numpy-dot-for-matrix-multiplication
- https://www.benjaminjohnston.com.au/matmul
- https://stackoverflow.com/questions/18743397/python-numpy-np-int32-slower-than-np-float64
- https://insilico-notebook.com/python-blas-performance/
