[PYTHON] np.mean and np.average speed showdown

NumPy offers two common ways to compute an average: np.mean and np.average (there may be others). Here I leave the differences between the two to other articles and look only at which one is faster. The following code is based on this site.

Execution environment

CentOS Linux release 7.6.1810 (Core)
Python 3.8.5

Execution code 1

First, simply average a one-dimensional array.

test1.py


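import timeit
import numpy as np

# Note: each call also builds the array with np.arange,
# so that cost is included in the measured time.
def func_average(n):
    return np.average(np.arange(n**2))

def func_mean(n):
    return np.mean(np.arange(n**2))

n = 1000     # the array has n**2 = 1,000,000 elements
loop = 1000  # number of calls per measurement

result_average = timeit.timeit(lambda: func_average(n), number=loop)
result_mean = timeit.timeit(lambda: func_mean(n), number=loop)

# Print the average time per call in seconds.
print('average: ', result_average / loop)
print('mean:    ', result_mean / loop)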

Execution result

average:  0.0015093836802989244
mean:     0.001964871450792998

I ran the measurement four more times, and in each case np.average took about 20 to 30% less time than np.mean.
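The functions in test1.py create the array with np.arange inside the timed call, so the numbers above include that cost as well. The following is only a sketch of how the averaging step could be timed on its own. The names data, best_time, and repeat, and the smaller loop count, are assumptions made for this illustration and are not part of the original measurement.

```python
import timeit
import numpy as np

n = 1000
loop = 100   # calls per measurement (assumed value, smaller than in the article)
repeat = 5   # number of independent measurements (assumed value)

# Build the array once so that np.arange is not part of the timed region.
data = np.arange(n**2)

def best_time(func):
    """Return the best per-call time in seconds over `repeat` measurements."""
    times = timeit.repeat(lambda: func(data), number=loop, repeat=repeat)
    return min(times) / loop

best_average = best_time(np.average)
best_mean = best_time(np.mean)

print('average: ', best_average)
print('mean:    ', best_mean)
```

Taking the minimum over several repeats is a common way to reduce the influence of other processes on the machine. The results will depend on the environment and the NumPy version.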

Execution code 2

Next, average a two-dimensional array.

test.py


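import timeit
import numpy as np

# Same as before, but the array is reshaped to n x n before averaging.
# The cost of np.arange and reshape is included in the measured time.
def func_average(n):
    return np.average(np.arange(n**2).reshape(n, n))

def func_mean(n):
    return np.mean(np.arange(n**2).reshape(n, n))

n = 1000     # the array has shape (1000, 1000)
loop = 1000  # number of calls per measurement

result_average = timeit.timeit(lambda: func_average(n), number=loop)
result_mean = timeit.timeit(lambda: func_mean(n), number=loop)

# Print the average time per call in seconds.
print('average: ', result_average / loop)
print('mean:    ', result_mean / loop)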

Execution result

average:  0.0016613604403100909
mean:     0.002270303561817855

In this case as well, np.average was faster.
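To put the two execution results side by side, the relative difference can be computed directly from the printed numbers. This is a small sketch. The helper name time_saved and the dictionary layout are made up for the illustration, and the values are copied from the results above.

```python
# Per-call times in seconds, copied from the two execution results above.
results = {
    '1-D': {'average': 0.0015093836802989244, 'mean': 0.001964871450792998},
    '2-D': {'average': 0.0016613604403100909, 'mean': 0.002270303561817855},
}

def time_saved(average_time, mean_time):
    """Fraction by which the np.average time is shorter than the np.mean time."""
    return 1.0 - average_time / mean_time

savings = {name: time_saved(r['average'], r['mean']) for name, r in results.items()}

for name, saved in savings.items():
    print(f'{name}: np.average took {saved:.1%} less time than np.mean')
```

For these two runs the difference comes out at roughly 23% and 27%, which is in line with the 20 to 30% range mentioned earlier.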

Discussion

I tried several other similar conditions, and np.average was faster every time. This made me wonder whether np.mean has any purpose at all. I thought that could not be the case, and when I searched further I found the following thread. Translated, modified, and quoted below:

(Omitted) There are other differences to consider between average and mean (further differences were mentioned earlier in the thread). average takes the average of all values in the array without considering the mask, whereas mean takes the mask into account and averages only the unmasked values.

import numpy as np

g = [1, 2, 3, 55, 66, 77]
f = np.ma.masked_greater(g, 5)
np.average(f)
Out: 34.0
np.mean(f)
Out: 2.0

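So it seems that mean should be used when dealing with arrays that contain missing (masked) values. The output above is quoted from that thread, though, and the behavior of np.average on masked arrays may differ between NumPy versions, so it is worth confirming in your own environment.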
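To see where the two numbers in the quoted example come from, the following sketch computes both averages explicitly, using the same g and f as above. The variable names mean_unmasked, ma_average, and mean_all are chosen only for this illustration. np.ma.average and the data attribute are shown as explicit ways to include or ignore the mask. They are not taken from the quoted thread.

```python
import numpy as np

g = [1, 2, 3, 55, 66, 77]
f = np.ma.masked_greater(g, 5)  # masks 55, 66 and 77

# Average over the unmasked values only: (1 + 2 + 3) / 3
mean_unmasked = np.mean(f)

# np.ma.average is the mask-aware average in the numpy.ma module.
ma_average = np.ma.average(f)

# Average over the underlying data with the mask ignored: 204 / 6
mean_all = f.data.mean()

print(mean_unmasked)  # 2.0
print(ma_average)     # 2.0
print(mean_all)       # 34.0
```

Being explicit like this makes the intent clear regardless of how a particular function treats the mask.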

Conclusion

It is best to choose between the two according to the purpose. Still, considering the time taken by an entire program, I think the share spent in average or mean is insignificant, so simply using mean without thinking too hard about it seems like a perfectly reasonable option.
