NumPy offers two functions for computing an average: np.mean and np.average (there may be others). I will leave the differences between the two to other articles and look only at which one is faster.
The following code is based on this site.
CentOS Linux release 7.6.1810 (Core)
Python 3.8.5
First, simply average a one-dimensional array.
test1.py
import timeit
import numpy as np
def func_average(n):
    return np.average(np.arange(n**2))
def func_mean(n):
    return np.mean(np.arange(n**2))
n = 1000
loop = 1000
result_average = timeit.timeit(lambda: func_average(n), number=loop)
result_mean = timeit.timeit(lambda: func_mean(n), number=loop)
print('average: ', result_average / loop)
print('mean: ', result_mean / loop)
Execution result
average: 0.0015093836802989244
mean: 0.001964871450792998
I repeated the measurement four more times, and in each run np.average was about 20 to 30% faster.
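One caveat about the script above: np.arange(n**2) runs inside each timed function, so array creation is included in both measurements. As a minimal sketch of an alternative, the array could be built once and only the averaging call timed. The helper name best_time and the repeat counts are hypothetical choices made for this illustration, and taking the minimum over several repeats is just one common way to reduce noise.

```python
import timeit
import numpy as np

def best_time(func, data, number=50, repeat=5):
    """Return the best per-call time in seconds over `repeat` runs."""
    # timeit.repeat returns one total time per run; each run calls func `number` times.
    timings = timeit.repeat(lambda: func(data), number=number, repeat=repeat)
    return min(timings) / number

n = 1000
data = np.arange(n**2)  # built once, outside the timed region

time_average = best_time(np.average, data)
time_mean = best_time(np.mean, data)
print('average: ', time_average)
print('mean: ', time_mean)
```

The numbers will vary by machine and NumPy version, so several runs are still worth comparing before drawing a conclusion.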
Next, average a two-dimensional array.
test.py
import timeit
import numpy as np
def func_average(n):
    return np.average(np.arange(n**2).reshape(n, n))
def func_mean(n):
    return np.mean(np.arange(n**2).reshape(n, n))
n = 1000
loop = 1000
result_average = timeit.timeit(lambda: func_average(n), number=loop)
result_mean = timeit.timeit(lambda: func_mean(n), number=loop)
print('average: ', result_average / loop)
print('mean: ', result_mean / loop)
Execution result
average: 0.0016613604403100909
mean: 0.002270303561817855
np.average was faster in this case as well.
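A speed comparison only makes sense if both functions compute the same thing for these inputs. The following is a small sanity check, assuming the same n as in the scripts above; the variable names flat and grid are introduced here purely for illustration.

```python
import numpy as np

n = 1000
flat = np.arange(n**2)       # one-dimensional input
grid = flat.reshape(n, n)    # two-dimensional view of the same values

# With no weights and no axis, both functions reduce over every element.
for arr in (flat, grid):
    assert np.isclose(np.average(arr), np.mean(arr))

# Per-column averages of the 2-D array should agree as well.
assert np.allclose(np.average(grid, axis=0), np.mean(grid, axis=0))

print(np.mean(grid))
```

If any of the assertions failed, the timings would be comparing different computations rather than two routes to the same result.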
I tried other similar conditions, and np.average came out faster each time. Does this mean np.mean is ...
I did not think that could be the case, so I searched further and found the following thread. Translated, modified and quoted below:
(Omitted) There are other differences to consider between average and mean (other differences were mentioned in the previous thread). average takes the average of all the values in the array without considering the mask, while mean takes the mask into account, so it averages only the unmasked values.
g = [1,2,3,55,66,77]
f = np.ma.masked_greater(g,5)
np.average(f)
Out: 34.0
np.mean(f)
Out: 2.0
Indeed, it seems that mean should be used when dealing with arrays containing missing values.
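The outputs in the quoted example may depend on the NumPy version, so the sketch below does not rely on np.average at all. It reuses g and f from the quote and computes the two results explicitly: the mask-aware mean via the masked array's own mean method, and the all-values mean by dropping the mask with np.asarray. It also calls np.ma.average, the masked-array module's own averaging function. The variable names masked_mean, plain_mean and ma_average are introduced here for illustration.

```python
import numpy as np

g = [1, 2, 3, 55, 66, 77]
f = np.ma.masked_greater(g, 5)      # masks 55, 66 and 77

masked_mean = f.mean()              # averages only the unmasked values 1, 2, 3
plain_mean = np.asarray(f).mean()   # np.asarray drops the mask, so all six values count
ma_average = np.ma.average(f)       # mask-aware averaging from numpy.ma

print(masked_mean, plain_mean, ma_average)
```

Using the numpy.ma functions directly makes the intent explicit and avoids depending on how the top-level functions treat masked input.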
If speed is the priority, use average. To handle data containing missing values with numpy.ma, use mean.
It is best to choose between them according to the purpose. Still, the time spent in average or mean is probably an insignificant share of a whole program's runtime, so simply using mean without thinking too hard about it seems like a reasonable option.