There are two ways to average with NumPy: np.mean and np.average (there may be others). I'll leave the functional differences between the two to other articles and compare only which one is faster.
The following code is based on this site.
Environment: CentOS Linux release 7.6.1810 (Core), Python 3.8.5
First, let's simply average a one-dimensional array.
test1.py
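```python
import timeit

import numpy as np


def func_average(n):
    # Average of the integers 0, 1, ..., n**2 - 1 using np.average
    return np.average(np.arange(n**2))


def func_mean(n):
    # The same average computed with np.mean
    return np.mean(np.arange(n**2))


n = 1000
loop = 1000
# timeit.timeit returns the total time in seconds for `loop` calls
result_average = timeit.timeit(lambda: func_average(n), number=loop)
result_mean = timeit.timeit(lambda: func_mean(n), number=loop)
# Divide by loop to get the time per call
print('average: ', result_average / loop)
print('mean: ', result_mean / loop)
```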
Execution result
```
average: 0.0015093836802989244
mean: 0.001964871450792998
```
I ran the measurement four more times, and in every run np.average was about 20 to 30% faster.
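Note that each timed call in test1.py also builds the array with np.arange(n**2), which costs the same for both functions and dilutes the difference. A minimal sketch that creates the array once and times only the averaging call might look like the following. The loop and repeat counts, and the use of timeit.repeat with the minimum, are assumptions for illustration, not the settings used for the numbers above.

```python
import timeit

import numpy as np

# Assumption: build the array once, outside the timed region, so only the
# averaging call itself is measured.
n = 1000
data = np.arange(n**2)

# For a plain ndarray both functions return the same value.
assert np.average(data) == np.mean(data)

loop = 100   # calls per repetition (arbitrary choice)
repeat = 5   # number of repetitions (arbitrary choice)

# timeit.repeat returns one total time per repetition; take the fastest
# repetition and divide by loop to get a per-call time.
t_average = min(timeit.repeat(lambda: np.average(data), number=loop, repeat=repeat)) / loop
t_mean = min(timeit.repeat(lambda: np.mean(data), number=loop, repeat=repeat)) / loop

print('average:', t_average)
print('mean:   ', t_mean)
```

Taking the minimum follows the advice in the timeit documentation: slower repetitions are usually caused by other processes interfering rather than by the code under test.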
Next, let's average a two-dimensional array.
test.py
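```python
import timeit

import numpy as np


def func_average(n):
    # Average over all elements of an n x n array using np.average
    return np.average(np.arange(n**2).reshape(n, n))


def func_mean(n):
    # The same average computed with np.mean
    return np.mean(np.arange(n**2).reshape(n, n))


n = 1000
loop = 1000
# timeit.timeit returns the total time in seconds for `loop` calls
result_average = timeit.timeit(lambda: func_average(n), number=loop)
result_mean = timeit.timeit(lambda: func_mean(n), number=loop)
# Divide by loop to get the time per call
print('average: ', result_average / loop)
print('mean: ', result_mean / loop)
```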
Execution result
```
average: 0.0016613604403100909
mean: 0.002270303561817855
```
np.average was faster in this case as well.
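test.py averages over every element of the array (the default axis=None). As a sketch, assuming the same 1000 x 1000 array built once outside the timer, the code below also checks that the two functions agree along each axis and times them per axis. The per-axis comparison goes beyond what was measured above, and loop = 100 is an arbitrary choice.

```python
import timeit

import numpy as np

# Assumption: same 1000 x 1000 array as test.py, created once outside the timer.
n = 1000
matrix = np.arange(n**2).reshape(n, n)

# With axis=None both functions average over every element.
assert np.average(matrix) == np.mean(matrix)
# Along an axis they also agree: one value per column (axis=0) or per row (axis=1).
assert np.array_equal(np.average(matrix, axis=0), np.mean(matrix, axis=0))
assert np.array_equal(np.average(matrix, axis=1), np.mean(matrix, axis=1))

loop = 100
for axis in (None, 0, 1):
    # Each lambda is run by timeit within the same iteration, so it sees the current axis.
    t_avg = timeit.timeit(lambda: np.average(matrix, axis=axis), number=loop) / loop
    t_mean = timeit.timeit(lambda: np.mean(matrix, axis=axis), number=loop) / loop
    print(f'axis={axis}: average {t_avg:.6f} s, mean {t_mean:.6f} s')
```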
I also tried other, similar conditions, and np.average came out faster every time. So does np.mean have no reason to exist...?
I doubted that, searched further, and found this thread. The relevant part is translated, lightly edited, and quoted below:
(Omitted) There are other differences to consider between average and mean (some were mentioned earlier in the thread). average takes the average of all the values in the array without considering the mask, while mean takes the mask into account and averages only the unmasked values.
```python
import numpy as np

g = [1, 2, 3, 55, 66, 77]
f = np.ma.masked_greater(g, 5)  # mask every value greater than 5
np.average(f)
# Out: 34.0
np.mean(f)
# Out: 2.0
```
Indeed, it seems that mean should be used when dealing with arrays that contain missing values.
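numpy.ma also provides its own average and mean, and both of those honor the mask. Whether plain np.average respects a mask may depend on the NumPy version, so calling the np.ma functions explicitly is a safe way to get the average of only the unmasked values. A minimal sketch reusing g and f from the quoted example:

```python
import numpy as np

g = [1, 2, 3, 55, 66, 77]
# Mask every value greater than 5, leaving 1, 2 and 3 as the valid entries.
f = np.ma.masked_greater(g, 5)

# The numpy.ma versions of both functions skip masked entries.
print(np.ma.average(f))  # 2.0
print(np.ma.mean(f))     # 2.0

# Ignoring the mask means averaging the raw data, masked values included.
print(f.data.mean())     # 34.0
```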
In other words, average does not account for numpy.ma masks, so use mean to handle data containing missing values.
It is best to choose between them according to the purpose, but the time spent in average or mean is probably an insignificant fraction of a whole program's running time. So isn't simply using mean without thinking about it a perfectly reasonable option?