[PYTHON] Sequential calculation of mean value with online algorithm

An online algorithm is an algorithm that can be calculated even when data comes in sequentially. This is called in contrast to the batch (offline) algorithm, which calculates after seeing all the data.

Online algorithms are often used when dealing with large amounts of data where it is difficult to keep all the data in memory. You may often see it in the context of machine learning. In the context of machine learning, there are many things that are theoretically difficult, but since averaging, variance, sampling, etc. are used relatively often when creating a simple system, processing can be written even if data comes sequentially. You may be happy if you become.

This time, I will write the method only for the sequential calculation of the mean value.

Sequential calculation of mean value

ʻAverage is the average value you want to find, total` is the number of data processed so far. The goal is to be able to describe in the following form.

ruby/python


average += x
total += y

Of course, in order to be valid as an online algorithm, the form ʻaverage * = z` may be targeted, but in the case of the (arithmetic) average value, the solution can be obtained if the above form is used.

First, y = 1 because total represents the number of times.

Next, since ʻaverage` is not trivial, it is calculated by expanding the formula as follows. Here, value is a value of data for which a new average is to be obtained.

math


average[new] = (average[old] * total[old] + value) / (total[old] + 1)
average[new] = average[old] + x

Than

average[old] + x = (average[old] * total[old] + value) / (total[old] + 1)

If you solve this

average[old] + average[old] * total[old] + x * (total[old]+1) = average[old] * total[old] + value
average[old] + x * (total[old]+1) = value

as a result

x = (value - average[old]) / (total[old] + 1)

Get

As a result of writing in the program,

ruby/python


average += (value - average) / (total + 1)
total += 1

Will be. here,

math


total[new] = total[old] + 1

Considering that, programmatically

ruby/python


total += 1
average += (value - average) / total

May be cooler.

Checked with a concrete example

There are 3 data, and the value is

20 -> 30 -> 40 

Suppose that it was.

At this time, the final average is

30

And the average at that time is

20 -> 25 -> 30

It suffices if it changes.

I will actually try it.

1st time

math


x = (20 - ?) / 1 = 20 - ?
average = ? + 20 - ? = 20

Second time

math


x = (30 - 20) / 2 = 5
average = 20 + 5 = 25

3rd time

math


x = (40 - 25) / 3 = 5
average = 25 + 5 = 30

And you can see that it is working.

By the way, the first ? Is the initial value of average, and any value may be entered. Actually, it is 0, or if the average value can be estimated in advance, it is better to enter the estimated value.

bonus

Tips for making variance an online algorithm

math


V(X) = E(X*X) - E(X)*E(X)

use. Here, V represents the variance and ʻE` represents the mean. In other words, it can be calculated by holding the average value of the squared values of the data.

Tips for making sampling an online algorithm

For example, when you want to sample only one value from a large amount of data, Let that data be sampled, and let the number of data seen so far be total.

At this time,

--The probability that the new data d will be sampled is 1 / (total + 1) --The probability of using the previous sampled as it is istotal / (total + 1)

Then, the sampled result can be obtained with equal probability.

Of course, if you apply this, you can sample only three.

Recommended Posts

Sequential calculation of mean value with online algorithm
Real-time calculation of mean values with coroutines
Calculation of mutual information (continuous value) with numpy
Error-free calculation with big.Float of golang
Play with numerical calculation of magnetohydrodynamics
Implementation of Dijkstra's algorithm with python
[Algorithm x Python] Calculation of basic statistics Part2 (mean, median, mode)
[Algorithm x Python] Calculation of basic statistics (total value, maximum value, minimum value)
Calculation of mean IoU in object detection
Algorithm learned with Python 8th: Evaluation of algorithm
Numerical calculation of differential equations with TensorFlow 2.0
1. Statistics learned with Python 1-3. Calculation of various statistics (statistics)
Algorithm learned with Python 13th: Tower of Hanoi
1. Statistics learned with Python 1-2. Calculation of various statistics (Numpy)
Visualize the behavior of the sorting algorithm with matplotlib
[Basics of modern mathematical statistics with python] Chapter 2: Probability distribution and expected value
Sequential calculation of mean value with online algorithm
Calculation of mutual information (continuous value) with numpy
Performs high-speed calculation of only specific descriptors with mordred
Maximum likelihood estimation of mean and variance with TensorFlow
Take the value of SwitchBot thermo-hygrometer with Raspberry Pi
Log the value of SwitchBot thermo-hygrometer with Raspberry Pi