Posted on April 13, 2020
- People who have used Python and have a working execution environment
- People who have used PyTorch to some extent
- People who use PyTorch's torch.var()
Nowadays, machine learning research is done mainly in Python, because Python has many libraries (called modules) for fast data analysis and computation. Among them, this article uses a module called **PyTorch** and looks at its **torch.var()**. The conclusion first: what **torch.var()** computes is not the (population) variance but the **unbiased variance (sample variance)**. In fact, in many statistical libraries, **variance** seems to refer to the **unbiased variance** by default (I didn't know this, but statisticians apparently take it for granted). I will demonstrate this with an actual program.
That said, this article is essentially a personal memo, so please treat it as a rough reference only; for the sake of brevity it may contain imprecise expressions or wording, and I ask for your understanding on that point.
As prior knowledge, I assume you can use Python's NumPy and PyTorch to some extent; the article proceeds on that assumption. For background, see the article on PyTorch's Tensor type at the following link.
What is the Tensor type of pyTorch
First, before writing any code, here are the formulas for the mean $\mu$, the variance $\sigma^2$, and the unbiased variance $s^2$:
\mu = \frac{1}{n}\sum_{i=1}^{n} x_i\\
\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i-\mu)^2\\
s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i-\mu)^2
where $x_i$ are the input samples and $n$ is the number of samples. Note that the only difference between $\sigma^2$ and $s^2$ is whether the sum is divided by $n$ or by $n-1$.
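As a quick worked example with the sample $[1, 2, 3, 4, 5]$ used below: the mean is $\mu = 3$ and the squared deviations sum to $4+1+0+1+4 = 10$, so

\sigma^2 = \frac{10}{5} = 2.0,\qquad s^2 = \frac{10}{4} = 2.5

These two numbers are exactly what the code in the rest of the article prints.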
The sample data is defined as follows.
```python
import torch

a = torch.tensor([1., 2., 3., 4., 5.])
print(a)
------'''Output result below'''--------
tensor([1., 2., 3., 4., 5.])
```
First, let's compute the variance directly from its definition:
```python
mu = torch.mean(a)             # mean of all elements: 3.0
var = torch.mean((a - mu)**2)  # population variance: divide by n
print(var)
------'''Output result below'''--------
tensor(2.)
```
Here, **torch.mean()** computes the average over all elements of its input. So the variance comes out to 2.0, matching the hand calculation above.
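As a minimal sanity check of what **torch.mean()** is doing here (only standard tensor methods, nothing specific to variance):

```python
mu = torch.mean(a)
print(mu)                   # tensor(3.)
print(a.sum() / a.numel())  # the same mean computed by hand: tensor(3.)
```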
Now, let's use PyTorch's **torch.var()**.
```python
var = torch.var(a)  # not 2.0!
print(var)
------'''Output result below'''--------
tensor(2.5000)
```
The value has changed. This is because **torch.var()** does not compute the population variance: it computes the **unbiased variance (sample variance)** over all elements of its input, i.e. it divides by $n-1$ instead of $n$ ($10/4 = 2.5$ here).
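As a side note, **torch.var()** also takes an `unbiased` keyword argument (at least in the PyTorch versions around the time of this post), and passing `unbiased=False` switches the divisor from $n-1$ back to $n$:

```python
import torch

a = torch.tensor([1., 2., 3., 4., 5.])

print(torch.var(a))                  # default unbiased=True: 10/4 -> tensor(2.5000)
print(torch.var(a, unbiased=False))  # divide by n instead:   10/5 -> tensor(2.)
```

So if you really do want the population variance, you do not have to compute it by hand as in the earlier snippet.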
One caveat about using **torch.var()** in practice: it is not that you must avoid it whenever you want the population variance. As the formulas show, when the number of samples is large the two values are almost identical (with $n = 1000$, dividing by 1000 versus dividing by 999 barely changes the result). It is only with a **small number of samples**, as in my example here, that the difference is large enough to matter.
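To make this concrete, here is a small check comparing the two estimates at $n = 5$ and $n = 1000$ (only standard PyTorch calls; the exact values depend on the random draw):

```python
import torch

torch.manual_seed(0)  # fix the random draw for reproducibility

for n in (5, 1000):
    x = torch.randn(n)
    biased = torch.var(x, unbiased=False).item()  # divide by n
    unbiased = torch.var(x).item()                # divide by n-1 (default)
    print(f"n={n}: biased={biased:.4f}, unbiased={unbiased:.4f}, "
          f"ratio={unbiased / biased:.4f}")
```

The ratio is always $n/(n-1)$: at $n = 5$ the unbiased estimate is 25% larger, while at $n = 1000$ the two differ by only about 0.1%.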
In this article I summarized the behavior of torch.var(). Perhaps this is obvious to many people, but it surprised me, so I wrote it up. Also, since my knowledge of the exact meaning of variance versus unbiased variance is shallow, I would appreciate any gentle corrections of mistaken expressions. Some parts may have been hard to read, but thank you for reading.