Differences between numpy and pandas methods for finding variance

TL;DR

I was computing variances with both numpy and pandas, and the results didn't match. Why? I'm leaving a note on what I found.

With their default arguments, numpy.var and pandas.DataFrame.var return different results.

Let's test with a simple randomly generated matrix. Sure enough, the results don't match.

import numpy as np
import pandas as pd

X = np.random.randn(10, 10)
df = pd.DataFrame(data=X)

np.allclose(X, df.values)
# True

X_var = np.var(X, axis=1)
df_var = df.var(axis=1)

np.allclose(X_var, df_var.values)
# False

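Checking the documentation explains it: numpy.var defaults to ddof=0, while pandas.DataFrame.var defaults to ddof=1. ddof ("delta degrees of freedom") sets the divisor to N - ddof, where N is the number of elements, so by default numpy returns the population variance and pandas returns the unbiased sample variance.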
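To make the divisor concrete, here is a minimal sketch that computes the variance by hand (sum of squared deviations from the mean, divided by N - ddof) and checks it against both defaults. The helper `manual_var` is hypothetical, written only for this illustration, and the seeded generator is just for reproducibility.

```python
import numpy as np
import pandas as pd

def manual_var(a, ddof, axis):
    """Hypothetical helper: sum of squared deviations divided by (N - ddof)."""
    a = np.asarray(a, dtype=float)
    n = a.shape[axis]
    mean = a.mean(axis=axis, keepdims=True)
    return ((a - mean) ** 2).sum(axis=axis) / (n - ddof)

rng = np.random.default_rng(0)
X = rng.standard_normal((10, 10))
df = pd.DataFrame(data=X)

# numpy.var default (ddof=0): divide by N
print(np.allclose(np.var(X, axis=1), manual_var(X, ddof=0, axis=1)))  # True

# pandas.DataFrame.var default (ddof=1): divide by N - 1
print(np.allclose(df.var(axis=1).values, manual_var(X, ddof=1, axis=1)))  # True

# So the two defaults differ by the constant factor N / (N - 1)
n = X.shape[1]
print(np.allclose(df.var(axis=1).values, np.var(X, axis=1) * n / (n - 1)))  # True
```

For a 10-column matrix the pandas result is therefore exactly 10/9 of the numpy result, which is why the earlier np.allclose check fails.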

If you make the ddof values agree, the results match.

X_var_ddof1 = np.var(X, ddof=1, axis=1)
df_var_ddof1 = df.var(axis=1)

np.allclose(X_var_ddof1, df_var_ddof1.values)
# True
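Alignment also works in the other direction: passing ddof=0 to pandas reproduces numpy's default. A short sketch (the seed is arbitrary and only there for reproducibility):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
X = rng.standard_normal((10, 10))
df = pd.DataFrame(data=X)

# numpy default (ddof=0) vs. pandas with ddof explicitly set to 0
X_var = np.var(X, axis=1)
df_var_ddof0 = df.var(axis=1, ddof=0)

print(np.allclose(X_var, df_var_ddof0.values))  # True
```

Whichever direction you choose, writing ddof explicitly in both calls makes the intent clear and avoids depending on either library's default.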

I thought the calculations themselves disagreed, but in fact numpy and pandas simply use slightly different defaults. I wish they were unified, but I'm publishing this memo in case someone else gets stuck on it.
