The VIF calculated by Python and the VIF calculated by Excel are different?

You can check VIF with Python and it's super convenient!

You can calculate the VIF (Variance Inflation Factor) in Python and use the result to check for multicollinearity among the explanatory variables. As a common rule of thumb, VIF > 10 is taken to indicate strong multicollinearity.

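import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Load the training data
df_all = pd.read_excel('train.xlsx', sheet_name="Sheet1")

# Keep the numeric columns and skip the first one
# (presumably the objective variable in this data set)
cols = df_all.select_dtypes(include=[np.number]).columns
cols_x = cols[1:]
data_x = df_all[cols_x]

# Calculate the VIF of each explanatory variable
vif = pd.DataFrame()
vif["VIF Factor"] = [variance_inflation_factor(data_x.values, i) for i in range(data_x.shape[1])]
#vif["features"] = data_x.columns

# Output the VIF calculation result
print(vif)

# Plot the VIF values
plt.plot(vif["VIF Factor"])
plt.show()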

The result looks like this. It's convenient! (image.png)

However, when compared with the VIF calculated by Excel ...

I discovered that the VIF values came out different ('Д')!! (image.png)

In the first place, VIF is calculated by the following formula.

VIF = 1/(1-R2) #R2: coefficient of determination

Here, R2 is the coefficient of determination obtained by treating one of the explanatory variables as the objective variable and running a multiple regression on the remaining explanatory variables. Intuitively, if a variable can be expressed well by the remaining explanatory variables, you don't need that variable. A different VIF means that this R2 differs between Python and Excel, so I panicked for a moment.
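To make the formula concrete, here is a minimal sketch that computes VIF directly from the definition with NumPy, using auxiliary regressions that include an intercept. The helper name `vif_with_intercept` and the synthetic data are assumptions made for illustration; they are not part of the original analysis or of train.xlsx.

```python
import numpy as np

def vif_with_intercept(X):
    """VIF of each column of X, from auxiliary regressions that include an intercept."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    vifs = []
    for i in range(k):
        y = X[:, i]                                # variable treated as the objective
        others = np.delete(X, i, axis=1)           # remaining explanatory variables
        A = np.column_stack([np.ones(n), others])  # intercept column + remaining variables
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        vifs.append(1.0 / (1.0 - r2))
    return np.array(vifs)

# Synthetic data (hypothetical): x3 is almost a linear combination of x1 and x2,
# while x4 is unrelated to the others.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
x3 = x1 + x2 + rng.normal(scale=0.1, size=200)
x4 = rng.normal(size=200)
X = np.column_stack([x1, x2, x3, x4])

vifs = vif_with_intercept(X)
print(vifs)
```

In this sketch the three entangled variables should get large VIFs, while the unrelated x4 should stay close to 1.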

The cause of the difference was whether or not the intercept was included

It turned out that the cause was whether or not an intercept is included in the regression.

On the Python side, the calculation is done with intercept = 0 (no intercept term). In Excel, I had not specified anything about the intercept, so the regression was run with an intercept.

I was able to confirm that the VIFs match when I set the intercept to 0 in Excel.

(image.png) ↑ The difference is whether this box is checked.

I want to ask everyone: which is correct after all?

- Is there no problem with simply using Python's statsmodels as is?
- Should the intercept be specified?
- Should VIF be evaluated with whichever setting maximizes R2, regardless of whether the intercept is 0?
- VIF is only a guide, so is there no need to worry about it?

Those are my thoughts, but what does everyone think? I'm also wondering what algorithm statsmodels uses to calculate VIF in the first place...
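On the algorithm question: as far as I can tell, `variance_inflation_factor` regresses the chosen column on the other columns exactly as given (no constant is added automatically) and plugs that fit's R2 into 1/(1-R2). For a model without a constant, statsmodels reports the uncentered R2, where the total sum of squares is not mean-centered. The sketch below is a reproduction under that assumption, not the library's actual source; the helper name `vif_uncentered` and the data are made up for illustration.

```python
import numpy as np
from statsmodels.stats.outliers_influence import variance_inflation_factor

def vif_uncentered(X, i):
    """VIF of column i from a no-intercept regression, using the uncentered R^2."""
    X = np.asarray(X, dtype=float)
    y = X[:, i]
    others = np.delete(X, i, axis=1)
    coef, *_ = np.linalg.lstsq(others, y, rcond=None)  # least squares without intercept
    resid = y - others @ coef
    r2 = 1.0 - (resid @ resid) / (y @ y)               # total sum of squares is not centered
    return 1.0 / (1.0 - r2)

# Synthetic data with non-zero column means (hypothetical)
rng = np.random.default_rng(1)
X = rng.normal(loc=[5.0, 20.0, -2.0], scale=[1.0, 3.0, 0.5], size=(100, 3))

manual = np.array([vif_uncentered(X, i) for i in range(X.shape[1])])
library = np.array([variance_inflation_factor(X, i) for i in range(X.shape[1])])

print(manual)
print(library)
```

If the two printed arrays agree, that supports the reading that the Python result in this article corresponds to checking the zero-intercept option in Excel.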

If you have any opinions or advice, please feel free to let me know!!
