Think of me as a 5-year-old and tell me about scikit-learn's permutation_importance.

I tried using Scikit-learn's Permutation Importance

Until now, permutation importance was provided by a library called ELI5 ([ELI5 official documentation](https://eli5.readthedocs.io/en/latest/blackbox/permutation_importance.html)). (ELI5 stands for "Explain Like I'm 5".) Recently, permutation importance has been implemented in scikit-learn itself, starting from version 0.22. Until now, after training a support vector machine I could not tell which features contributed to the prediction, but with permutation_importance I can now see which features are important.

To put it simply, permutation importance selects one feature and shuffles its values so that they become meaningless. The model's accuracy is calculated on this shuffled data and compared with the accuracy on the intact dataset; the drop in accuracy tells you how much the selected feature affects the model.
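The shuffle-and-compare idea above can be sketched by hand. This is a minimal, self-contained illustration with a toy dataset and a fixed linear "model" (all names and numbers here are made up for the example), not scikit-learn's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: y depends strongly on column 0 and only weakly on column 1
X = rng.normal(size=(200, 2))
y = 3.0 * X[:, 0] + 0.3 * X[:, 1]

def score(X_in):
    # A trivial "model" that knows the true coefficients; returns R^2,
    # the score regressors in scikit-learn report by default
    pred = 3.0 * X_in[:, 0] + 0.3 * X_in[:, 1]
    ss_res = np.sum((y - pred) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

baseline = score(X)  # score on the intact data

importances = []
for col in range(X.shape[1]):
    X_perm = X.copy()
    X_perm[:, col] = rng.permutation(X_perm[:, col])  # shuffle one column
    # Importance = how much the score drops when this column is meaningless
    importances.append(baseline - score(X_perm))
```

Shuffling column 0 destroys most of the model's accuracy, so its importance comes out much larger than column 1's; this is exactly the comparison permutation_importance repeats n_repeats times per feature.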

It was quite easy to calculate. Import permutation_importance from sklearn.inspection, then call it with the estimator optimised_regr (a support vector machine whose parameters I tuned with Optuna) and the dataset as arguments.

# sklearn's permutation_importance
from sklearn.inspection import permutation_importance
result = permutation_importance(optimised_regr, X_test_std, y_test, n_repeats=10, n_jobs=-1, random_state=0)

# Put the result in a pandas DataFrame and display it
import pandas as pd
df = pd.DataFrame([boston.feature_names, result.importances_mean, result.importances_std], index=['Feature', 'mean', 'std']).T
df_s = df.sort_values('mean', ascending=False)
print(df_s)

I loaded the result into pandas and made a table.

Feature mean std
5 RM 0.466147 0.066557
12 LSTAT 0.259455 0.0525053
8 RAD 0.141846 0.0203266
9 TAX 0.113393 0.0176602
7 DIS 0.0738827 0.0178893
10 PTRATIO 0.0643727 0.0205021
6 AGE 0.0587429 0.010226
4 NOX 0.0521941 0.0235265
2 INDUS 0.0425453 0.0185133
0 CRIM 0.0258689 0.00711088
11 B 0.017638 0.00689625
3 CHAS 0.0140639 0.00568843
1 ZN 0.00434593 0.00582095
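The snippet above depends on variables defined elsewhere in the author's notebook (optimised_regr, X_test_std, boston). As a self-contained sketch of the same workflow, synthetic data and an untuned SVR stand in below for the Boston dataset and the Optuna-tuned model; every name here is illustrative:

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic regression data: 5 features, only 2 of them informative
X, y = make_regression(n_samples=300, n_features=5, n_informative=2,
                       noise=5.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Standardize, then fit a support vector regressor (no tuning here)
scaler = StandardScaler().fit(X_train)
regr = SVR().fit(scaler.transform(X_train), y_train)

# Same call pattern as in the article: shuffle each feature 10 times
result = permutation_importance(regr, scaler.transform(X_test), y_test,
                                n_repeats=10, n_jobs=-1, random_state=0)

df = pd.DataFrame({"mean": result.importances_mean,
                   "std": result.importances_std})
print(df.sort_values("mean", ascending=False))
```

result.importances has one row per feature and one column per repeat; importances_mean and importances_std summarize them, which is what the table above shows for the Boston features.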

Until now, when using a support vector machine it was not possible to know which features influenced the result, but now that permutation_importance is implemented in scikit-learn, I can see which features matter.
