[PYTHON] About scatter_matrix (scatter matrix)

Hello! It's 2019/12/23, just before Christmas. Today I would like to look at the scatter matrix.

scatter matrix: a grid of pairwise scatter plots, used to get a rough view of how the variables in a dataset correlate with each other.

API

pandas.plotting.scatter_matrix(frame, alpha=0.5, figsize=None, ax=None, grid=False, diagonal='hist', marker='.', density_kwds=None, hist_kwds=None, range_padding=0.05, **kwds)

Official documentation
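
To make the signature concrete, here is a minimal sketch of a call that uses only parameters listed above. The toy DataFrame and its column names (x, y, z) are made up for illustration, and the Agg backend is selected only so that the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

# Hypothetical toy data: y is correlated with x, z is independent noise
rng = np.random.default_rng(0)
x = rng.normal(size=200)
toy = pd.DataFrame({
    "x": x,
    "y": 2 * x + rng.normal(scale=0.5, size=200),
    "z": rng.normal(size=200),
})

# diagonal='hist' draws a histogram of each column on the diagonal;
# hist_kwds is forwarded to the histogram call
axes = scatter_matrix(toy, alpha=0.5, figsize=(6, 6), diagonal="hist",
                      hist_kwds={"bins": 20})

# The return value is a 2-D array holding one Axes per (row, column) pair
print(axes.shape)
```

The function returns the array of axes rather than a figure, so any further tweaking (titles, tick formatting) would go through the individual axes.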

Parameter description

Sample code

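Adapted from the book Introduction to Machine Learning with Python (the Japanese edition's title translates as "Machine learning starting with Python").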

sample.py


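import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
import mglearn  # helper library from the book; provides the cm3 colormap

# Load the iris data and hold out a test set
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris['data'], iris['target'], random_state=0
)

# Put the training features in a DataFrame with named columns
iris_dataframe = pd.DataFrame(X_train, columns=iris.feature_names)

# scatter_matrix lives in pandas.plotting; it creates its own figure and
# grid of axes, sized by figsize. Points are colored by class via c=y_train.
grr = pd.plotting.scatter_matrix(iris_dataframe, c=y_train, figsize=(15, 15),
                                 marker='o', hist_kwds={'bins': 20}, s=60,
                                 alpha=0.8, cmap=mglearn.cm3)
plt.show()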

[Figure: scatter_matrix.png, the scatter matrix of the iris training data]

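Beautiful. To explain briefly, the result is a matrix of plots in which each panel's position determines its variables: the column sets the feature on the x-axis and the row sets the feature on the y-axis. The panels on the diagonal show a histogram of each individual feature (here with 20 bins).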
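
The layout can be checked directly on the returned axes. The following is a small sketch with a made-up three-column DataFrame (columns a, b, c are hypothetical); it reads the axis labels from the bottom row and left column, and counts the histogram bars on one diagonal panel.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

# Hypothetical toy data with three numeric columns
rng = np.random.default_rng(1)
df = pd.DataFrame(rng.normal(size=(100, 3)), columns=["a", "b", "c"])

axes = scatter_matrix(df, figsize=(6, 6), hist_kwds={"bins": 20})
n = len(df.columns)

# The bottom row carries the x-axis labels: one per column, left to right
bottom_labels = [axes[n - 1, j].get_xlabel() for j in range(n)]
# The left column carries the y-axis labels: one per row, top to bottom
left_labels = [axes[i, 0].get_ylabel() for i in range(n)]
# A diagonal panel holds the histogram bars of a single feature
diagonal_bars = len(axes[0, 0].patches)

print(bottom_labels)
print(left_labels)
print(diagonal_bars)
```

So the panel at row i, column j plots column j on the x-axis against column i on the y-axis, and the panel at row j, column i is the same pair with the axes swapped.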

scatter_matrix plots every possible pair of features. Note, however, that each panel shows only two features, so the interaction of all features is never visible at once, and some interesting aspects of the data may not show up in this view.
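
As a rough illustration of what "every pair" means for the iris data, this sketch enumerates the feature pairs with itertools.combinations. The feature names are the ones returned by load_iris; they are hardcoded here so that the snippet has no dependencies.

```python
from itertools import combinations

# Feature names of the iris dataset, as given by load_iris().feature_names
feature_names = [
    "sepal length (cm)",
    "sepal width (cm)",
    "petal length (cm)",
    "petal width (cm)",
]

n = len(feature_names)
pairs = list(combinations(feature_names, 2))

print(len(pairs))   # 6 distinct pairs of features
print(n * n - n)    # 12 off-diagonal panels: each pair appears twice, mirrored
for x_name, y_name in pairs:
    print(f"{x_name} vs {y_name}")
```

Every panel is a two-dimensional projection, so a pattern that only appears when three or more features are considered together cannot be read off any single panel.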

Even so, as a first look at a dataset, it is probably one of the best ways to inspect the data.

References

scatter_matrix official page
