Hello! It's 2019/12/23, just before Christmas. Today I would like to touch on the scatter matrix.
Scatter matrix: a diagram for getting a rough view of how each pair of variables in the data is correlated.
API
pandas.plotting.scatter_matrix(frame, alpha=0.5, figsize=None, ax=None, grid=False, diagonal='hist', marker='.', density_kwds=None, hist_kwds=None, range_padding=0.05, **kwds)
See the official pandas documentation for details.
The sample below is quoted from the book "Machine learning starting with Python".
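To get a feel for the signature before the book sample, here is a minimal sketch on made-up data. The DataFrame `toy` and its column names are my own assumptions for illustration, not part of the original sample. It passes a few of the keyword arguments listed above and checks what comes back: an n x n array of matplotlib Axes, one per pair of columns. (`diagonal='kde'` is the other documented choice for the diagonal; it is not used here.)

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

# Hypothetical toy data: three numeric columns of random values.
rng = np.random.default_rng(0)
toy = pd.DataFrame(rng.normal(size=(100, 3)), columns=["a", "b", "c"])

axes = scatter_matrix(
    toy,
    alpha=0.5,               # transparency of the scatter points
    figsize=(6, 6),          # figure size in inches
    diagonal="hist",         # histograms on the diagonal
    marker=".",              # point marker for the scatter panels
    hist_kwds={"bins": 20},  # forwarded to the histogram call
)

# The return value is a NumPy array of Axes, one per (row, column) pair.
print(axes.shape)  # (3, 3)
```

In a notebook you would drop the `matplotlib.use("Agg")` line and call `plt.show()` instead.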
sample.py
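import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
import mglearn

# Load the iris data and split it into training and test sets.
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris['data'], iris['target'], random_state=0
)

# Label the columns with the iris feature names.
iris_dataframe = pd.DataFrame(X_train, columns=iris.feature_names)

# pd.scatter_matrix is gone from the top-level namespace in current pandas;
# call pd.plotting.scatter_matrix instead. No ax is passed, so the function
# creates its own figure and figsize takes effect.
grr = pd.plotting.scatter_matrix(
    iris_dataframe, c=y_train, figsize=(15, 15), marker='o',
    hist_kwds={'bins': 20}, s=60, alpha=0.8, cmap=mglearn.cm3
)
plt.show()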
The result is beautiful. Briefly, it is a matrix of plots: the panel in row i, column j plots feature j on the x-axis against feature i on the y-axis. On the diagonal, where a feature would be paired with itself, a histogram of that feature (20 bins) is drawn instead.
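The layout can also be checked in code. The sketch below is only an illustration on hypothetical random data (the frame `toy` and its columns are assumptions, not the iris sample). It reads the axis labels that scatter_matrix sets on each panel and looks at what was drawn: histogram bars end up in `ax.patches`, scatter points in `ax.collections`.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

# Hypothetical toy data with three named columns.
rng = np.random.default_rng(0)
toy = pd.DataFrame(rng.normal(size=(50, 3)), columns=["a", "b", "c"])
axes = scatter_matrix(toy, figsize=(6, 6), hist_kwds={"bins": 20})

n = len(toy.columns)
layout = {}
for i in range(n):
    for j in range(n):
        ax = axes[i, j]
        # Histogram bars are stored as patches; scatter panels have none.
        kind = "histogram" if len(ax.patches) > 0 else "scatter"
        layout[(i, j)] = (ax.get_xlabel(), ax.get_ylabel(), kind)
        print(f"row {i}, col {j}: x={ax.get_xlabel()}, y={ax.get_ylabel()}, {kind}")
```

Row 1, column 0, for example, reports x=a and y=b as a scatter panel, while every (i, i) panel reports a histogram.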
scatter_matrix is a function that plots every pairwise combination of the features. Note, however, that each panel shows only two features at a time, never all of them together, so interesting aspects of the data that depend on several features jointly may not show up.
Even so, it is probably the best first step when inspecting data.
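To make that caveat concrete, here is a small constructed example of my own (it is not from the book). Two random bits `a` and `b` and their XOR `c` are completely determined as a triple, yet every pair of columns looks unrelated, so the pairwise view gives no hint of the relationship.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two independent random bits and their XOR.
rng = np.random.default_rng(0)
a = rng.integers(0, 2, size=10_000)
b = rng.integers(0, 2, size=10_000)
c = a ^ b  # fully determined by a and b together

bits = pd.DataFrame({"a": a, "b": b, "c": c})

# Pairwise correlations: the off-diagonal entries come out close to zero,
# even though c is an exact function of (a, b).
corr = bits.corr()
print(corr.round(2))
```

A scatter matrix of `bits` would show the same thing as the correlation table: no pair of columns reveals the three-way structure.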