[PYTHON] Scatter plot

Two quantitative variables

Scatter plot of two columns

matplotlib

import matplotlib.pyplot as plt

# Draw a scatter plot from two columns
plt.scatter(df['column1'], df['column2'])
# Label the horizontal and vertical axes
plt.xlabel('column1')
plt.ylabel('column2')
plt.show()

#Correlation coefficient between column1 and column2
print(df[['column1', 'column2']].corr())
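The snippets in this post assume a DataFrame named df already exists. As a minimal sketch for trying them out, the following builds a synthetic df whose column names match the placeholders used here (the data itself is made up for illustration), and shows that Series.corr returns the single coefficient that DataFrame.corr reports in its off-diagonal cell.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: only the column names come from this post.
rng = np.random.default_rng(0)
n = 200
column1 = rng.normal(40, 10, n)
df = pd.DataFrame({
    "column1": column1,
    "column2": 2.0 * column1 + rng.normal(0, 15, n),  # built to correlate with column1
    "column3": rng.uniform(1, 100, n),
    "column4": rng.normal(0, 1, n),
    "y": rng.choice(["yes", "no"], n),           # qualitative variable
    "category1": rng.choice(["a", "b", "c"], n),  # qualitative variable
})

# DataFrame.corr() gives a 2x2 matrix; Series.corr() gives just the coefficient.
matrix = df[["column1", "column2"]].corr()
r = df["column1"].corr(df["column2"])
print(matrix)
print(f"Pearson r = {r:.3f}")
```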

seaborn

import seaborn as sns

sns.scatterplot(data=df, x="column1", y="column2")

#Correlation coefficient between column1 and column2
print(df[['column1', 'column2']].corr())

Scatter plot of two quantitative variables + a qualitative variable

sns.scatterplot(data=df, x="column1", y="column2", hue="y")
# hue="y" distinguishes categories by color / style="y" distinguishes them by marker shape
# If many points overlap, adjust the opacity with alpha (0 to 1)

#Correlation coefficient between column1 and column2
print(df[['column1', 'column2']].corr())
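As a small sketch of the options mentioned in the comments, the example below combines hue, style and alpha in one call. The data is synthetic and the Agg backend line is only there so the script also runs without a display; both are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed in a notebook
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical data with the same placeholder column names as this post.
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "column1": rng.normal(0, 1, 300),
    "column2": rng.normal(0, 1, 300),
    "y": rng.choice(["yes", "no"], 300),
})

# Color and marker shape both follow "y"; alpha makes overlapping points visible.
ax = sns.scatterplot(data=df, x="column1", y="column2",
                     hue="y", style="y", alpha=0.4)
legend_labels = [t.get_text() for t in ax.get_legend().get_texts()]
print(legend_labels)
```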

Scatter plot of two quantitative variables + a third quantitative variable (bubble chart)

import seaborn as sns

sns.scatterplot(data=df, x="column1", y="column2", hue="y", size="column3", sizes=(10,200))

# Pass a quantitative column to the size argument of scatterplot
# sizes sets the (min, max) range of marker sizes
# alpha (0 to 1) adjusts the marker opacity

# Correlation coefficient between column1 and column2
print(df[['column1', 'column2']].corr())

Moving the legend outside the plot

ax = sns.scatterplot(data=df, x="column1", y="column2", hue="y", size="column3", sizes=(10,200))
ax.legend(loc="upper left", bbox_to_anchor=(1,1))

print(df[['column1', 'column2']].corr())
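A legend placed outside the axes can be cut off when the figure is saved. The sketch below, on synthetic data, saves with bbox_inches="tight" so the saved image is expanded to include the legend. Writing to an in-memory buffer instead of a file is just a choice made for this example.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend; not needed in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical data with the same placeholder column names as this post.
rng = np.random.default_rng(2)
n = 150
df = pd.DataFrame({
    "column1": rng.normal(0, 1, n),
    "column2": rng.normal(0, 1, n),
    "column3": rng.uniform(1, 100, n),
    "y": rng.choice(["yes", "no"], n),
})

fig, ax = plt.subplots(figsize=(6, 4))
sns.scatterplot(data=df, x="column1", y="column2", hue="y",
                size="column3", sizes=(10, 200), ax=ax)
# Anchor the legend's upper-left corner at the upper-right corner of the axes.
ax.legend(loc="upper left", bbox_to_anchor=(1, 1))

# bbox_inches="tight" grows the saved area so the outside legend is kept.
buf = io.BytesIO()
fig.savefig(buf, format="png", bbox_inches="tight")
png_bytes = buf.getvalue()
```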

plotly

# pip install plotly

import plotly.express as px

fig = px.scatter(df, x="column1", y="column2", size="column3", color="y", size_max=30)
fig.show()

Joint plot

import seaborn as sns

sns.jointplot(data=df, x="column1", y="column2", marginal_kws={"bins": 10})
# Set the number of bins in the marginal histograms: marginal_kws={"bins": number}
# Set the color: color
# kind="hex" switches from points to hexagonal bins; the shade of each cell shows how many points fall in it.
# hue is accepted as an argument from seaborn 0.11 onward (older versions do not support it)


#Correlation coefficient between column1 and column2
print(df[['column1', 'column2']].corr())

Scatterplot matrix

Display the pairwise relationships among several columns at once

pandas (drawn with matplotlib)

import matplotlib.pyplot as plt
import pandas as pd

# Draw a scatterplot matrix
pd.plotting.scatter_matrix(df[['column1','column2','column3','column4']])
plt.tight_layout()
plt.show()

Scatterplot matrix → correlation coefficient matrix → heatmap

import matplotlib.pyplot as plt
import seaborn as sns

sns.pairplot(data=df[['column1','column2','column3','column4',"y"]], hue="y", diag_kind="hist")
plt.show()

# As with scatterplot, passing hue shows the distribution for each category of a qualitative variable.
# To set the color for each category: palette={'yes': 'red', 'no': 'blue'}
# To set the plot markers: markers='+' / markers=['+', 's', 'd']

# Diagonal plots: histogram with diag_kind="hist" / kernel density estimate with diag_kind="kde"

# Adjust marker opacity: plot_kws={'alpha': 0.5} (alpha ranges from 0 to 1; pairplot has no direct alpha argument)

# Draw a regression line on each off-diagonal scatter plot: kind='reg'

# Set the size of each subplot: height=2

# Choose the columns to plot: x_vars=['column1', 'column2'], y_vars=['column1', 'column2']

# The column passed to hue must be present in the DataFrame given to pairplot
# sns.pairplot(df[['column1','column2','column3','column4']], hue="y")  # error: "y" is not among the selected columns

# type(df[['column1','column2','column3','column4',"y"]])  # pandas.core.frame.DataFrame

# Save the pair plot as a PNG
# sns.pairplot(df[['column1','column2','column3','column4',"y"]], hue="y").savefig('file.png')

# Correlation coefficient matrix (numeric columns only; the object-type column "y" is left out,
# because recent pandas versions raise an error on non-numeric columns)
corr = df[['column1','column2','column3','column4']].corr(method="pearson")
print(corr)

# Draw a heatmap from the correlation coefficient matrix
sns.heatmap(corr, cmap='coolwarm', annot=True)
plt.show()
# annot=True / annot=False turns the printed correlation coefficients on / off
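An object-type column such as y cannot be correlated directly, because Pearson correlation needs numeric values. One possible workaround, sketched here under the assumption that y holds only the two labels 'yes' and 'no', is to map it to 1 and 0 first. The column name y_num and the synthetic data are made up for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical data: "y" is made more likely to be "yes" when column1 is large,
# so the encoded column shows a visible correlation with column1.
rng = np.random.default_rng(4)
n = 300
column1 = rng.normal(0, 1, n)
df = pd.DataFrame({
    "column1": column1,
    "column2": rng.normal(0, 1, n),
    "y": np.where(column1 + rng.normal(0, 1, n) > 0, "yes", "no"),
})

# Encode the two-level qualitative variable as 1 / 0 (y_num is an illustrative name).
encoded = df.assign(y_num=df["y"].map({"yes": 1, "no": 0}))

# The encoded column can now take part in the correlation matrix.
corr = encoded[["column1", "column2", "y_num"]].corr(method="pearson")
print(corr.round(2))
```

The resulting corr can be passed to sns.heatmap in the same way as any other correlation matrix. This encoding only makes sense for a two-level variable; a variable with more levels would need a different treatment.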

A quantitative variable and a qualitative variable

import seaborn as sns

sns.catplot(data=df, x="category1", y="column1", hue="y", alpha=0.5)
