seaborn 0.11.0 was released on September 9, 2020. In this post, I look back over seaborn's visualization functions using this version (0.11.0).

I try out every API, following the official seaborn gallery and function reference.

If you are only interested in the visualizations, skip ahead to the middle of the post.

The post is long, so use Ctrl+F to search for what you need.

Verification environment

conda create -n eda python==3.8
conda activate eda

notebook==6.1.4
ipykernel==5.3.4

seaborn==0.11.0
pandas==1.1.2
matplotlib==3.3.2
statsmodels==0.12.0
scipy==1.5.2
scikit-learn==0.23.2
numpy==1.19.2

Difference from previous version

・ The distribution plotting code was overhauled
・ Colormaps were adjusted
・ displot, histplot, and ecdfplot were added
・ kdeplot and rugplot were reworked: the parts computed with statsmodels were removed, the smoothing can be adjusted with bw_adjust, and log_scale can be selected as a preprocessing step
・ jointplot now supports kind='hist'
・ Distribution functions accept data along either x or y (vertical or horizontal)
・ Plus various minor changes

I had been using 0.10.1, and set_theme was added before I noticed; it makes configuring the look of plots much easier.

Items deprecated by the update

・ distplot is deprecated and will be removed in a future version, so move to displot or histplot as soon as possible.

What is seaborn

A brief review of seaborn

seaborn is a wrapper around matplotlib that lets you draw attractive plots easily. matplotlib runs behind it, but the code becomes much simpler and you can express complicated figures in one line. Some functions also perform preprocessing-like filtering and calculations (aggregation, density estimation, regression), so I find seaborn well suited to EDA.

seaborn's design philosophy

Some interesting ideas behind seaborn

seaborn organizes its plotting functions at two levels: axes-level and figure-level. The functions are grouped as shown in the figure, and the detailed axes-level functions can be controlled collectively through the figure-level functions relplot, displot, and catplot. You can of course also call each axes-level function on its own.

You might think a script is easier to read if you draw everything with the small axes-level functions.

The figure-level functions, which manage everything collectively, automatically create as many subplots as you need, handle the grouping, and split the drawing into separate panels (facets).

Also, with the axes-level functions (unless you specify otherwise):

・ changing the axis labels takes extra work
・ the legend is drawn inside the plot

With the figure-level functions, the axes and the legend are treated separately, so you can control the labels and the legend in one line with methods such as set_axis_labels.

But seeing is better than reading about it!!

The content is roughly divided into three parts:

・ Options first (color, axes, legends, etc.)
・ The main figure-level functions rel, dis, and cat
・ Advanced plots

[Opt-0] Options

Getting an overview of the options first makes the plots in the later sections easier to follow.

First, check that the latest version is installed.


import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

import seaborn as sns
print(sns.__version__)

0.11.0

[Opt-1] Loading data

When you want to use well-known datasets easily: the datasets supported by seaborn are stored on GitHub (https://github.com/mwaskom/seaborn-data), so you can pick the one you want and load it with load_dataset (this downloads the data, so it needs an internet connection). The loaded data can be handled as a pandas DataFrame.

penguins = sns.load_dataset("penguins")

[Opt-2] Setting the drawing theme

Check the current default settings, then choose the theme you want.

sns.set_theme

<function seaborn.rcmod.set_theme(context='notebook', style='darkgrid', palette='deep', font='sans-serif', font_scale=1, color_codes=True, rc=None)>

style controls the look of the plot background. The available styles are:

white, dark, whitegrid, darkgrid, ticks

palette sets the colors used for the plotted elements. Any of the following names can be passed:

'Accent', 'Accent_r', 'Blues', 'Blues_r', 'BrBG', 'BrBG_r', 'BuGn', 'BuGn_r', 'BuPu', 'BuPu_r', 'CMRmap', 
'CMRmap_r', 'Dark2', 'Dark2_r', 'GnBu', 'GnBu_r', 'Greens', 'Greens_r', 'Greys', 'Greys_r', 'OrRd', 'OrRd_r', 
'Oranges', 'Oranges_r', 'PRGn', 'PRGn_r', 'Paired', 'Paired_r', 'Pastel1', 'Pastel1_r', 'Pastel2', 'Pastel2_r',
 'PiYG', 'PiYG_r', 'PuBu', 'PuBuGn', 'PuBuGn_r', 'PuBu_r', 'PuOr', 'PuOr_r', 'PuRd', 'PuRd_r', 'Purples', 
'Purples_r', 'RdBu', 'RdBu_r', 'RdGy', 'RdGy_r', 'RdPu', 'RdPu_r', 'RdYlBu', 'RdYlBu_r', 'RdYlGn', 'RdYlGn_r', 
'Reds', 'Reds_r', 'Set1', 'Set1_r', 'Set2', 'Set2_r', 'Set3', 'Set3_r', 'Spectral', 'Spectral_r', 'Wistia', 
'Wistia_r', 'YlGn', 'YlGnBu', 'YlGnBu_r', 'YlGn_r', 'YlOrBr', 'YlOrBr_r', 'YlOrRd', 'YlOrRd_r', 'afmhot', 
'afmhot_r', 'autumn', 'autumn_r', 'binary', 'binary_r', 'bone', 'bone_r', 'brg', 'brg_r', 'bwr', 'bwr_r', 
'cividis', 'cividis_r', 'cool', 'cool_r', 'coolwarm', 'coolwarm_r', 'copper', 'copper_r', 'crest', 'crest_r', 
'cubehelix', 'cubehelix_r', 'flag', 'flag_r', 'flare', 'flare_r', 'gist_earth', 'gist_earth_r', 'gist_gray', 
'gist_gray_r', 'gist_heat', 'gist_heat_r', 'gist_ncar', 'gist_ncar_r', 'gist_rainbow', 'gist_rainbow_r', 
'gist_stern', 'gist_stern_r', 'gist_yarg', 'gist_yarg_r', 'gnuplot', 'gnuplot2', 'gnuplot2_r', 'gnuplot_r', 
'gray', 'gray_r', 'hot', 'hot_r', 'hsv', 'hsv_r', 'icefire', 'icefire_r', 'inferno', 'inferno_r', 'jet',
 'jet_r', 'magma', 'magma_r', 'mako', 'mako_r', 'nipy_spectral', 'nipy_spectral_r', 'ocean', 'ocean_r',
 'pink', 'pink_r', 'plasma', 'plasma_r', 'prism', 'prism_r', 'rainbow', 'rainbow_r', 'rocket', 'rocket_r',
 'seismic', 'seismic_r', 'spring', 'spring_r', 'summer', 'summer_r', 'tab10', 'tab10_r', 'tab20', 'tab20_r',
 'tab20b', 'tab20b_r', 'tab20c', 'tab20c_r', 'terrain', 'terrain_r', 'turbo', 'turbo_r', 'twilight',
 'twilight_r', 'twilight_shifted', 'twilight_shifted_r', 'viridis', 'viridis_r', 'vlag', 'vlag_r',
 'winter', 'winter_r'

sns.set_theme(style="dark",palette='Accent')
df = sns.load_dataset("penguins")
sns.displot(df.flipper_length_mm)
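set_theme works by updating matplotlib's rcParams, so its effect can be checked directly. A minimal sketch that looks only at the background grid (axes.grid), which is the clearest difference between the grid and non-grid styles:

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import seaborn as sns

# Grid styles switch the background grid on...
sns.set_theme(style="whitegrid")
grid_whitegrid = plt.rcParams["axes.grid"]

# ...and white/dark/ticks switch it off; palette sets the color cycle
sns.set_theme(style="white", palette="Accent")
grid_white = plt.rcParams["axes.grid"]

# axes_style returns the parameters of a style without applying it
darkgrid_params = sns.axes_style("darkgrid")
```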

[Opt-3] Changing the axis labels

g = sns.displot(df.flipper_length_mm)
g.set_axis_labels("Xaxis", "Yaxis")

You can overwrite the labels afterward through the returned object.

The tick labels can also be rotated

g = sns.displot(df.flipper_length_mm)
g.set_axis_labels("Xaxis", "Yaxis")
g.set_xticklabels(rotation=-45)

step keeps only every n-th tick label; for example, step=20 keeps every 20th one

g.set_xticklabels(step=20)

Other FacetGrid methods can be used in the same way (see the API reference).

[Opt-4] Drawing legends and color coding

Color coding is specified with hue.

With an axes-level function:

df = sns.load_dataset("iris")
sns.scatterplot(data=df,x='sepal_length',y='sepal_width',hue='species')

the legend goes inside the plot automatically. With a figure-level function, it is placed outside automatically:

df = sns.load_dataset("iris")
sns.relplot(data=df,x='sepal_length',y='sepal_width',hue='species',kind='scatter')

[Opt-5] Multiple panels and legends

Sometimes you want a separate scatter plot for each group. FacetGrid draws the data in multiple panels. The grouping variable is specified with col (col means column, not color), and the number of panels is determined automatically from the levels of that categorical variable.

df = sns.load_dataset("penguins")
sns.FacetGrid(df,col='species')

Then map a plotting function onto the panels you created.

df = sns.load_dataset("penguins")

g=sns.FacetGrid(df,col='species')
g.map_dataframe(sns.scatterplot,x='flipper_length_mm',y='bill_depth_mm',hue="sex")
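map_dataframe calls the mapped function once per panel and passes only that panel's rows as data=. The sketch below checks this on made-up data, using a hypothetical record function in place of a plotting function (the species labels and group sizes are invented for illustration):

```python
import matplotlib
matplotlib.use("Agg")
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical data: three groups of different sizes
df = pd.DataFrame({
    "species": ["A"] * 10 + ["B"] * 20 + ["C"] * 30,
    "x": np.arange(60, dtype=float),
    "y": np.arange(60, dtype=float) ** 0.5,
})

seen = []  # number of rows each call received

def record(data, **kwargs):
    """Stand-in for a plotting function: only remembers the subset size."""
    seen.append(len(data))

g = sns.FacetGrid(df, col="species")  # one panel per species level
g.map_dataframe(record)               # called once per panel with that panel's rows
```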

To show a legend, add it afterward with add_legend.

df = sns.load_dataset("penguins")

g=sns.FacetGrid(df,col='species')
g.map_dataframe(sns.scatterplot,x='flipper_length_mm',y='bill_depth_mm',hue="sex")
g.set_axis_labels('flipper_length_mm','bill_depth_mm')
g.add_legend()

To split by another variable as well, pass it as row to FacetGrid.

tips = sns.load_dataset("tips")

g = sns.FacetGrid(tips, col="time",  row="sex")
g.map(sns.scatterplot, "total_bill", "tip")

[Opt-6] A title for the whole figure

To give the whole figure a title rather than each panel, use suptitle. Because matplotlib runs behind seaborn, you can call matplotlib's Figure.suptitle through g.fig.

df = sns.load_dataset("penguins")

g=sns.FacetGrid(df,col='species')
g.map_dataframe(sns.scatterplot,x='flipper_length_mm',y='bill_depth_mm',hue="sex")
g.set_axis_labels('flipper_length_mm','bill_depth_mm')
g.add_legend()

g.fig.suptitle('suptitle',y=1.1,x=0,size=18)

[Opt-7] Log-transforming variables

As is, the range of values is too wide to read the plot.

planets = sns.load_dataset("planets")
sns.relplot(data=planets,x="distance", y="orbital_period",hue="year",palette='nipy_spectral')

Specifying a log scale for the axes makes it readable (this changes the axis scale; the data themselves are not modified).

planets = sns.load_dataset("planets")
g=sns.relplot(data=planets,x="distance", y="orbital_period",hue="year",palette='nipy_spectral')
g.set(xscale="log", yscale="log")

[Opt-8] Removing the axis lines (spines)

The white theme removes the background grid, but the x- and y-axis lines remain.

sns.set_theme(style="white")
planets = sns.load_dataset("planets")
g=sns.relplot(data=planets,x="distance", y="orbital_period",hue="year",palette='nipy_spectral')
g.set(xscale="log", yscale="log")

Remove them with despine.

sns.set_theme(style="white")
planets = sns.load_dataset("planets")
g=sns.relplot(data=planets,x="distance", y="orbital_period",hue="year",palette='nipy_spectral')
g.set(xscale="log", yscale="log")
g.despine(left=True, bottom=True)
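A quick check of what despine changes, on made-up positive data standing in for the planets columns (the column names are hypothetical). relplot already removes the top and right spines, and despine(left=True, bottom=True) hides the remaining two:

```python
import matplotlib
matplotlib.use("Agg")
import numpy as np
import pandas as pd
import seaborn as sns

sns.set_theme(style="white")

# Hypothetical positive-valued data so that log axes are valid
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "distance": rng.lognormal(3, 1, 50),
    "period": rng.lognormal(5, 2, 50),
})

g = sns.relplot(data=df, x="distance", y="period")
g.set(xscale="log", yscale="log")
g.despine(left=True, bottom=True)  # top/right are removed by default; this also removes left/bottom

ax = g.axes[0, 0]
spines_visible = {side: ax.spines[side].get_visible()
                  for side in ["top", "right", "left", "bottom"]}
```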

[Opt-9] Specify xlim and ylim

sns.set_theme(style="dark",palette='Accent')
df = sns.load_dataset("penguins")
g=sns.displot(df.flipper_length_mm)
g.set(xlim=(0, 300), ylim=(0, 100))

Or set them when creating the FacetGrid.

sns.FacetGrid(df,col='species',xlim=[0,10],ylim=[0,10])

[Main-0] Introducing the main plots

[Main-1] Scatter and line plots with relplot

[Main-1.1] Default relplot

sns.set_theme(style="whitegrid")
planets = sns.load_dataset("planets")
g=sns.relplot(data=planets,x="distance", y="orbital_period",palette='Dark2_r')
g.set(xscale="log", yscale="log")

By default, relplot draws a scatter plot.

sns.set_theme(style="whitegrid")
planets = sns.load_dataset("planets")
g=sns.relplot(data=planets,x="distance", y="orbital_period",palette='Dark2_r',kind='scatter')
g.set(xscale="log", yscale="log")

Specifying kind='scatter' gives the same result.

[Main-1.2] Specifying kind in relplot

With kind='line', the points are connected by a line (sorted along x; repeated x values are aggregated to their mean).

sns.set_theme(style="whitegrid")
planets = sns.load_dataset("planets")
g=sns.relplot(data=planets,x="distance", y="orbital_period",palette='Dark2_r',kind='line')
g.set(xscale="log", yscale="log")

[Main-1.3] Changing the marker size by value (bubble plot)

To change the marker size according to a value, as in a bubble plot, pass the column containing the values to size. Pass the minimum and maximum marker sizes to sizes as a list or tuple.

planets = sns.load_dataset("planets")
cmap = sns.cubehelix_palette(rot=-.2, as_cmap=True)
g = sns.relplot(data=planets,x="distance", y="orbital_period",hue="year", size="mass",palette='nipy_spectral', sizes=(10, 300))
g.set(xscale="log", yscale="log")

[Main-1.4] Changing the output size of the figure

Control the size of the output figure with height (the height of each panel, in inches).

planets = sns.load_dataset("planets")
cmap = sns.cubehelix_palette(rot=-.2, as_cmap=True)
g = sns.relplot(data=planets,x="distance", y="orbital_period",hue="year", size="mass",palette='nipy_spectral', sizes=(10, 300),height=10)
g.set(xscale="log", yscale="log")

[Main-1.5] Confidence intervals on lines

Some data are hard to read as a scatter plot

fmri = sns.load_dataset("fmri")
sns.relplot(data=fmri,x="timepoint", y="signal", hue="event",kind='scatter')

but easy to read as a line plot

fmri = sns.load_dataset("fmri")
sns.relplot(data=fmri,x="timepoint", y="signal", hue="event",kind='line')

The confidence interval is set with ci. You can pass "sd" or a number: a number gives the size of the (bootstrapped) confidence interval in percent, while "sd" draws the standard deviation of the observations instead. ci=None turns the band off.

fmri = sns.load_dataset("fmri")
sns.relplot(data=fmri,x="timepoint", y="signal", hue="event",kind='line',ci=20)

This draws a 20% confidence interval.
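Roughly speaking, ci=N is a bootstrap interval: resample the observations with replacement many times, take the mean of each resample, and keep the central N% of those means. The NumPy sketch below only illustrates that idea (bootstrap_ci is my own helper; n_boot=1000 mirrors seaborn's default, but seaborn's seeding and internals may differ) and shows why ci=20 gives a much narrower band than ci=95.

```python
import numpy as np

def bootstrap_ci(values, ci=95, n_boot=1000, seed=0):
    """Percentile bootstrap interval for the mean of `values` (illustrative helper)."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    # Resample with replacement and record the mean of each resample
    idx = rng.integers(0, len(values), size=(n_boot, len(values)))
    boot_means = values[idx].mean(axis=1)
    # Keep the central ci% of the bootstrap distribution
    lo, hi = np.percentile(boot_means, [(100 - ci) / 2, 100 - (100 - ci) / 2])
    return lo, hi

# Made-up observations standing in for the signal at one timepoint
signal = np.random.default_rng(3).normal(loc=0.1, scale=0.05, size=56)
ci95 = bootstrap_ci(signal, ci=95)
ci20 = bootstrap_ci(signal, ci=20)  # same resamples, narrower central range
```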

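[Main-1.6] Markers

You can also put a marker on each data point and draw the confidence interval as error bars instead of a band (err_style="bars"). The markers are assigned per level of style, so map the same variable to style as well:

sns.relplot(data=fmri, x="timepoint", y="signal", hue="event", style="event", err_style="bars", ci=95, markers=True, kind='line')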

[Main-1.7] More lines by another variable

Like size in the earlier scatter plot, another variable can be mapped with style to split the lines further.

fmri = sns.load_dataset("fmri")
sns.relplot(data=fmri,x="timepoint", y="signal", hue="region", style="event",kind='line')

size works as well; for lines it controls the line width.

fmri = sns.load_dataset("fmri")
sns.relplot(data=fmri,x="timepoint", y="signal", hue="region", size="event", kind='line')

[Main-1.8] Splitting the figure into panels by a variable

To split the drawing area itself, pass the variable to col.

fmri = sns.load_dataset("fmri")
sns.relplot(data=fmri,x="timepoint", y="signal", hue="region", col="event", kind='line')

When drawing multiple panels, facet_kws lets you choose whether the panels share the x- and y-axis scales.

fmri = sns.load_dataset("fmri")
sns.relplot(data=fmri,x="timepoint", y="signal", hue="region", col="event", kind='line',facet_kws=dict(sharey=False,sharex=False))

Be careful: once the panels no longer share a scale, the same height can mean different values, so always check the tick labels before comparing panels.

If you want to split the plot some other way, use FacetGrid with an axes-level function instead of a figure-level function.

fmri = sns.load_dataset("fmri")
g=sns.FacetGrid(fmri,col='event')
# map_dataframe passes each panel's subset of fmri as data=, so data is not given here
g.map_dataframe(sns.lineplot,x='timepoint',y='signal',hue="region")

[Main-2] hist, kde, ecdf, and rug with displot

[Main-2.1] displot's default: hist

The default is kind='hist'

penguin = sns.load_dataset("penguins")
sns.displot(data=penguin,x='bill_depth_mm')
#sns.displot(data=penguin,x='bill_depth_mm',kind='hist')

element changes how the histogram is drawn. The default is 'bars'.

sns.displot(data=penguin,x='bill_depth_mm',kind='hist',element='poly')
sns.displot(data=penguin,x='bill_depth_mm',kind='hist',element='step')

poly

step

bins changes the number of bins (how finely the data are divided).

sns.displot(data=penguin,x='bill_depth_mm',kind='hist',element='poly',bins=100)

[Main-2.2] Add kde (kernel density estimate) to histogram

sns.displot(data=penguin,x='bill_depth_mm',kind='hist',kde=True)

If you specify kind='kde' instead, only the KDE is drawn and the histogram disappears.

sns.displot(data=penguin,x='bill_depth_mm',kind='kde')

[Main-2.2] Plotting with the underlying functions histplot and kdeplot

sns.histplot(data=penguin,x='bill_depth_mm')

sns.kdeplot(data=penguin,x='bill_depth_mm')

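[Main-2.3] How the KDE behaves

bw_adjust scales the smoothing bandwidth, that is, how wide a window of data each point's kernel covers. Smaller values follow the data more closely; larger values smooth more.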

sns.displot(data=penguin,x='bill_depth_mm',kind='kde',bw_adjust=.2)

bw_adjust=0.2

bw_adjust=100
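A sketch of how bw_adjust acts, using SciPy's gaussian_kde on made-up bimodal data: the bandwidth factor from Scott's rule is multiplied by bw_adjust. I believe this matches how seaborn 0.11 combines bw_method and bw_adjust, but treat it as an approximation rather than seaborn's exact code. With 0.2 the curve follows every bump; with 100 it is almost flat.

```python
import numpy as np
from scipy.stats import gaussian_kde

def kde_curve(data, grid, bw_adjust=1.0):
    """Gaussian KDE whose bandwidth is Scott's rule scaled by bw_adjust."""
    kde = gaussian_kde(data)                   # bandwidth factor from Scott's rule
    kde.set_bandwidth(kde.factor * bw_adjust)  # multiply that factor by bw_adjust
    return kde(grid)

# Made-up bimodal sample standing in for bill_depth_mm
rng = np.random.default_rng(4)
data = np.concatenate([rng.normal(15, 0.8, 150), rng.normal(18.5, 0.8, 150)])
grid = np.linspace(5, 30, 2001)
step = grid[1] - grid[0]

curves = {adj: kde_curve(data, grid, adj) for adj in (0.2, 1.0, 100.0)}
# Riemann-sum area under the curve; should be close to 1 when the grid covers the data
areas = {adj: curves[adj].sum() * step for adj in (0.2, 1.0)}
```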

[Main-2.4] Coloring and splitting by other variables

As usual, hue sets the color coding and col splits the figure into panels.

sns.displot(data=penguin,x='bill_depth_mm',kind='kde',hue='sex',col='island')

[Main-2.5] Stacked plots

To stack the density curves, specify multiple="stack". Be careful not to misread the result: a group on top is simply drawn on top of the groups below it, so being higher does not mean that group has more individuals.

This works with both hist and kde.

sns.displot(data=penguin,x='bill_depth_mm',kind='kde',hue='sex',col='island',multiple="stack")

Specify linewidth=0 to remove the boundary lines between the stacked layers.

sns.displot(data=penguin,x='bill_depth_mm',kind='kde',hue='sex',col='island',multiple="stack",linewidth=0)

Conversely, specifying edgecolor="0.1" makes the boundary lines stand out.

Set fill=False to remove the fill color inside the stacked plot. At this point the graph becomes really easy to misread, because each curve is the running total of the groups below it rather than a density of its own.

sns.displot(data=penguin,x='bill_depth_mm',kind='kde',hue='sex',multiple="stack",fill=False)

[Main-2.6] alpha: transparent overlap instead of stacking

Adjusting alpha changes how transparent the fill color is.

sns.displot(data=penguin,x='bill_depth_mm',kind='kde',hue='sex',fill=True,alpha=0.5)

[Main-2.7] multiple: proportions (fill), overlap (layer), and side by side (dodge)

Specifying multiple="fill" instead of "stack" normalizes the layers so they add up to 1 at every point, which shows which group makes up the larger proportion.

sns.displot(data=penguin,x='bill_depth_mm',kind='kde',hue='sex',multiple="fill")

Compare the options with hist

sns.displot(data=penguin,x='bill_depth_mm',kind='hist',hue='sex',fill=True,multiple="fill")
sns.displot(data=penguin,x='bill_depth_mm',kind='hist',hue='sex',fill=True,multiple="layer")
sns.displot(data=penguin,x='bill_depth_mm',kind='hist',hue='sex',fill=True,multiple="dodge")
sns.displot(data=penguin,x='bill_depth_mm',kind='hist',hue='sex',fill=True,multiple="stack")

fill

layer

dodge

stack
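Conceptually, multiple only changes how the per-group results are combined before drawing. The NumPy sketch below shows this for histogram counts of two made-up groups on shared bins; it illustrates the idea, not seaborn's internal implementation.

```python
import numpy as np

# Made-up data for two groups, binned on shared bin edges
rng = np.random.default_rng(5)
female = rng.normal(17.0, 1.5, 160)
male = rng.normal(18.5, 1.5, 170)
edges = np.histogram_bin_edges(np.concatenate([female, male]), bins=12)
counts = np.vstack([np.histogram(grp, bins=edges)[0] for grp in (female, male)])

layer = counts                     # "layer": each group drawn from zero, overlapping
stack = np.cumsum(counts, axis=0)  # "stack": each group sits on top of the previous one
total = counts.sum(axis=0)
with np.errstate(invalid="ignore", divide="ignore"):
    # "fill": stacked shares, so the top layer reaches 1 in every non-empty bin
    fill = np.cumsum(counts / total, axis=0)
# "dodge" keeps the layer heights but places the bars side by side within each bin
```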

[Main-2.8] 2D plot

Both hist and kde become two-dimensional plots when you specify both x and y. Think of the KDE version as the contour lines on a map.

sns.displot(data=penguin, x="flipper_length_mm", y="bill_length_mm",kind='hist')
sns.displot(data=penguin, x="flipper_length_mm", y="bill_length_mm",kind='kde')

The rules are the same as before: hue for color coding, fill for filling, and col for splitting into panels.

[Main-2.9] Two-dimensional color gradation

To add a color gradient, specify a colormap with cmap.

thresh (a value from 0 to 1) is the lowest iso-proportion level that is drawn; with thresh=0 even the low-probability areas are filled. levels sets how many steps the gradient has.

sns.displot(data=penguin, x="flipper_length_mm", y="bill_length_mm", kind="kde",fill=True, thresh=0, levels=10, cmap='cubehelix')

thresh=0, levels=10

thresh=0.8, levels=10

thresh=0, levels=100

[Main-2.10] What is a rug plot?

A rug plot draws a short tick for every observation along the axis, like the fringe of a rug. Comparing how dense the ticks are with the KDE shows how the estimated density relates to the raw data.

sns.displot(data=penguin, x="flipper_length_mm", y="bill_length_mm", kind="kde", rug=True)

Can be combined with a scatter plot

sns.scatterplot(data=penguin, x="flipper_length_mm", y="bill_length_mm")
sns.rugplot(data=penguin, x="flipper_length_mm", y="bill_length_mm")

[Main-2.11] Plotting sideways

A histogram is usually pictured as bars rising from values on the x-axis, but if you pass the variable as y instead, it is drawn horizontally automatically.

sns.displot(data=penguin,y='bill_depth_mm',kind='kde',hue='sex',col='island')

[Main-2.12] Setting the titles of the individual panels

The title for the whole figure was covered in the options section. To change the title of each panel, store the grid in g and call set_titles, where {col_name} is replaced by the value of the col variable for each panel (see set_titles for details).

g=sns.displot(data=penguin,y='bill_depth_mm',kind='kde',hue='sex',col='island')
g.set_titles("{col_name} penguins")

[Main-2.13] ecdf: cumulative plots

kind="ecdf" plots the empirical cumulative distribution function, i.e. the proportion of observations at or below each value.

sns.displot(data=penguin, x="flipper_length_mm", kind="ecdf")
sns.displot(data=penguin, x="flipper_length_mm", kind="ecdf",complementary=True)

complementary=True

complementary=True plots 1 - ECDF (the proportion of observations above each value), which is useful for things like survival analysis.
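The ECDF itself is simple to compute, which helps when reading the plot. A NumPy sketch (ecdf is my own helper, not a seaborn function): sort the values, and the height at the i-th smallest value is i/n; complementary=True corresponds to 1 minus that.

```python
import numpy as np

def ecdf(values):
    """Sorted values and the cumulative proportion of observations at each one."""
    x = np.sort(np.asarray(values, dtype=float))
    y = np.arange(1, len(x) + 1) / len(x)
    return x, y

x, y = ecdf([3.0, 1.0, 2.0])
y_comp = 1 - y  # complementary: proportion of observations above each value
```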

[Main-2.14] col_wrap that divides the drawing screen into multiple parts and log_scale that performs logarithmic processing

displot can also apply a logarithmic scale internally with log_scale, and it can be applied to only one axis, e.g. log_scale=(True, False).

With col, all panels are normally placed in one row; col_wrap sets how many panels to put in each row before wrapping to the next.

diamonds = sns.load_dataset("diamonds")
sns.displot(data=diamonds, x="depth", y="price", log_scale=(True, False), col="clarity",col_wrap=5,kind='kde')
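What log_scale changes for a histogram is where the bins fall: the bins are equally wide in log space, so on the original scale each bin edge is a constant multiple of the previous one. A NumPy sketch of that idea on made-up skewed data (not seaborn's own binning code):

```python
import numpy as np

# Made-up right-skewed data, e.g. prices
rng = np.random.default_rng(7)
price = rng.lognormal(mean=8, sigma=1, size=1000)

# Equal-width bins on log10(price), mapped back to the original scale
log_edges = np.histogram_bin_edges(np.log10(price), bins=10)
edges = 10 ** log_edges
counts, _ = np.histogram(price, bins=edges)

# A constant ratio between neighboring edges means even spacing on a log axis
ratios = edges[1:] / edges[:-1]
```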

[Main-3] strip, swarm, box, violin, point, bar, boxen, and count with catplot

[Main-3.1] catplot defaults

The default kind is strip.

Since catplot is designed for categorical data, one of the axes takes a qualitative (categorical) variable.

sns.catplot(data=penguin,x='species',y='bill_depth_mm',height=6,kind='strip')

Let's look at the types of kind

sns.catplot(data=penguin,x='species',y='bill_depth_mm',height=6,kind='box')

sns.catplot(data=penguin,x='species',y='bill_depth_mm',height=6,kind='boxen')

sns.catplot(data=penguin,x='species',height=6,kind='count')

sns.catplot(data=penguin,x='species',y='bill_depth_mm',height=6,kind='bar')

sns.catplot(data=penguin,x='species',y='bill_depth_mm',height=6,kind='violin')

sns.catplot(data=penguin,x='species',y='bill_depth_mm',height=6,kind='swarm')

sns.catplot(data=penguin,x='species',y='bill_depth_mm',height=6,kind='point')

box

boxen

count

bar (compare its y-axis with count: bar shows the mean of y for each category, while count shows the number of observations)

violin

swarm

point: useful for comparing groups, e.g. alongside an analysis of variance

As the plots above show, bar and point also draw a confidence interval, which can be set with ci.
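To confirm what the bar and count heights mean, here is a small check on made-up data (values chosen so the means are easy to verify by hand: 110/6 for Adelie and 15.0 for Gentoo). barplot's bar heights should be the per-category means and countplot's the number of rows.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical data: unequal group sizes with different means
df = pd.DataFrame({
    "species": ["Adelie"] * 6 + ["Gentoo"] * 4,
    "bill_depth_mm": [18.0, 19.0, 18.5, 17.5, 18.0, 19.0, 15.0, 14.5, 15.5, 15.0],
})

fig, (ax_bar, ax_count) = plt.subplots(1, 2)
sns.barplot(data=df, x="species", y="bill_depth_mm", ax=ax_bar)  # height = mean per category
sns.countplot(data=df, x="species", ax=ax_count)                  # height = number of rows

bar_heights = [p.get_height() for p in ax_bar.patches]
count_heights = [p.get_height() for p in ax_count.patches]
```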

[Main-3.2] Expression of box and violin

dodge controls whether the hue levels are drawn at the same position or side by side.

sns.catplot(data=penguin, kind="box",x="species", y="body_mass_g", hue="sex", dodge=False,height=6)

sns.catplot(data=penguin, kind="box",x="species", y="body_mass_g", hue="sex", dodge=True,height=6)

The same applies to swarm and violin.

Setting dodge=False for bar gives a confusing figure:

the bars are not stacked on each other; one is simply hidden behind the other.

For violin, split=True draws the two hue levels as the two halves of each violin.

sns.catplot(data=penguin, kind="violin",x="species", y="body_mass_g", hue="sex", split=True)

You can swap y and x to turn it sideways

sns.catplot(data=penguin, kind="violin",y="species", x="body_mass_g", hue="sex", split=True)

[Main-3.3] swarm and box and violin

swarm plots tend to contain many points. When there is not enough room to place all of them, seaborn warns that some points could not be drawn.

A common trick is to overlay a swarm plot by drawing it after the box or violin plot.

sns.catplot(data=penguin, kind="box",x="species", y="body_mass_g",height=6)
sns.swarmplot(data=penguin,x="species", y="body_mass_g",hue='sex',palette="Set1")

[Adv] Advanced usage

【adv-1】pair plot

pairplot is a convenient way to get a bird's-eye view of an entire DataFrame.

Calling pairplot produces the familiar figure you have probably seen many times.

penguin = sns.load_dataset("penguins")
sns.pairplot(penguin)

sns.pairplot(penguin, hue="species")

With PairGrid, you can choose the plot type separately for the upper triangle, the lower triangle, and the diagonal.

penguin = sns.load_dataset("penguins")

g = sns.PairGrid(penguin, diag_sharey=False)
g.map_upper(sns.scatterplot, s=15)
g.map_lower(sns.kdeplot)
g.map_diag(sns.kdeplot, lw=2)

【adv-2】heat map

heatmap draws matrix-shaped data as a grid of colored cells.

flights = sns.load_dataset("flights")
flights = flights.pivot(index="month", columns="year", values="passengers")
sns.heatmap(flights)

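To write the values into the cells, set annot=True, and use fmt to control the number format so the numbers do not get unwieldy (fmt="d" prints integers, instead of the default ".2g", which switches to scientific notation for three-digit numbers).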

sns.heatmap(flights, annot=True, fmt="d")
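Two details from the heatmap examples, sketched on a tiny made-up table: pivot reshapes long data (one row per observation) into the month-by-year matrix that heatmap expects, and the annotation format matters because three-digit counts in the default .2g format become scientific notation. The format calls assume heatmap formats each cell like Python's format(value, fmt), which is how I understand annot to work.

```python
import pandas as pd

# Hypothetical long-format data: one row per (year, month)
long_df = pd.DataFrame({
    "year": [1949, 1949, 1950, 1950],
    "month": ["Jan", "Feb", "Jan", "Feb"],
    "passengers": [112, 118, 115, 126],
})

# Rows = month, columns = year, cells = passengers
wide = long_df.pivot(index="month", columns="year", values="passengers")

default_label = format(112, ".2g")  # default annotation format
int_label = format(112, "d")        # fmt="d"
```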

To draw lines between the cells, specify linewidths.

sns.heatmap(flights, linewidths=.5)

numpy's triu_indices_from returns the indices of the upper triangle of a matrix. Using it, you can build a True/False (1/0) mask; cells where the mask is True are not drawn, so only the lower triangle is shown.

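import numpy as np
import matplotlib.pyplot as plt

corr = np.corrcoef(np.random.randn(10, 200))
# True on and above the diagonal: those cells are hidden
mask = np.zeros_like(corr, dtype=bool)
mask[np.triu_indices_from(mask)] = True

with sns.axes_style("white"):
    f, ax = plt.subplots(figsize=(7, 5))
    ax = sns.heatmap(corr, mask=mask, vmax=.3, square=True)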
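A check of what the mask does, with a seeded random generator so it is reproducible: for a 10x10 matrix, triu_indices_from covers the 55 cells on and above the diagonal, leaving 45 visible cells. Reading the visible cells back from the QuadMesh (ax.collections[0]) is my assumption about how heatmap stores the cell colors.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# 10x10 correlation matrix of made-up random data
corr = np.corrcoef(np.random.default_rng(8).normal(size=(10, 200)))
mask = np.zeros_like(corr, dtype=bool)
mask[np.triu_indices_from(mask)] = True  # upper triangle including the diagonal

fig, ax = plt.subplots()
sns.heatmap(corr, mask=mask, vmax=.3, square=True, ax=ax)
mesh = ax.collections[0]                 # QuadMesh holding the cell values
visible_cells = np.ma.count(mesh.get_array())  # cells not masked out
```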

【adv-3】joint plot

jointplot shows the relationship between two variables in the middle, together with the distribution of each variable on the margins.

sns.jointplot(x='bill_length_mm', y='bill_depth_mm', data=penguin)

This also has several kinds, selected with kind; the default shown above is scatter.

kde

hist

hex

reg(regression)

resid: plots the residuals of a regression model (see below)

With JointGrid, you can choose the combination of plots yourself.

Specify any function for the main (joint) view and the side (marginal) views.

g=sns.JointGrid(x='bill_length_mm', y='bill_depth_mm', data=penguin)
g.plot(sns.regplot, sns.kdeplot)

To add layers, such as rugs on the margins or a KDE over the scatter plot, use plot_joint and plot_marginals.

g = sns.jointplot(data=penguin, x="bill_length_mm", y="bill_depth_mm")
g.plot_joint(sns.kdeplot, color="r", zorder=0, levels=6)
g.plot_marginals(sns.rugplot, color="r", height=-.15, clip_on=False)

Or build any combination by drawing directly on the main, top, and right axes (ax_joint, ax_marg_x, ax_marg_y).

g = sns.JointGrid()
x, y = penguin["bill_length_mm"], penguin["bill_depth_mm"]
sns.scatterplot(x=x, y=y, ec="b", fc="none", s=100, linewidth=1.5, ax=g.ax_joint)
sns.histplot(x=x, fill=False, linewidth=2, ax=g.ax_marg_x)
sns.kdeplot(y=y, linewidth=2, ax=g.ax_marg_y)

【adv-4】linear model plot

lmplot is essentially regplot combined with FacetGrid, so I will mainly explain it with lmplot. regplot is what works behind pairplot and jointplot when kind='reg'.

Visualize linear regression

sns.lmplot(x='bill_length_mm', y='body_mass_g', data=penguin)

Color coded by hue

Determine the degree of polynomial regression with order

sns.lmplot(x='bill_length_mm', y='body_mass_g', data=penguin, order=5)

There is also an option for logistic regression (logistic=True; this requires statsmodels).

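# Drop rows with unknown sex so they are not counted as female
penguin = penguin.dropna(subset=['sex']).copy()
penguin['male'] = (penguin['sex'] == 'Male').astype(int)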
sns.lmplot(x='bill_length_mm', y='male',data=penguin, logistic=True)

After fitting a model, I want to look at the residuals. Checking the residuals shows how well the drawn lm model (linear or polynomial) explains the data. As a rough check for a linear model, if the residuals look like they follow a normal distribution with no visible pattern, the model can be judged reasonably good.

Probably because of the relationship between regplot and lmplot, jointplot's kind="resid" draws with residplot behind the scenes, and extra arguments such as order are passed on to it.

sns.jointplot(x='bill_length_mm', y='body_mass_g', data=penguin,order=1,kind='resid')
sns.jointplot(x='bill_length_mm', y='body_mass_g', data=penguin,order=10,kind='resid')

First-order residuals

10th-order residuals
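A NumPy sketch of what the residual plots show, on made-up data: fit a least-squares polynomial of a given order and subtract the fitted values (poly_residuals is my own helper). This assumes residplot fits polynomials by ordinary least squares like np.polyfit. On the same data a higher order can never increase the residual sum of squares, which is why high-order residuals can look "better" even when the extra terms only fit noise.

```python
import numpy as np

def poly_residuals(x, y, order):
    """Residuals (observed - fitted) of a least-squares polynomial of the given order."""
    coefs = np.polyfit(x, y, order)
    return y - np.polyval(coefs, x)

# Made-up data standing in for bill_length_mm vs body_mass_g
rng = np.random.default_rng(9)
x = rng.uniform(35, 55, 200)
y = 50 * x + rng.normal(0, 300, 200)

res1 = poly_residuals(x, y, 1)
res3 = poly_residuals(x, y, 3)
sse1, sse3 = (res1 ** 2).sum(), (res3 ** 2).sum()
```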

Compare with a scikit-learn model

import numpy as np
import matplotlib.pyplot as plt
from sklearn import linear_model

penguin = penguin.dropna(how='any')
model = linear_model.LinearRegression()

# scikit-learn expects a 2-D feature matrix
X = penguin[["bill_length_mm"]].to_numpy()
Y = penguin["body_mass_g"].to_numpy()
model.fit(X, Y)
pred_y = model.predict(X)

# residual = observed - predicted
plt.scatter(x=X.ravel(), y=Y - pred_y)

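The result looks the same as the first-order residual plot, so seaborn appears to compute the residuals the same way (observed minus fitted values).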

【adv-5】clustermap

iris = sns.load_dataset("iris")
species = iris.pop("species")
g = sns.clustermap(iris)

clustermap visualizes hierarchical clustering of the rows and columns. method specifies the linkage criterion that decides which clusters are merged.

This area is hard to explain briefly, so learning hierarchical clustering itself is the faster route. method is passed to scipy's [scipy.cluster.hierarchy.linkage](https://docs.scipy.org/doc/scipy/reference/generated/scipy.cluster.hierarchy.linkage.html#scipy.cluster.hierarchy.linkage), which supports the following methods:

method

single
complete
average
weighted
centroid
median
ward
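To see the methods in action without the plot, the sketch below runs scipy.cluster.hierarchy.linkage (the function the method argument is passed to) on made-up data with two well-separated groups, then cuts each tree into two clusters with fcluster. On data this clean every method should recover the same two groups; the methods differ on messier data.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Made-up data: two well-separated groups of 5 points in 4 dimensions
rng = np.random.default_rng(10)
X = np.vstack([rng.normal(0, 0.1, (5, 4)), rng.normal(5, 0.1, (5, 4))])

labels = {}
for method in ["single", "complete", "average", "weighted", "centroid", "median", "ward"]:
    Z = linkage(X, method=method, metric="euclidean")        # hierarchical merge tree
    labels[method] = fcluster(Z, t=2, criterion="maxclust")  # cut into 2 clusters
```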

single

Finally

I plan to introduce more visualization methods. Even if the code is messy, I want to take care never to use a misleading figure.

References

Release notes: v0.11.0 (September 2020)

API reference

Announcing the release of seaborn 0.11

Overview of seaborn plotting functions

[PYTHON] Try all the latest seaborn APIs to improve your data visualization skills [Visualization Uncle, 2020/9/9, ver 0.11.0]