[PYTHON] Try all the latest seaborn APIs to improve your data visualization skills [Visualization Uncle, 2020/9 / 9-ver 0.11.0]

A new version of seaborn came out on September 9, 2020. In this post I review seaborn's visualization features using that release, ver 0.11.0.

I would like to use all APIs with reference to the official seaborn gallery and functions.

If you are only interested in the visualizations, skip to the middle

This post is long, so please search with Ctrl+F

Verification environment

conda create -n eda python==3.8
conda activate eda

notebook==6.1.4 ipykernel==5.3.4

seaborn==0.11.0 pandas==1.1.2 matplotlib==3.3.2 statsmodels==0.12.0 scipy==1.5.2 scikit-learn==0.23.2 numpy==1.19.2

Difference from previous version

・ Review of the distribution visualization code
・ Color map adjustments
・ Addition of the displot, histplot, and ecdfplot functions
・ Review of the kdeplot and rugplot functions (the parts calculated with statsmodels were removed, smoothing can be adjusted with bw_adjust, and log_scale conversion can be selected as processing)
・ hist can now be selected in jointplot
・ Support for both vertical and horizontal data
・ The above plus minor changes

I had been using 0.10.1 before this release; set_theme was added before I noticed, and configuration has become much easier.

Items deprecated by the update

・ distplot will be removed in the future, so please move to displot and histplot as soon as possible.

What is seaborn

A brief review of seaborn

seaborn is a wrapper for matplotlib: a package that lets you draw plots easily and beautifully with little code. matplotlib runs behind the scenes, but the code becomes much simpler and you can express complicated figures in one line. Some functions also perform preprocessing-like filtering and calculation, so I feel it is also well suited to EDA.

seaborn's design philosophy

Introducing the interesting ideas behind seaborn

image.png

seaborn designs its plotting functions from two perspectives: axes and figures. The functions are grouped as shown in the figure, and the detailed plots (drawn by the axes-level functions) can be controlled collectively through relplot, displot, and catplot (the figure-level functions). Of course, you can also call the individual functions directly.

Isn't the script easier to understand if you draw with a small function?

In the function on the figure side that manages collectively, you can automatically generate the number of canvases you want to draw, perform grouping processing, and divide the drawing unit (facet).

image.png

Also, the axes-level functions (unless you specify otherwise) have problems such as:

・ It is difficult to change the axis names
・ The legend is drawn inside the plot

With the figure-level functions, the axes and the legend are treated as separate things, and you can easily control the labels and the legend in one line by using set_axis_labels and similar methods.
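
As a rough sketch of the difference described above, the snippet below draws the same data once with an axes-level function and once with a figure-level function. The toy DataFrame and the Agg backend are assumptions made only so the example runs on its own; they are not part of the original post.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs outside a notebook
import pandas as pd
import seaborn as sns

# Hypothetical toy data standing in for a real dataset
toy = pd.DataFrame({
    "x": [1, 2, 3, 4, 5, 6],
    "y": [2, 1, 4, 3, 6, 5],
    "group": ["a", "a", "b", "b", "c", "c"],
})

# Axes-level function: draws on one matplotlib Axes and returns it
ax = sns.scatterplot(data=toy, x="x", y="y", hue="group")

# Figure-level function: creates its own figure and returns a FacetGrid
g = sns.relplot(data=toy, x="x", y="y", hue="group", kind="scatter")
g.set_axis_labels("X value", "Y value")  # both labels set in one line

print(type(ax).__name__, type(g).__name__)
```

The returned object is what decides which helper methods are available: the FacetGrid has set_axis_labels, while the plain Axes uses matplotlib's own setters.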

Enough of that, on to the visualization!!

The introduction is roughly divided as follows:

・ First, the options (color, axis, legend, etc.)
・ Mainly the major figure-level functions relplot, displot, and catplot
・ Advanced plots

[Opt-0] Optional story

Getting an overview of the options in advance makes the plots that follow easier to get into.

First, let's check that the latest version is actually installed


import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

import seaborn as sns
print(sns.__version__)

0.11.0

[Opt-1] Call data

For when you want to use well-known datasets easily: the data supported by seaborn is stored on [github](https://github.com/mwaskom/seaborn-data), so you can select the dataset you want and load it with load_dataset. The loaded data can be manipulated as a pandas DataFrame.

penguins = sns.load_dataset("penguins")

[Opt-2] Set drawing theme

Select the theme you want to use while checking the current theme

sns.set_theme

<function seaborn.rcmod.set_theme(context='notebook', style='darkgrid', palette='deep', font='sans-serif', font_scale=1, color_codes=True, rc=None)>

style controls the tint of the graph background

white, dark, whitegrid, darkgrid

palette adjusts the color pattern of graph shapes

'Accent', 'Accent_r', 'Blues', 'Blues_r', 'BrBG', 'BrBG_r', 'BuGn', 'BuGn_r', 'BuPu', 'BuPu_r', 'CMRmap', 
'CMRmap_r', 'Dark2', 'Dark2_r', 'GnBu', 'GnBu_r', 'Greens', 'Greens_r', 'Greys', 'Greys_r', 'OrRd', 'OrRd_r', 
'Oranges', 'Oranges_r', 'PRGn', 'PRGn_r', 'Paired', 'Paired_r', 'Pastel1', 'Pastel1_r', 'Pastel2', 'Pastel2_r',
 'PiYG', 'PiYG_r', 'PuBu', 'PuBuGn', 'PuBuGn_r', 'PuBu_r', 'PuOr', 'PuOr_r', 'PuRd', 'PuRd_r', 'Purples', 
'Purples_r', 'RdBu', 'RdBu_r', 'RdGy', 'RdGy_r', 'RdPu', 'RdPu_r', 'RdYlBu', 'RdYlBu_r', 'RdYlGn', 'RdYlGn_r', 
'Reds', 'Reds_r', 'Set1', 'Set1_r', 'Set2', 'Set2_r', 'Set3', 'Set3_r', 'Spectral', 'Spectral_r', 'Wistia', 
'Wistia_r', 'YlGn', 'YlGnBu', 'YlGnBu_r', 'YlGn_r', 'YlOrBr', 'YlOrBr_r', 'YlOrRd', 'YlOrRd_r', 'afmhot', 
'afmhot_r', 'autumn', 'autumn_r', 'binary', 'binary_r', 'bone', 'bone_r', 'brg', 'brg_r', 'bwr', 'bwr_r', 
'cividis', 'cividis_r', 'cool', 'cool_r', 'coolwarm', 'coolwarm_r', 'copper', 'copper_r', 'crest', 'crest_r', 
'cubehelix', 'cubehelix_r', 'flag', 'flag_r', 'flare', 'flare_r', 'gist_earth', 'gist_earth_r', 'gist_gray', 
'gist_gray_r', 'gist_heat', 'gist_heat_r', 'gist_ncar', 'gist_ncar_r', 'gist_rainbow', 'gist_rainbow_r', 
'gist_stern', 'gist_stern_r', 'gist_yarg', 'gist_yarg_r', 'gnuplot', 'gnuplot2', 'gnuplot2_r', 'gnuplot_r', 
'gray', 'gray_r', 'hot', 'hot_r', 'hsv', 'hsv_r', 'icefire', 'icefire_r', 'inferno', 'inferno_r', 'jet',
 'jet_r', 'magma', 'magma_r', 'mako', 'mako_r', 'nipy_spectral', 'nipy_spectral_r', 'ocean', 'ocean_r',
 'pink', 'pink_r', 'plasma', 'plasma_r', 'prism', 'prism_r', 'rainbow', 'rainbow_r', 'rocket', 'rocket_r',
 'seismic', 'seismic_r', 'spring', 'spring_r', 'summer', 'summer_r', 'tab10', 'tab10_r', 'tab20', 'tab20_r',
 'tab20b', 'tab20b_r', 'tab20c', 'tab20c_r', 'terrain', 'terrain_r', 'turbo', 'turbo_r', 'twilight',
 'twilight_r', 'twilight_shifted', 'twilight_shifted_r', 'viridis', 'viridis_r', 'vlag', 'vlag_r',
 'winter', 'winter_r'
sns.set_theme(style="dark",palette='Accent')
df = sns.load_dataset("penguins")
sns.displot(df.flipper_length_mm)

image.png
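
To see what set_theme actually changes, one option is to read the settings back afterwards. This is only a sketch: axes_style() and color_palette() are called without arguments to return the current values, and the Agg backend is an assumption so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs outside a notebook
import seaborn as sns

sns.set_theme(style="dark", palette="Accent")

style = sns.axes_style()       # current style parameters as a dict
palette = sns.color_palette()  # current default color cycle

print("grid drawn:", style["axes.grid"])  # the "dark" style has no grid
print("colors in cycle:", len(palette))
```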

[Opt-3] Changing axis labels

g = sns.displot(df.flipper_length_mm)
g.set_axis_labels("Xaxis", "Yaxis")

image.png

You can overwrite the labels from outside, after the plot has been drawn

Tick labels can also be rotated

g = sns.displot(df.flipper_length_mm)
g.set_axis_labels("Xaxis", "Yaxis")
g.set_xticklabels(rotation=-45)

image.png

If you want to thin out the tick labels, pass step (here only every 20th tick label is kept)

g.set_xticklabels(step=20)

Other methods that can be used

[Opt-4] Legend drawing and color coding

Color coding is specified by hue in the function

In the axes function

df = sns.load_dataset("iris")
sns.scatterplot(data=df,x='sepal_length',y='sepal_width',hue='species')

image.png

With an axes-level function the legend is automatically placed inside the plot. If you use a figure-level function, it is automatically placed outside

df = sns.load_dataset("iris")
sns.relplot(data=df,x='sepal_length',y='sepal_width',hue='species',kind='scatter')

image.png
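
The legend placement can also be checked programmatically. The sketch below assumes a small hypothetical DataFrame instead of the iris data, and uses the Agg backend so it runs without a display: the axes-level call attaches the legend to the Axes, while the figure-level call attaches it to the figure.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs outside a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy data standing in for iris
toy = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 6.4, 6.9, 6.3, 5.8],
    "sepal_width": [3.5, 3.0, 3.2, 3.1, 3.3, 2.7],
    "species": ["s1", "s1", "s2", "s2", "s3", "s3"],
})

fig, ax = plt.subplots()
sns.scatterplot(data=toy, x="sepal_length", y="sepal_width", hue="species", ax=ax)
legend_inside = ax.get_legend() is not None  # legend belongs to the Axes

g = sns.relplot(data=toy, x="sepal_length", y="sepal_width", hue="species")
# legend belongs to the figure, not to the panel
legend_outside = len(g.fig.legends) == 1 and g.axes.flat[0].get_legend() is None

print(legend_inside, legend_outside)
```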

[Opt-5] Multiple drawing and legend

Sometimes you want to draw a scatter plot for each group. Drawing multiple plots from one dataset can be done with FacetGrid. The variable to group by is specified with col (col means column, not color). The number of panels is determined automatically from the values of the categorical variable used for grouping.

df = sns.load_dataset("penguins")
sns.FacetGrid(df,col='species')

image.png

Map to the created drawing area

df = sns.load_dataset("penguins")

g=sns.FacetGrid(df,col='species')
g.map_dataframe(sns.scatterplot,x='flipper_length_mm',y='bill_depth_mm',hue="sex")

image.png

If you want a legend, add it later with add_legend

df = sns.load_dataset("penguins")

g=sns.FacetGrid(df,col='species')
g.map_dataframe(sns.scatterplot,x='flipper_length_mm',y='bill_depth_mm',hue="sex")
g.set_axis_labels('flipper_length_mm','bill_depth_mm')
g.add_legend()

image.png

If you want to add another drawing axis, you can divide it further by specifying it in the row of FacetGrid.

tips = sns.load_dataset("tips")

g = sns.FacetGrid(tips, col="time",  row="sex")
g.map(sns.scatterplot, "total_bill", "tip")

image.png

[Opt-6] Adding a title for the whole figure

To give a title to the whole figure instead of each panel, use suptitle. Since matplotlib is running behind the scenes, matplotlib's suptitle can be used as is

df = sns.load_dataset("penguins")

g=sns.FacetGrid(df,col='species')
g.map_dataframe(sns.scatterplot,x='flipper_length_mm',y='bill_depth_mm',hue="sex")
g.set_axis_labels('flipper_length_mm','bill_depth_mm')
g.add_legend()

g.fig.suptitle('suptitle',y=1.1,x=0,size=18)

image.png

[Opt-7] Log-scaling variables as preprocessing

As is, the range of values is too wide to read

planets = sns.load_dataset("planets")
sns.relplot(data=planets,x="distance", y="orbital_period",hue="year",palette='nipy_spectral')

image.png

Pre-processing the value by specifying the scale

planets = sns.load_dataset("planets")
g=sns.relplot(data=planets,x="distance", y="orbital_period",hue="year",palette='nipy_spectral')
g.set(xscale="log", yscale="log")

image.png
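
The effect of g.set(xscale="log", yscale="log") can be confirmed on the underlying Axes. The data below is a hypothetical stand-in for the planets dataset, and the Agg backend is an assumption so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs outside a notebook
import pandas as pd
import seaborn as sns

# Hypothetical values spanning several orders of magnitude
toy = pd.DataFrame({
    "distance": [1, 10, 100, 1000, 10000],
    "orbital_period": [2, 30, 400, 5000, 60000],
})

g = sns.relplot(data=toy, x="distance", y="orbital_period")
g.set(xscale="log", yscale="log")  # forwarded to every Axes in the grid

ax = g.axes.flat[0]
print(ax.get_xscale(), ax.get_yscale())
```

Note that this changes only the axis scale; the values in the DataFrame are left untouched.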

[Opt-8] Removing the axis lines (spines)

The theme removes the background grid, but the X and Y axis lines (spines) do not disappear

sns.set_theme(style="white")
planets = sns.load_dataset("planets")
g=sns.relplot(data=planets,x="distance", y="orbital_period",hue="year",palette='nipy_spectral')
g.set(xscale="log", yscale="log")

image.png

Erase with despine

sns.set_theme(style="white")
planets = sns.load_dataset("planets")
g=sns.relplot(data=planets,x="distance", y="orbital_period",hue="year",palette='nipy_spectral')
g.set(xscale="log", yscale="log")
g.despine(left=True, bottom=True)

image.png
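
A small sketch of what despine does, assuming toy data and the Agg backend: by default it hides the top and right spines, and left=True, bottom=True hides the remaining two as well.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs outside a notebook
import pandas as pd
import seaborn as sns

sns.set_theme(style="white")

# Hypothetical toy data
toy = pd.DataFrame({"x": [1, 2, 3, 4], "y": [4, 3, 2, 1]})

g = sns.relplot(data=toy, x="x", y="y")
g.despine(left=True, bottom=True)

ax = g.axes.flat[0]
visible = {side: ax.spines[side].get_visible() for side in ["top", "right", "left", "bottom"]}
print(visible)
```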

[Opt-9] Specify xlim and ylim

sns.set_theme(style="dark",palette='Accent')
df = sns.load_dataset("penguins")
g=sns.displot(df.flipper_length_mm)
g.set(xlim=(0, 300), ylim=(0, 100))

image.png

Or set on the FacetGrid side

sns.FacetGrid(df,col='species',xlim=[0,10],ylim=[0,10])

image.png

[Main-0] Major plot introduction

[Main-1] scatter and line by relplot

[Main-1.1] Default rel

sns.set_theme(style="whitegrid")
planets = sns.load_dataset("planets")
g=sns.relplot(data=planets,x="distance", y="orbital_period",palette='Dark2_r')
g.set(xscale="log", yscale="log")

When executed by default, it becomes scatter

image.png

sns.set_theme(style="whitegrid")
planets = sns.load_dataset("planets")
g=sns.relplot(data=planets,x="distance", y="orbital_period",palette='Dark2_r',kind='scatter')
g.set(xscale="log", yscale="log")

Same even if scatter is specified by kind

image.png

[Main-1.2] rel kind specification

If you specify line with kind, it will connect the points and make it linear.

sns.set_theme(style="whitegrid")
planets = sns.load_dataset("planets")
g=sns.relplot(data=planets,x="distance", y="orbital_period",palette='Dark2_r',kind='line')
g.set(xscale="log", yscale="log")

image.png

[Main-1.3] Change the size of scatter according to the value (bubble plot)

To change the marker size according to a value, as in a bubble plot, pass the column containing the value to the size argument. Pass the lower and upper size limits to sizes as a list or tuple

planets = sns.load_dataset("planets")
cmap = sns.cubehelix_palette(rot=-.2, as_cmap=True)
g = sns.relplot(data=planets,x="distance", y="orbital_period",hue="year", size="mass",palette='nipy_spectral', sizes=(10, 300))
g.set(xscale="log", yscale="log")

image.png
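
To make the role of sizes concrete, the sketch below reads the marker areas back from the drawn points. The DataFrame is a hypothetical stand-in for the planets data and the Agg backend is an assumption; the smallest value of the size column should map to the lower limit and the largest to the upper limit.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs outside a notebook
import pandas as pd
import seaborn as sns

# Hypothetical toy data; "mass" drives the marker size
toy = pd.DataFrame({
    "distance": [1, 10, 100, 1000, 10000],
    "orbital_period": [2, 30, 400, 5000, 60000],
    "mass": [0.5, 1.0, 2.0, 4.0, 8.0],
})

g = sns.relplot(data=toy, x="distance", y="orbital_period",
                size="mass", sizes=(10, 300))
g.set(xscale="log", yscale="log")

ax = g.axes.flat[0]
# Pick the collection holding the data points (legend helpers, if any, are empty)
points = max(ax.collections, key=lambda c: len(c.get_offsets()))
marker_sizes = points.get_sizes()
print(marker_sizes.min(), marker_sizes.max())
```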

[Main-1.4] Change the output size of the figure

Operate the size of the output figure from height

planets = sns.load_dataset("planets")
cmap = sns.cubehelix_palette(rot=-.2, as_cmap=True)
g = sns.relplot(data=planets,x="distance", y="orbital_period",hue="year", size="mass",palette='nipy_spectral', sizes=(10, 300),height=10)
g.set(xscale="log", yscale="log")

image.png

[Main-1.5] Confidence interval on line

Some data that is difficult to understand with scatter plots

fmri = sns.load_dataset("fmri")
sns.relplot(data=fmri,x="timepoint", y="signal", hue="event",kind='scatter')

image.png

Easy to understand with line

fmri = sns.load_dataset("fmri")
sns.relplot(data=fmri,x="timepoint", y="signal", hue="event",kind='line')

image.png

The confidence interval is specified with ci. You can pass 'sd' or a number; a number n draws an n% confidence interval, while 'sd' draws the standard deviation of the observations at each x value as is

fmri = sns.load_dataset("fmri")
sns.relplot(data=fmri,x="timepoint", y="signal", hue="event",kind='line',ci=20)

Specified as a 20% confidence interval

image.png

[Main-1.6] Marker designation

You can also add markers at each point and draw the confidence intervals as error bars instead of bands.

sns.relplot(data=fmri, x="timepoint", y="signal", hue="event", err_style="bars", ci=95,markers=True,kind='line')

image.png

[Main-1.7] Adding more lines by a specified variable

Like size for the scatter plot introduced earlier, the lines can be split by another variable with style.

fmri = sns.load_dataset("fmri")
sns.relplot(data=fmri,x="timepoint", y="signal", hue="region", style="event",kind='line')

image.png

Of course you can also specify size

fmri = sns.load_dataset("fmri")
sns.relplot(data=fmri,x="timepoint", y="signal", hue="region", size="event", kind='line')

image.png

[Main-1.8] Divide the screen by the specified analysis axis

If you want to split the plotting area itself, specify the variable to split by in col.

fmri = sns.load_dataset("fmri")
sns.relplot(data=fmri,x="timepoint", y="signal", hue="region", col="event", kind='line')

image.png

When drawing with multiple axes, facet_kws can specify whether to share the x-axis and y-axis scales in each figure.

fmri = sns.load_dataset("fmri")
sns.relplot(data=fmri,x="timepoint", y="signal", hue="region", col="event", kind='line',facet_kws=dict(sharey=False,sharex=False))

Be careful: when the axes are not shared, it is easy to overlook the tick labels and end up with a misleading graph

image.png

If you really want to split the plot in some other way, you can use FacetGrid with an axes-level function instead of the figure-level function.

fmri = sns.load_dataset("fmri")
g=sns.FacetGrid(fmri,col='event')
g.map_dataframe(sns.lineplot,x='timepoint',y='signal',hue="region")

image.png

[Main-2] hist, kde, ecdf and rug by displot

[Main-2.1] default hist of displot

The default is kind ='hist'

penguin = sns.load_dataset("penguins")
sns.displot(data=penguin,x='bill_depth_mm')
#sns.displot(data=penguin,x='bill_depth_mm',kind='hist')

image.png

You can change how the histogram is drawn with the element argument. The default is bars

sns.displot(data=penguin,x='bill_depth_mm',kind='hist',element='poly')
sns.displot(data=penguin,x='bill_depth_mm',kind='hist',element='step')

poly

image.png

step

image.png

You can also change how finely the data is binned with bins

sns.displot(data=penguin,x='bill_depth_mm',kind='hist',element='poly',bins=100)

image.png

[Main-2.2] Add kde (kernel density estimate) to histogram

sns.displot(data=penguin,x='bill_depth_mm',kind='hist',kde=True)

image.png

Of course, if you specify kind ='kde', hist will disappear.

sns.displot(data=penguin,x='bill_depth_mm',kind='kde')

image.png

[Main-2.2] Try plotting with the original functions of hist and kde

sns.histplot(data=penguin,x='bill_depth_mm')

image.png

sns.kdeplot(data=penguin,x='bill_depth_mm')

image.png

[Main-2.3] Check the movement of kde

bw_adjust decides how wide a range of data is taken into account when smoothing

sns.displot(data=penguin,x='bill_depth_mm',kind='kde',bw_adjust=.2)

When 0.2

image.png

When 100

image.png

[Main-2.4] Change the color on another axis

As usual, color coding is done with hue and panel splitting with col

sns.displot(data=penguin,x='bill_depth_mm',kind='kde',hue='sex',col='island')

image.png

[Main-2.5] Make a stacked graph

Specify multiple="stack" if you want to stack the density curves. Be careful not to misread the result: the upper group simply sits on top of the lower one, so a higher position does not mean it is larger

Can be used with hist or kde

sns.displot(data=penguin,x='bill_depth_mm',kind='kde',hue='sex',col='island',multiple="stack")

image.png

Specify linewidth=0 when you want to erase the boundary lines between the stacked areas.

sns.displot(data=penguin,x='bill_depth_mm',kind='kde',hue='sex',col='island',multiple="stack",linewidth=0)

image.png

In addition, if you specify edgecolor="0.1", you can emphasize the border line.

Set fill to False when you want to remove the fill color in the stacked graph. Now the graph really invites misunderstanding

sns.displot(data=penguin,x='bill_depth_mm',kind='kde',hue='sex',multiple="stack",fill=False)

image.png

[Main-2.6] alpha: overlapping with transparency instead of stacking

You can adjust the transparency of the colors with alpha

sns.displot(data=penguin,x='bill_depth_mm',kind='kde',hue='sex',fill=True,alpha=0.5)

image.png

[Main-2.7] multiple: show as proportions, overlay, or place side by side

By specifying fill instead of stack, you can plot which group has the larger share at each point, with the total normalized to 1.

sns.displot(data=penguin,x='bill_depth_mm',kind='kde',hue='sex',multiple="fill")

image.png

See the changes with hist

sns.displot(data=penguin,x='bill_depth_mm',kind='hist',hue='sex',fill=True,multiple="fill")
sns.displot(data=penguin,x='bill_depth_mm',kind='hist',hue='sex',fill=True,multiple="layer")
sns.displot(data=penguin,x='bill_depth_mm',kind='hist',hue='sex',fill=True,multiple="dodge")
sns.displot(data=penguin,x='bill_depth_mm',kind='hist',hue='sex',fill=True,multiple="stack")

fill

image.png

layer

image.png

dodge

image.png

stack

image.png

[Main-2.8] 2D plot

Both hist and kde become a two-dimensional plot when you specify both x and y. Think of the contour lines on a map

sns.displot(data=penguin, x="flipper_length_mm", y="bill_length_mm",kind='hist')
sns.displot(data=penguin, x="flipper_length_mm", y="bill_length_mm",kind='kde')

image.png

image.png

Color coding is hue, filling is fill, and panel splitting is col. The rules do not change

image.png

[Main-2.9] Two-dimensional color gradation

If you want to add a color gradation, specify the pattern in cmap

thresh=0 also fills the low-probability areas with no data (thresh takes a value from 0 to 1). Adjust the fineness of the gradation steps with levels

sns.displot(data=penguin, x="flipper_length_mm", y="bill_length_mm", kind="kde",fill=True, thresh=0, levels=10, cmap='cubehelix')

thresh=0, levels=10

image.png

thresh=0.8, levels=10

image.png

thresh=0, levels=100

image.png

[Main-2.10] What is rug plot?

A rug is like fine whiskers along the axis: one short tick per observation. You can check how dense the data is by comparing the rug with the density from kde

sns.displot(data=penguin, x="flipper_length_mm", y="bill_length_mm", kind="kde", rug=True)

image.png

Can be combined with a scatter plot

sns.scatterplot(data=penguin, x="flipper_length_mm", y="bill_length_mm")
sns.rugplot(data=penguin, x="flipper_length_mm", y="bill_length_mm")

image.png

[Main-2.11] Plot sideways

A histogram is usually pictured as bars with the values on the x-axis, but if you pass the variable to y instead, it is automatically drawn horizontally

sns.displot(data=penguin,y='bill_depth_mm',kind='kde',hue='sex',col='island')

image.png

[Main-2.12] Setting the title of each small plot

I covered the overall title in the options section. To change the title of each panel, store the returned grid in g and fill the title in from col_name (see set_titles for details)

g=sns.displot(data=penguin,y='bill_depth_mm',kind='kde',hue='sex',col='island')
g.set_titles("{col_name} penguins")

image.png

[Main-2.13] ecdf cumulative plot

Can be output as cumulative probability

sns.displot(data=penguin, x="flipper_length_mm", kind="ecdf")
sns.displot(data=penguin, x="flipper_length_mm", kind="ecdf",complementary=True)

image.png

complementary=True

image.png

Can be used for survival time analysis, etc.

[Main-2.14] col_wrap that divides the drawing screen into multiple parts and log_scale that performs logarithmic processing

displot also lets you specify whether to apply a log scale internally. It is also possible to log-scale only one axis

Panels are normally split with col and laid out automatically, but with col_wrap you can also specify after how many columns the panels wrap to the next row

diamonds = sns.load_dataset("diamonds")
sns.displot(data=diamonds, x="depth", y="price", log_scale=(True, False), col="clarity",col_wrap=5,kind='kde')

image.png

[Main-3] strip, swarm, box, violin, point, bar, boxen, count by catplot

[Main-3.1] default of catplot

The default kind is strip

Since these plots are suited to categories, they handle categorical (qualitative) variables.

sns.catplot(data=penguin,x='species',y='bill_depth_mm',height=6,kind='strip')

image.png

Let's look at the types of kind

sns.catplot(data=penguin,x='species',y='bill_depth_mm',height=6,kind='box')

sns.catplot(data=penguin,x='species',y='bill_depth_mm',height=6,kind='boxen')

sns.catplot(data=penguin,x='species',height=6,kind='count')

sns.catplot(data=penguin,x='species',y='bill_depth_mm',height=6,kind='bar')

sns.catplot(data=penguin,x='species',y='bill_depth_mm',height=6,kind='violin')

sns.catplot(data=penguin,x='species',y='bill_depth_mm',height=6,kind='swarm')

sns.catplot(data=penguin,x='species',y='bill_depth_mm',height=6,kind='point')

box

image.png

boxen

image.png

count

image.png

bar (see the y-axis of count)

image.png

violin

image.png

swarm

image.png

point Can be used for group comparison, analysis of variance, etc.

image.png

As you could see above, some of these plots draw a confidence interval. It can be set with ci

[Main-3.2] Expression of box and violin

dodge controls whether the hue groups are drawn in the same column or side by side

sns.catplot(data=penguin, kind="box",x="species", y="body_mass_g", hue="sex", dodge=False,height=6)

image.png

sns.catplot(data=penguin, kind="box",x="species", y="body_mass_g", hue="sex", dodge=True,height=6)

image.png

Same for swarm and violin

image.png

image.png

If you set dodge to False in bar, it will be a confusing figure

image.png

The bars are not actually stacked; one is hidden behind the other

image.png

split can be selected for violin

sns.catplot(data=penguin, kind="violin",x="species", y="body_mass_g", hue="sex", split=True)

image.png

You can swap y and x to turn it sideways

sns.catplot(data=penguin, kind="violin",y="species", x="body_mass_g", hue="sex", split=True)

image.png

[Main-3.3] swarm and box and violin

swarm tends to have many points. When they cannot all be drawn, it helpfully warns that some of the points that should be drawn could not be placed.

image.png

You can also overlay it by calling it after the box or violin.

sns.catplot(data=penguin, kind="box",x="species", y="body_mass_g",height=6)
sns.swarmplot(data=penguin,x="species", y="body_mass_g",hue='sex',palette="Set1")

image.png
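
The overlay above relies on swarmplot drawing onto whatever Axes is current. A more explicit variant, sketched here with the axes-level boxplot instead of catplot, passes the same ax to both calls. The toy data and the Agg backend are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs outside a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy data: three categories with five observations each
toy = pd.DataFrame({
    "species": ["A"] * 5 + ["B"] * 5 + ["C"] * 5,
    "body_mass_g": [3500, 3600, 3700, 3800, 3900,
                    4500, 4600, 4700, 4800, 4900,
                    5000, 5200, 5400, 5600, 5800],
})

fig, ax = plt.subplots()
sns.boxplot(data=toy, x="species", y="body_mass_g", ax=ax)
sns.swarmplot(data=toy, x="species", y="body_mass_g", color="black", ax=ax)

# Count the individual points drawn by swarmplot on top of the boxes
n_points = sum(len(c.get_offsets()) for c in ax.collections)
print(n_points)
```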

[Adv] Advanced usage

【adv-1】pair plot

Try the convenient pairplot to get a bird's-eye view of the entire data frame

Calling the pairplot function gives the figure you often see.

penguin = sns.load_dataset("penguins")
sns.pairplot(penguin)

sns.pairplot(penguin, hue="species")

image.png

image.png

With PairGrid, you can specify the plot type for the upper triangle, the lower triangle, and the diagonal separately.

penguin = sns.load_dataset("penguins")

g = sns.PairGrid(penguin, diag_sharey=False)
g.map_upper(sns.scatterplot, s=15)
g.map_lower(sns.kdeplot)
g.map_diag(sns.kdeplot, lw=2)

image.png

【adv-2】heat map

Returns a heatmap for matrix data

flights = sns.load_dataset("flights")
flights = flights.pivot(index="month", columns="year", values="passengers")
sns.heatmap(flights)

image.png
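
The pivot step is what turns long-form rows into the matrix that heatmap expects. A small sketch with a hypothetical six-row excerpt (the passenger numbers are made up for illustration) and the Agg backend assumed:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs outside a notebook
import pandas as pd
import seaborn as sns

# Hypothetical long-form data: one row per (month, year) pair
long_form = pd.DataFrame({
    "month": ["Jan", "Jan", "Jan", "Feb", "Feb", "Feb"],
    "year": [1949, 1950, 1951] * 2,
    "passengers": [112, 115, 145, 118, 126, 150],
})

# Wide form: months become rows, years become columns
wide = long_form.pivot(index="month", columns="year", values="passengers")
ax = sns.heatmap(wide, annot=True, fmt="d")

print(wide)
```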

If you want to show the numbers as well, set annot to True and use fmt to keep the number format under control.

sns.heatmap(flights, annot=True, fmt="d")

image.png

When you want to draw borders between the cells, specify linewidths

sns.heatmap(flights, linewidths=.5)

image.png

Using triu_indices_from, which returns the indices of the upper triangle of a NumPy matrix, you can build a True/False (1/0) matrix and pass it as mask so that only part of the heatmap is drawn

corr = np.corrcoef(np.random.randn(10, 200))
mask = np.zeros_like(corr)
mask[np.triu_indices_from(mask)] = True

with sns.axes_style("white"):
    f, ax = plt.subplots(figsize=(7, 5))
    ax = sns.heatmap(corr, mask=mask, vmax=.3, square=True)

image.png
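
To make the mask construction easier to follow, here is a smaller sketch with a 5x5 correlation matrix of hypothetical random data and a boolean mask. The Agg backend is an assumption. Cells where the mask is True are hidden, so the upper triangle including the diagonal disappears.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs outside a notebook
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# Hypothetical data: correlations between 5 random variables
rng = np.random.default_rng(0)
corr = np.corrcoef(rng.normal(size=(5, 200)))

# True marks the cells to hide: the upper triangle, diagonal included
mask = np.zeros_like(corr, dtype=bool)
mask[np.triu_indices_from(mask)] = True

fig, ax = plt.subplots(figsize=(7, 5))
sns.heatmap(corr, mask=mask, square=True, ax=ax)

print(mask.astype(int))
```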

【adv-3】joint plot

Check the distribution of each variable while looking at the relationship between the two variables as the density.

sns.jointplot(x='bill_length_mm', y='bill_depth_mm', data=penguin)

image.png

jointplot also has several types, which can be specified with kind. The default shown above is scatter

kde

image.png

hist

image.png

hex

image.png

reg(regression)

image.png

resid (a plot for checking the model residuals, see below)

image.png

You can also choose the combination of plots yourself by using JointGrid.

Arbitrarily specify the main and side views

g=sns.JointGrid(x='bill_length_mm', y='bill_depth_mm', data=penguin)
g.plot(sns.regplot, sns.kdeplot)

image.png

Add rugs, add kde to scatter plots, and so on, using plot_joint and plot_marginals

g = sns.jointplot(data=penguin, x="bill_length_mm", y="bill_depth_mm")
g.plot_joint(sns.kdeplot, color="r", zorder=0, levels=6)
g.plot_marginals(sns.rugplot, color="r", height=-.15, clip_on=False)

image.png

Create any combination by specifying main, top, right

g = sns.JointGrid()
x, y = penguin["bill_length_mm"], penguin["bill_depth_mm"]
sns.scatterplot(x=x, y=y, ec="b", fc="none", s=100, linewidth=1.5, ax=g.ax_joint)
sns.histplot(x=x, fill=False, linewidth=2, ax=g.ax_marg_x)
sns.kdeplot(y=y, linewidth=2, ax=g.ax_marg_y)

image.png

【adv-4】linear model plot

Apparently lmplot is a combination of FacetGrid and regplot, so I will basically explain it with lmplot. regplot seems to be what runs inside pairplot and jointplot

Visualize linear regression

sns.lmplot(x='bill_length_mm', y='body_mass_g', data=penguin)

image.png

Color coded by hue

image.png

Determine the degree of polynomial regression with order

sns.lmplot(x='bill_length_mm', y='body_mass_g', data=penguin, order=5)

image.png

There is also an option for logistic regression

penguin['male'] = pd.get_dummies(penguin.sex)['Male']
sns.lmplot(x='bill_length_mm', y='male',data=penguin, logistic=True)

After fitting a model, I want to look at the residuals. Check the residuals to see how well the drawn lm model (linear or polynomial) explains the data. As a check on a linear model, if the residuals appear to follow a normal distribution, the model can be judged to be reasonably good.

Probably because of the relationship between regplot and lmplot, the function running behind kind="resid" in jointplot seems to be residplot.

sns.jointplot(x='bill_length_mm', y='body_mass_g', data=penguin,order=1,kind='resid')
sns.jointplot(x='bill_length_mm', y='body_mass_g', data=penguin,order=10,kind='resid')

First-order residuals

image.png

10th order residual

image.png

Compare with sklearn model
import sklearn
from sklearn import datasets,linear_model

penguin=penguin.dropna(how='any')
model = linear_model.LinearRegression()

X = np.array(penguin["bill_length_mm"]).reshape(-1, 1)
Y = np.array(penguin["body_mass_g"]).reshape(-1, 1)
model.fit(X, Y)
pred_y = model.predict(X)

plt.scatter(x=X, y=Y-pred_y)

image.png

It seems that seaborn calculates the residuals for us.
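
The same comparison can be made directly against residplot without scikit-learn. This is a sketch on hypothetical linear data, using np.polyfit as the reference fit (a swapped-in technique, not what the post used) and the Agg backend as an assumption: the y values of the points drawn by residplot should match the residuals of an ordinary first-order least-squares fit.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs outside a notebook
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# Hypothetical data: a straight line plus noise
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 3.0 * x + 2.0 + rng.normal(scale=1.0, size=x.size)

# Reference: residuals of a first-order least-squares fit
slope, intercept = np.polyfit(x, y, deg=1)
manual_resid = y - (slope * x + intercept)

fig, ax = plt.subplots()
sns.residplot(x=x, y=y, ax=ax)
plotted_resid = np.asarray(ax.collections[0].get_offsets())[:, 1]

print(np.abs(manual_resid).max())
```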

【adv-5】clustermap

iris = sns.load_dataset("iris")
species = iris.pop("species")
g = sns.clustermap(iris)

Visualizes hierarchical clustering. Use method to specify which linkage criterion is used for grouping

This area is difficult to explain, so it is quicker to study hierarchical clustering itself. The method options come from the scipy calculation [scipy.cluster.hierarchy.linkage](https://docs.scipy.org/doc/scipy/reference/generated/scipy.cluster.hierarchy.linkage.html#scipy.cluster.hierarchy.linkage)

image.png

method

single
complete
average
weighted
centroid
median
ward

single

image.png
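
As a small sketch of passing method, the snippet below clusters a hypothetical 8x4 table with method="ward" and reads back the row order chosen by the dendrogram. The data and the Agg backend are assumptions; scipy must be installed for clustermap to work.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs outside a notebook
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical numeric table: 8 observations, 4 features
rng = np.random.default_rng(0)
toy = pd.DataFrame(rng.normal(size=(8, 4)), columns=["a", "b", "c", "d"])

# method is forwarded to scipy's linkage calculation
g = sns.clustermap(toy, method="ward")

# Rows are reordered so that similar observations end up next to each other
row_order = list(g.dendrogram_row.reordered_ind)
print(row_order)
```

Swapping "ward" for another entry in the list above changes how distances between clusters are measured, and therefore the ordering.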

Finally

I plan to introduce more visualization methods. Even if the code is messy, I want to take as much care as possible not to use the wrong kind of figure

reference

Release news-v0.11.0 (September 2020)

API reference

Announcing the release of seaborn 0.11

Overview of seaborn plotting functions
