A new version of seaborn, 0.11.0, came out on September 9, 2020. In this article I review seaborn's visualization features using version 0.11.0.
I try to cover all of the APIs, referring to the official seaborn gallery and function reference.
If you are only interested in the visualizations, skip to the middle.
The article is long, so please search with Ctrl + F.
conda create -n eda python==3.8
conda activate eda
notebook==6.1.4 ipykernel==5.3.4
seaborn==0.11.0 pandas==1.1.2 matplotlib==3.3.2 statsmodels==0.12.0 scipy==1.5.2 scikit-learn==0.23.2 numpy==1.19.2
・ Review of the distribution visualization code
・ Color map adjustment
・ Addition of the displot, histplot, and ecdfplot functions
・ Review of the kdeplot and rugplot functions (the parts computed with statsmodels were removed, smoothing can be adjusted with bw_adjust, and log_scale conversion can be selected)
・ hist can now be selected in jointplot
・ Support for both long-form (vertical) and wide-form (horizontal) data
・ The above plus minor changes
I had been using 0.10.1 before the new version; set_theme was added before I noticed, and it makes configuration much easier.
・ distplot will be removed in the future, so please move to displot or histplot as soon as possible.
seaborn is a wrapper around matplotlib: a package that lets you draw attractive plots with little code. matplotlib runs behind it, but the code becomes much simpler, and complicated figures can be expressed in one line. Some functions also do preprocessing-like filtering and computation, so I feel it is well suited to EDA.
Introducing the interesting ideas of seaborn
seaborn designs its plotting functions from two perspectives: axes and figures. The functions are grouped as shown in the figure, and the detailed drawing done by the axes-level functions can be controlled collectively through relplot, displot, and catplot (the figure-level functions). Of course, you can also call the individual functions directly.
The figure-level functions can automatically generate as many canvases as you need, group the data, and split the drawing into units (facets).
Also, the axes-level functions (unless told otherwise) have a few drawbacks:
・ It is awkward to change the axis labels
・ The legend is drawn inside the plot
The figure-level functions treat the axes and the legend as separate things, so you can control the labels and the legend in one line with set_axis_labels and similar methods.
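The difference between the two kinds of function can be sketched as follows. This is a minimal illustration on a small made-up table (the `toy` DataFrame and its column names are assumptions, not data from this article): the axes-level `scatterplot` hands back a matplotlib Axes, while the figure-level `relplot` hands back a FacetGrid whose labels can be set in one call.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch also runs outside a notebook
import pandas as pd
import seaborn as sns

# Made-up table standing in for a real dataset (hypothetical values).
toy = pd.DataFrame({
    "length": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "width": [2.1, 1.9, 3.2, 3.8, 5.1, 5.7],
    "group": ["a", "a", "b", "b", "c", "c"],
})

# Axes-level function: returns a matplotlib Axes.
ax = sns.scatterplot(data=toy, x="length", y="width", hue="group")

# Figure-level function: returns a FacetGrid that owns the whole figure.
g = sns.relplot(data=toy, x="length", y="width", hue="group", kind="scatter")
g.set_axis_labels("Length", "Width")  # relabel the grid in one call

print(type(ax).__name__, type(g).__name__)
```

The returned FacetGrid is what makes the later `g.set(...)`, `g.set_titles(...)`, and `g.despine(...)` calls possible.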
The content is roughly divided into three parts:
・ First, the options (color, axis, legend, etc.)
・ Then mainly relplot, displot, and catplot, the major figure-level functions
・ Finally, advanced plots
Going over the options first, even as a brief overview, should make the later plot sections easier to follow.
First, let's check that the latest version is actually installed.
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
print(sns.__version__)
0.11.0
When you want to use well-known datasets easily: the datasets seaborn supports are stored on [GitHub](https://github.com/mwaskom/seaborn-data). Select the one you want and load it with load_dataset. The loaded data can be manipulated as a pandas DataFrame.
penguins = sns.load_dataset("penguins")
Select the theme you want to use while checking the current theme
sns.set_theme
<function seaborn.rcmod.set_theme(context='notebook', style='darkgrid', palette='deep', font='sans-serif', font_scale=1, color_codes=True, rc=None)>
style controls the tint of the graph background
white, dark, whitegrid, darkgrid
palette adjusts the color pattern of the plotted shapes
'Accent', 'Accent_r', 'Blues', 'Blues_r', 'BrBG', 'BrBG_r', 'BuGn', 'BuGn_r', 'BuPu', 'BuPu_r', 'CMRmap',
'CMRmap_r', 'Dark2', 'Dark2_r', 'GnBu', 'GnBu_r', 'Greens', 'Greens_r', 'Greys', 'Greys_r', 'OrRd', 'OrRd_r',
'Oranges', 'Oranges_r', 'PRGn', 'PRGn_r', 'Paired', 'Paired_r', 'Pastel1', 'Pastel1_r', 'Pastel2', 'Pastel2_r',
'PiYG', 'PiYG_r', 'PuBu', 'PuBuGn', 'PuBuGn_r', 'PuBu_r', 'PuOr', 'PuOr_r', 'PuRd', 'PuRd_r', 'Purples',
'Purples_r', 'RdBu', 'RdBu_r', 'RdGy', 'RdGy_r', 'RdPu', 'RdPu_r', 'RdYlBu', 'RdYlBu_r', 'RdYlGn', 'RdYlGn_r',
'Reds', 'Reds_r', 'Set1', 'Set1_r', 'Set2', 'Set2_r', 'Set3', 'Set3_r', 'Spectral', 'Spectral_r', 'Wistia',
'Wistia_r', 'YlGn', 'YlGnBu', 'YlGnBu_r', 'YlGn_r', 'YlOrBr', 'YlOrBr_r', 'YlOrRd', 'YlOrRd_r', 'afmhot',
'afmhot_r', 'autumn', 'autumn_r', 'binary', 'binary_r', 'bone', 'bone_r', 'brg', 'brg_r', 'bwr', 'bwr_r',
'cividis', 'cividis_r', 'cool', 'cool_r', 'coolwarm', 'coolwarm_r', 'copper', 'copper_r', 'crest', 'crest_r',
'cubehelix', 'cubehelix_r', 'flag', 'flag_r', 'flare', 'flare_r', 'gist_earth', 'gist_earth_r', 'gist_gray',
'gist_gray_r', 'gist_heat', 'gist_heat_r', 'gist_ncar', 'gist_ncar_r', 'gist_rainbow', 'gist_rainbow_r',
'gist_stern', 'gist_stern_r', 'gist_yarg', 'gist_yarg_r', 'gnuplot', 'gnuplot2', 'gnuplot2_r', 'gnuplot_r',
'gray', 'gray_r', 'hot', 'hot_r', 'hsv', 'hsv_r', 'icefire', 'icefire_r', 'inferno', 'inferno_r', 'jet',
'jet_r', 'magma', 'magma_r', 'mako', 'mako_r', 'nipy_spectral', 'nipy_spectral_r', 'ocean', 'ocean_r',
'pink', 'pink_r', 'plasma', 'plasma_r', 'prism', 'prism_r', 'rainbow', 'rainbow_r', 'rocket', 'rocket_r',
'seismic', 'seismic_r', 'spring', 'spring_r', 'summer', 'summer_r', 'tab10', 'tab10_r', 'tab20', 'tab20_r',
'tab20b', 'tab20b_r', 'tab20c', 'tab20c_r', 'terrain', 'terrain_r', 'turbo', 'turbo_r', 'twilight',
'twilight_r', 'twilight_shifted', 'twilight_shifted_r', 'viridis', 'viridis_r', 'vlag', 'vlag_r',
'winter', 'winter_r'
sns.set_theme(style="dark",palette='Accent')
df = sns.load_dataset("penguins")
sns.displot(df.flipper_length_mm)
g = sns.displot(df.flipper_length_mm)
g.set_axis_labels("Xaxis", "Yaxis")
You can overwrite the labels afterwards, from outside the plotting call.
Axis tick labels can also be rotated.
g = sns.displot(df.flipper_length_mm)
g.set_axis_labels("Xaxis", "Yaxis")
g.set_xticklabels(rotation=-45)
If you want to thin out the labels and show only every 20th tick label
g.set_xticklabels(step=20)
Other methods that can be used
Color coding is specified with the hue argument of each function.
With an axes-level function:
df = sns.load_dataset("iris")
sns.scatterplot(data=df,x='sepal_length',y='sepal_width',hue='species')
With the axes-level function the legend is placed inside the plot automatically. If you use the figure-level function, it is placed outside automatically.
df = sns.load_dataset("iris")
sns.relplot(data=df,x='sepal_length',y='sepal_width',hue='species',kind='scatter')
Sometimes you want not a single scatter plot but one scatter plot per group. Drawing the data in multiple panels can be done with FacetGrid. The variable to group by is specified with col (col means column, not color). The number of panels is determined automatically from the values of the categorical variable used for grouping.
df = sns.load_dataset("penguins")
sns.FacetGrid(df,col='species')
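As a quick check of how the number of panels follows from the grouping variable, here is a small sketch on made-up data (the `toy` DataFrame and its columns are hypothetical): three distinct values in the `col` variable give one row of three panels.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch also runs outside a notebook
import pandas as pd
import seaborn as sns

# Made-up data with three groups (hypothetical values).
toy = pd.DataFrame({
    "x": [1, 2, 3, 4, 5, 6],
    "y": [3, 1, 4, 1, 5, 9],
    "kind": ["a", "a", "b", "b", "c", "c"],
})

g = sns.FacetGrid(toy, col="kind")

# g.axes is a 2-D array of matplotlib Axes laid out as (rows, columns).
print(g.axes.shape, list(g.col_names))
```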
Map to the created drawing area
df = sns.load_dataset("penguins")
g=sns.FacetGrid(df,col='species')
g.map_dataframe(sns.scatterplot,x='flipper_length_mm',y='bill_depth_mm',hue="sex")
If you want a legend, add it later with add_legend
df = sns.load_dataset("penguins")
g=sns.FacetGrid(df,col='species')
g.map_dataframe(sns.scatterplot,x='flipper_length_mm',y='bill_depth_mm',hue="sex")
g.set_axis_labels('flipper_length_mm','bill_depth_mm')
g.add_legend()
If you want to split along one more variable, specify it with the row argument of FacetGrid.
tips = sns.load_dataset("tips")
g = sns.FacetGrid(tips, col="time", row="sex")
g.map(sns.scatterplot, "total_bill", "tip")
To give a title to the whole figure rather than to each facet, use suptitle. Because matplotlib runs underneath, the figure's suptitle method is available.
df = sns.load_dataset("penguins")
g=sns.FacetGrid(df,col='species')
g.map_dataframe(sns.scatterplot,x='flipper_length_mm',y='bill_depth_mm',hue="sex")
g.set_axis_labels('flipper_length_mm','bill_depth_mm')
g.add_legend()
g.fig.suptitle('suptitle',y=1.1,x=0,size=18)
Plotted as is, the values span too wide a range to read.
planets = sns.load_dataset("planets")
sns.relplot(data=planets,x="distance", y="orbital_period",hue="year",palette='nipy_spectral')
Transform the values by specifying a log scale for the axes.
planets = sns.load_dataset("planets")
g=sns.relplot(data=planets,x="distance", y="orbital_period",hue="year",palette='nipy_spectral')
g.set(xscale="log", yscale="log")
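To confirm what `g.set(xscale="log", yscale="log")` does, the following sketch applies it to made-up, strictly positive data (the `toy` DataFrame is an assumption, chosen only to span several orders of magnitude) and reads the scale back from every Axes in the grid.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch also runs outside a notebook
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(0)
# Synthetic values spread over several orders of magnitude (hypothetical data).
toy = pd.DataFrame({
    "distance": 10 ** rng.uniform(0, 4, size=50),
    "period": 10 ** rng.uniform(-1, 5, size=50),
})

g = sns.relplot(data=toy, x="distance", y="period")
g.set(xscale="log", yscale="log")  # forwarded to every Axes in the grid

scales = [(ax.get_xscale(), ax.get_yscale()) for ax in g.axes.flat]
print(scales)
```

The data itself is not modified; only the axis scale changes.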
With this theme the background grid disappears, but the x and y axis lines (spines) remain.
sns.set_theme(style="white")
planets = sns.load_dataset("planets")
g=sns.relplot(data=planets,x="distance", y="orbital_period",hue="year",palette='nipy_spectral')
g.set(xscale="log", yscale="log")
Remove them with despine.
sns.set_theme(style="white")
planets = sns.load_dataset("planets")
g=sns.relplot(data=planets,x="distance", y="orbital_period",hue="year",palette='nipy_spectral')
g.set(xscale="log", yscale="log")
g.despine(left=True, bottom=True)
sns.set_theme(style="dark",palette='Accent')
df = sns.load_dataset("penguins")
g=sns.displot(df.flipper_length_mm)
g.set(xlim=(0, 300), ylim=(0, 100))
Or set on the FacetGrid side
sns.FacetGrid(df,col='species',xlim=[0,10],ylim=[0,10])
sns.set_theme(style="whitegrid")
planets = sns.load_dataset("planets")
g=sns.relplot(data=planets,x="distance", y="orbital_period",palette='Dark2_r')
g.set(xscale="log", yscale="log")
With the default settings, relplot draws a scatter plot.
sns.set_theme(style="whitegrid")
planets = sns.load_dataset("planets")
g=sns.relplot(data=planets,x="distance", y="orbital_period",palette='Dark2_r',kind='scatter')
g.set(xscale="log", yscale="log")
The result is the same when kind='scatter' is specified explicitly.
If you specify kind='line', the points are connected with a line.
sns.set_theme(style="whitegrid")
planets = sns.load_dataset("planets")
g=sns.relplot(data=planets,x="distance", y="orbital_period",palette='Dark2_r',kind='line')
g.set(xscale="log", yscale="log")
Sometimes you want the size of the points to vary with a value, as in a bubble plot. In that case, pass the name of the column holding the values to the size argument, and pass the lower and upper size limits to sizes as a list or tuple.
planets = sns.load_dataset("planets")
cmap = sns.cubehelix_palette(rot=-.2, as_cmap=True)
g = sns.relplot(data=planets,x="distance", y="orbital_period",hue="year", size="mass",palette='nipy_spectral', sizes=(10, 300))
g.set(xscale="log", yscale="log")
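Conceptually, `sizes=(10, 300)` asks for the smallest value of the `size` column to get marker size 10 and the largest to get 300. The sketch below is a simplified linear version of that idea written in NumPy; `scale_sizes` is a hypothetical helper for illustration, not seaborn's implementation, and the masses are made-up numbers.

```python
import numpy as np

def scale_sizes(values, sizes=(10, 300)):
    """Linearly map values onto the [smallest, largest] marker-size range."""
    values = np.asarray(values, dtype=float)
    smallest, largest = sizes
    vmin, vmax = np.nanmin(values), np.nanmax(values)
    norm = (values - vmin) / (vmax - vmin)  # 0 for the minimum, 1 for the maximum
    return smallest + norm * (largest - smallest)

masses = [0.5, 2.0, 7.1, 10.5, 25.0]  # hypothetical planet masses
marker_sizes = scale_sizes(masses)
print(marker_sizes)
```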
Control the size of the output figure with height.
planets = sns.load_dataset("planets")
cmap = sns.cubehelix_palette(rot=-.2, as_cmap=True)
g = sns.relplot(data=planets,x="distance", y="orbital_period",hue="year", size="mass",palette='nipy_spectral', sizes=(10, 300),height=10)
g.set(xscale="log", yscale="log")
Some data is hard to read as a scatter plot.
fmri = sns.load_dataset("fmri")
sns.relplot(data=fmri,x="timepoint", y="signal", hue="event",kind='scatter')
It is easier to read as a line plot.
fmri = sns.load_dataset("fmri")
sns.relplot(data=fmri,x="timepoint", y="signal", hue="event",kind='line')
The confidence interval is specified with ci. You can pass "sd" or a number: a number n draws an n% confidence interval, while "sd" draws the standard deviation of the observations instead.
fmri = sns.load_dataset("fmri")
sns.relplot(data=fmri,x="timepoint", y="signal", hue="event",kind='line',ci=20)
Specified as a 20% confidence interval
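To see why a 20% interval is so much narrower than the usual 95% one, here is a sketch of a percentile bootstrap interval for the mean in plain NumPy. It follows the general idea of resampling the observations and taking percentiles of the resampled means; `bootstrap_ci` is a hypothetical helper, not seaborn's code, and the signal values are synthetic.

```python
import numpy as np

def bootstrap_ci(values, ci=95, n_boot=1000, seed=0):
    """Percentile bootstrap interval for the mean of `values`."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    # Resample with replacement and record the mean of each resample.
    idx = rng.integers(0, len(values), size=(n_boot, len(values)))
    boot_means = values[idx].mean(axis=1)
    lower = (100 - ci) / 2
    return np.percentile(boot_means, [lower, 100 - lower])

rng = np.random.default_rng(1)
signal = rng.normal(loc=0.2, scale=1.0, size=60)  # synthetic observations

narrow = bootstrap_ci(signal, ci=20)  # 40th to 60th percentile of the means
wide = bootstrap_ci(signal, ci=95)    # 2.5th to 97.5th percentile of the means
print(narrow, wide)
```

Because both calls use the same seed, the 20% interval is nested inside the 95% interval.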
You can also add markers at each point and draw the confidence intervals as error bars instead of bands.
sns.relplot(data=fmri, x="timepoint", y="signal", hue="event", err_style="bars", ci=95,markers=True,kind='line')
Like size in the scatter plot introduced earlier, style lets you split the data along another variable.
fmri = sns.load_dataset("fmri")
sns.relplot(data=fmri,x="timepoint", y="signal", hue="region", style="event",kind='line')
Of course you can also specify size
fmri = sns.load_dataset("fmri")
sns.relplot(data=fmri,x="timepoint", y="signal", hue="region", size="event", kind='line')
To split the plot itself into separate panels, specify the variable to split on with col.
fmri = sns.load_dataset("fmri")
sns.relplot(data=fmri,x="timepoint", y="signal", hue="region", col="event", kind='line')
When drawing multiple facets, facet_kws lets you specify whether the panels share the x-axis and y-axis scales.
fmri = sns.load_dataset("fmri")
sns.relplot(data=fmri,x="timepoint", y="signal", hue="region", col="event", kind='line',facet_kws=dict(sharey=False,sharex=False))
Be careful: when the axes are not shared, each panel has its own scale, so make sure the tick labels stay visible or the graph becomes misleading.
If you really want to split the plot some other way, you can pass an axes-level function to FacetGrid instead of using the figure-level function.
fmri = sns.load_dataset("fmri")
g=sns.FacetGrid(fmri,col='event')
g.map_dataframe(sns.lineplot,x='timepoint',y='signal',hue="region")  # map_dataframe supplies each facet's subset as data
The default for displot is kind='hist'.
penguin = sns.load_dataset("penguins")
sns.displot(data=penguin,x='bill_depth_mm')
#sns.displot(data=penguin,x='bill_depth_mm',kind='hist')
You can change how the histogram is drawn with the element argument. The default is "bars".
sns.displot(data=penguin,x='bill_depth_mm',kind='hist',element='poly')
sns.displot(data=penguin,x='bill_depth_mm',kind='hist',element='step')
(Figures: element='poly' and element='step'.)
You can also change how finely the data is binned with bins.
sns.displot(data=penguin,x='bill_depth_mm',kind='hist',element='poly',bins=100)
sns.displot(data=penguin,x='bill_depth_mm',kind='hist',kde=True)
Of course, if you specify kind='kde', the histogram is not drawn.
sns.displot(data=penguin,x='bill_depth_mm',kind='kde')
sns.histplot(data=penguin,x='bill_depth_mm')
sns.kdeplot(data=penguin,x='bill_depth_mm')
bw_adjust decides how wide a range of data is used when smoothing; larger values give a smoother curve.
sns.displot(data=penguin,x='bill_depth_mm',kind='kde',bw_adjust=.2)
(Figures: bw_adjust=0.2 and bw_adjust=100.)
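The effect of bw_adjust can be reproduced with a small Gaussian kernel density estimate in NumPy. This is a sketch under stated assumptions: it uses Scott's rule for the base bandwidth and multiplies it by `bw_adjust`; `gaussian_kde_1d` is a hypothetical helper and the `depth` values are synthetic, not the penguins data.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bw_adjust=1.0):
    """Evaluate a Gaussian KDE on `grid`, with the bandwidth scaled by bw_adjust."""
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    # Scott's rule for one dimension, then scaled by bw_adjust.
    bandwidth = samples.std(ddof=1) * n ** (-1 / 5) * bw_adjust
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernel = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)
    return kernel.sum(axis=1) / (n * bandwidth)

rng = np.random.default_rng(0)
# Synthetic bimodal sample (hypothetical stand-in for bill depth).
depth = np.concatenate([rng.normal(15, 1, 100), rng.normal(19, 1, 100)])
grid = np.linspace(5, 29, 600)

rough = gaussian_kde_1d(depth, grid, bw_adjust=0.2)    # narrow kernels, spiky curve
default = gaussian_kde_1d(depth, grid, bw_adjust=1.0)
smooth = gaussian_kde_1d(depth, grid, bw_adjust=5.0)   # wide kernels, flat curve

print(rough.max(), default.max(), smooth.max())
```

A small value follows every bump in the sample, while a large value flattens the curve until the two modes merge.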
As usual, color coding is done with hue and facet splitting with col.
sns.displot(data=penguin,x='bill_depth_mm',kind='kde',hue='sex',col='island')
Specify multiple="stack" if you want to stack the density curves. Be careful when reading the result: the upper curve only sits on top of the lower one, so its height does not mean that group is larger.
This can be used with hist or kde.
sns.displot(data=penguin,x='bill_depth_mm',kind='kde',hue='sex',col='island',multiple="stack")
Specify linewidth=0 to remove the boundary lines between the stacked areas.
sns.displot(data=penguin,x='bill_depth_mm',kind='kde',hue='sex',col='island',multiple="stack",linewidth=0)
In addition, specifying edgecolor="0.1" makes the border lines stronger.
Set fill to False to remove the fill color in the stacked graph. The result is a graph that is very easy to misread.
sns.displot(data=penguin,x='bill_depth_mm',kind='kde',hue='sex',multiple="stack",fill=False)
You can adjust the transparency of the fill with alpha.
sns.displot(data=penguin,x='bill_depth_mm',kind='kde',hue='sex',fill=True,alpha=0.5)
By specifying multiple="fill" instead of "stack", the total at every x is normalized to 1, so you can see which group has the larger share.
sns.displot(data=penguin,x='bill_depth_mm',kind='kde',hue='sex',multiple="fill")
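What the "fill" normalization means is easiest to see with histogram counts. The sketch below uses two synthetic groups (hypothetical stand-ins for the two sexes) binned over shared edges, and divides each bin's counts by the bin total so that the shares in every non-empty bin sum to 1. It illustrates the idea only and is not seaborn's code.

```python
import numpy as np

rng = np.random.default_rng(0)
male = rng.normal(18.0, 1.5, 150)    # synthetic values, not the penguins data
female = rng.normal(16.5, 1.5, 150)

# Shared bin edges so both groups are counted over the same intervals.
edges = np.histogram_bin_edges(np.concatenate([male, female]), bins=10)
counts = np.vstack([
    np.histogram(male, bins=edges)[0],
    np.histogram(female, bins=edges)[0],
]).astype(float)

totals = counts.sum(axis=0)
# Share of each group per bin; empty bins are left at 0 to avoid dividing by zero.
share = np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)
print(share.round(2))
```

The absolute counts are lost in this view, so a bin with two observations looks as prominent as a bin with fifty.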
See how the multiple options change a histogram.
sns.displot(data=penguin,x='bill_depth_mm',kind='hist',hue='sex',fill=True,multiple="fill")
sns.displot(data=penguin,x='bill_depth_mm',kind='hist',hue='sex',fill=True,multiple="layer")
sns.displot(data=penguin,x='bill_depth_mm',kind='hist',hue='sex',fill=True,multiple="dodge")
sns.displot(data=penguin,x='bill_depth_mm',kind='hist',hue='sex',fill=True,multiple="stack")
(Figures, in order: multiple="fill", "layer", "dodge", "stack".)
Both hist and kde become a two-dimensional plot when you specify both x and y. Think of the contour lines on a map.
sns.displot(data=penguin, x="flipper_length_mm", y="bill_length_mm",kind='hist')
sns.displot(data=penguin, x="flipper_length_mm", y="bill_length_mm",kind='kde')
The rules do not change: color coding is hue, filling is fill, and facet splitting is col.
If you want to add a color gradation, specify the colormap with cmap.
thresh is the lowest level that is drawn (a value from 0 up to, but less than, 1); with thresh=0 even the low-probability areas are filled. levels adjusts how finely the gradation is stepped.
sns.displot(data=penguin, x="flipper_length_mm", y="bill_length_mm", kind="kde",fill=True, thresh=0, levels=10, cmap='cubehelix')
(Figures: thresh=0, levels=10; thresh=0.8, levels=10; thresh=0, levels=100.)
A rug is a row of fine tick marks along the axis, one per observation. Combined with kde, you can compare the density estimate with how tightly the rug ticks are packed.
sns.displot(data=penguin, x="flipper_length_mm", y="bill_length_mm", kind="kde", rug=True)
Can be combined with a scatter plot
sns.scatterplot(data=penguin, x="flipper_length_mm", y="bill_length_mm")
sns.rugplot(data=penguin, x="flipper_length_mm", y="bill_length_mm")
A histogram is usually pictured as bars rising from values on the x-axis, but if you pass the variable as y instead, the plot is automatically drawn horizontally.
sns.displot(data=penguin,y='bill_depth_mm',kind='kde',hue='sex',col='island')
The title for the whole figure was covered in the options section. To change the small title of each facet, store the returned grid in g and refer to col_name in the title template (see set_titles for details).
g=sns.displot(data=penguin,y='bill_depth_mm',kind='kde',hue='sex',col='island')
g.set_titles("{col_name} penguins")
Can be output as cumulative probability
sns.displot(data=penguin, x="flipper_length_mm", kind="ecdf")
sns.displot(data=penguin, x="flipper_length_mm", kind="ecdf",complementary=True)
(Figure: ECDF with complementary=True.)
The complementary form can be used for survival analysis and similar tasks.
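The empirical CDF itself is simple to compute by hand, which helps when reading these plots. The sketch below is a minimal NumPy version with a hypothetical `ecdf` helper and a handful of made-up flipper lengths; the complementary curve is just one minus the cumulative proportion.

```python
import numpy as np

def ecdf(values, complementary=False):
    """Return sorted values and the proportion of observations <= each value."""
    x = np.sort(np.asarray(values, dtype=float))
    y = np.arange(1, x.size + 1) / x.size
    if complementary:
        y = 1.0 - y  # proportion of observations strictly above each value
    return x, y

flipper = [181, 186, 195, 193, 190, 210, 215, 222]  # made-up lengths in mm
x, cum = ecdf(flipper)
_, surv = ecdf(flipper, complementary=True)
print(x, cum, surv)
```

Read as a survival curve, `surv` gives the share of observations that exceed each value.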
displot also lets you specify, with log_scale, whether to apply a log transform internally. It is also possible to log-transform only one axis.
Facets are normally laid out in a single row by col, but with col_wrap you can specify how many columns to show before wrapping to the next row.
diamonds = sns.load_dataset("diamonds")
sns.displot(data=diamonds, x="depth", y="price", log_scale=(True, False), col="clarity",col_wrap=5,kind='kde')
The default kind for catplot is strip.
Since it is a visualization tool for categories, it is meant for qualitative variables.
sns.catplot(data=penguin,x='species',y='bill_depth_mm',height=6,kind='strip')
Let's look at the types of kind
sns.catplot(data=penguin,x='species',y='bill_depth_mm',height=6,kind='box')
sns.catplot(data=penguin,x='species',y='bill_depth_mm',height=6,kind='boxen')
sns.catplot(data=penguin,x='species',height=6,kind='count')
sns.catplot(data=penguin,x='species',y='bill_depth_mm',height=6,kind='bar')
sns.catplot(data=penguin,x='species',y='bill_depth_mm',height=6,kind='violin')
sns.catplot(data=penguin,x='species',y='bill_depth_mm',height=6,kind='swarm')
sns.catplot(data=penguin,x='species',y='bill_depth_mm',height=6,kind='point')
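(Figures, in order: box, boxen, count, bar, violin, swarm, point.)
Compare the y-axes of count and bar: count shows the number of observations, while bar shows an estimate of y (the mean by default). point can be used for group comparison, analysis of variance, and so on.
As you may have noticed above, some of these plots show a confidence interval. It can be set with ci.
dodge controls whether the hue levels are drawn in the same position or side by side.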
sns.catplot(data=penguin, kind="box",x="species", y="body_mass_g", hue="sex", dodge=False,height=6)
sns.catplot(data=penguin, kind="box",x="species", y="body_mass_g", hue="sex", dodge=True,height=6)
The same applies to swarm and violin.
If you set dodge to False with bar, the figure becomes confusing.
The bars are not actually stacked; one is hidden behind the other.
For violin you can also choose split.
sns.catplot(data=penguin, kind="violin",x="species", y="body_mass_g", hue="sex", split=True)
You can swap x and y to turn the plot sideways.
sns.catplot(data=penguin, kind="violin",y="species", x="body_mass_g", hue="sex", split=True)
swarm tends to have many points. When they cannot all be drawn, it warns that some of the points that should be drawn could not be placed, which is helpful.
You can also overlay it by calling it after box or violin.
sns.catplot(data=penguin, kind="box",x="species", y="body_mass_g",height=6)
sns.swarmplot(data=penguin,x="species", y="body_mass_g",hue='sex',palette="Set1")
【adv-1】pair plot
Use the convenient pairplot to get a bird's-eye view of the entire data frame.
Calling the pairplot function produces a figure you have probably seen often.
penguin = sns.load_dataset("penguins")
sns.pairplot(penguin)
sns.pairplot(penguin, hue="species")
With PairGrid, you can specify the plot type for the upper triangle, the lower triangle, and the diagonal separately.
penguin = sns.load_dataset("penguins")
g = sns.PairGrid(penguin, diag_sharey=False)
g.map_upper(sns.scatterplot, s=15)
g.map_lower(sns.kdeplot)
g.map_diag(sns.kdeplot, lw=2)
【adv-2】heat map
Returns a heatmap for matrix data
flights = sns.load_dataset("flights")
flights = flights.pivot(index="month", columns="year", values="passengers")
sns.heatmap(flights)
If you want to show the values in the cells, set annot to True and use fmt to keep the number format under control.
sns.heatmap(flights, annot=True, fmt="d")
When you want to write boundaries, specify with linewidths
sns.heatmap(flights, linewidths=.5)
NumPy's triu_indices_from returns the indices of the upper triangle of a matrix. Put a matrix of True/False (or 1/0) values built this way into mask, and the heatmap is drawn with the masked cells hidden, here leaving only the lower triangle.
import numpy as np
corr = np.corrcoef(np.random.randn(10, 200))
# Cells where mask is True are not drawn
mask = np.zeros_like(corr)
mask[np.triu_indices_from(mask)] = True
with sns.axes_style("white"):
    f, ax = plt.subplots(figsize=(7, 5))
    ax = sns.heatmap(corr, mask=mask, vmax=.3, square=True)
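The mask can be inspected on its own before it is handed to heatmap. The sketch below builds a boolean mask for a small random correlation matrix (a 4 x 4 example chosen only for readability) and uses a NumPy masked array to show which cells would remain visible.

```python
import numpy as np

rng = np.random.default_rng(0)
corr = np.corrcoef(rng.standard_normal((4, 50)))  # 4 x 4 correlation matrix

# True marks cells to hide: the diagonal and everything above it.
mask = np.zeros_like(corr, dtype=bool)
mask[np.triu_indices_from(mask)] = True

visible = np.ma.masked_array(corr, mask=mask)  # only the lower triangle survives
print(mask.astype(int))
print(visible.count())
```

Using `dtype=bool` makes the intent explicit, although a 0/1 float mask as in the code above is handled the same way.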
【adv-3】joint plot
Check the distribution of each variable while looking at the relationship between the two variables as the density.
sns.jointplot(x='bill_length_mm', y='bill_depth_mm', data=penguin)
jointplot also has several types, specified with kind. The default, shown above, is scatter. The other kinds are:
・ kde
・ hist
・ hex
・ reg (regression)
・ resid (residual plot for checking a model's residuals; see below)
With JointGrid you can also choose the combination of plots yourself.
Specify the main plot and the side (marginal) plots as you like.
g=sns.JointGrid(x='bill_length_mm', y='bill_depth_mm', data=penguin)
g.plot(sns.regplot, sns.kdeplot)
To add rugs, overlay a kde on the scatter plot, and so on, use plot_joint and plot_marginals.
g = sns.jointplot(data=penguin, x="bill_length_mm", y="bill_depth_mm")
g.plot_joint(sns.kdeplot, color="r", zorder=0, levels=6)
g.plot_marginals(sns.rugplot, color="r", height=-.15, clip_on=False)
Create any combination by drawing directly on the main, top, and right axes (ax_joint, ax_marg_x, ax_marg_y).
g = sns.JointGrid()
x, y = penguin["bill_length_mm"], penguin["bill_depth_mm"]
sns.scatterplot(x=x, y=y, ec="b", fc="none", s=100, linewidth=1.5, ax=g.ax_joint)
sns.histplot(x=x, fill=False, linewidth=2, ax=g.ax_marg_x)
sns.kdeplot(y=y, linewidth=2, ax=g.ax_marg_y)
【adv-4】linear model plot
lmplot is essentially regplot combined with FacetGrid, so I will basically explain with lmplot. regplot appears to be what runs inside pairplot and jointplot.
Visualize linear regression
sns.lmplot(x='bill_length_mm', y='body_mass_g', data=penguin)
Color coded by hue
Determine the degree of polynomial regression with order
sns.lmplot(x='bill_length_mm', y='body_mass_g', data=penguin, order=5)
There is also an option for logistic regression.
penguin['male'] = pd.get_dummies(penguin.sex)['Male']
sns.lmplot(x='bill_length_mm', y='male',data=penguin, logistic=True)
After fitting a model, you will want to look at the residuals. Check the residuals to see how well the fitted lm model (linear or polynomial) explains the data. As a rule of thumb for a linear model, if the residuals appear to follow a normal distribution, the model can be judged to be reasonably good.
Probably because of the relationship between reg and lm, kind='resid' in jointplot appears to run residplot behind the scenes.
sns.jointplot(x='bill_length_mm', y='body_mass_g', data=penguin,order=1,kind='resid')
sns.jointplot(x='bill_length_mm', y='body_mass_g', data=penguin,order=10,kind='resid')
(Figures: first-order residuals; 10th-order residuals.)
import sklearn
from sklearn import datasets,linear_model
penguin=penguin.dropna(how='any')
model = linear_model.LinearRegression()
X = np.array(penguin["bill_length_mm"]).reshape(-1, 1)
Y = np.array(penguin["body_mass_g"]).reshape(-1, 1)
model.fit(X, Y)
pred_y = model.predict(X)
plt.scatter(x=X, y=Y-pred_y)
The result matches, so kind='resid' does appear to compute the residuals of the fitted model.
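The same check can be sketched with NumPy alone, which also makes it easy to compare polynomial orders. The data here is synthetic (a made-up linear relationship with noise, standing in for bill length and body mass), and `residuals` is a hypothetical helper: it fits a polynomial of the given order by least squares and returns observed minus fitted values.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(35, 60, 120)            # synthetic stand-in for bill length (mm)
y = 90 * x + rng.normal(0, 400, 120)    # synthetic stand-in for body mass (g)

def residuals(x, y, order):
    """Least-squares polynomial fit of the given order; returns y minus fitted y."""
    coefs = np.polyfit(x, y, deg=order)
    return y - np.polyval(coefs, x)

res1 = residuals(x, y, order=1)
res3 = residuals(x, y, order=3)
rss1, rss3 = np.sum(res1 ** 2), np.sum(res3 ** 2)
print(res1.mean(), rss1, rss3)
```

A higher order can never fit the training points worse, so a smaller residual sum of squares alone does not show that the higher-order model is better.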
【adv-5】clustermap
iris = sns.load_dataset("iris")
species = iris.pop("species")
g = sns.clustermap(iris)
clustermap visualizes hierarchical clustering. Use method to specify which linkage criterion is used to group the data.
This area is difficult to explain, so it is faster to learn hierarchical clustering first. The method values come from the SciPy function [scipy.cluster.hierarchy.linkage](https://docs.scipy.org/doc/scipy/reference/generated/scipy.cluster.hierarchy.linkage.html#scipy.cluster.hierarchy.linkage)
method
・ single
・ complete
・ average
・ weighted
・ centroid
・ median
・ ward
(Figure: method="single".)
I plan to introduce more visualization methods. Even if my code is messy, I want to be as careful as possible not to use the wrong kind of figure.
Release news-v0.11.0 (September 2020)