Plotly Express (official site) is a library that makes it easy to create JavaScript-based interactive charts. I came across it some time after its release and felt it had potential on a clearly different level from the static plots I had been making, so I wrote up how to draw the basic visualization types as a memo. For reference, I compare each one with a similar figure written with matplotlib + seaborn.
The "basic visualization methods" follow my earlier article "How to select and draw the correct visualization method in exploratory data analysis".
Plotly Express is a set of high-level APIs for plotly released in March 2019. Its main feature is that interactive and fairly complex figures can be written with little code.
Official site: https://plot.ly/python/plotly-express/
API reference: https://www.plotly.express/plotly_express/#plotly_express.line
Github (part of plotly): https://github.com/plotly/plotly.py
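Seaborn is a wrapper that makes Matplotlib easier to write, and Plotly Express plays the same role for plotly, so this article compares Plotly Express with Seaborn to see which works better. Roughly speaking, the two pairs mirror each other: Seaborn is to Matplotlib what Plotly Express is to plotly.
(Reference) Summary of drawing libraries as of 2018, organized by Anaconda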
plotly is an open-source, interactive data visualization tool. Besides Python, it can be used from R and JavaScript. plotly.py reached version 4 in July 2019. It is released under the MIT license. Github: https://github.com/plotly/plotly.py
Up to plotly 3, the library had a more commercial feel: if you were not careful, the graphs you created could be processed or published "online". In plotly 4, "offline mode" is the default, which expands what you can do for free and makes the library easier to use. Announcement of plotly 4.0.0: https://community.plot.ly/t/introducing-plotly-py-4-0-0/25639
**Free**
plotly: a JavaScript-based visualization module for interactive charts
**Paid (partly)**
Chart Studio: a visualization service that appears to have been split off from plotly's online mode
Dash: lets you build dashboard apps in Python or R without needing to know JavaScript
The following assumes macOS with JupyterLab already installed. A few extra steps are needed for JupyterLab support.
pip install plotly
# Avoid "JavaScript heap out of memory" errors during extension installation
# (OS X/Linux)
export NODE_OPTIONS=--max-old-space-size=4096
# Pin each extension to the version listed on the official install page for your plotly release
# Jupyter widgets extension
jupyter labextension install @jupyter-widgets/jupyterlab-manager --no-build
# FigureWidget support
jupyter labextension install plotlywidget --no-build
# and jupyterlab renderer support
jupyter labextension install jupyterlab-plotly --no-build
# JupyterLab chart editor support (optional)
jupyter labextension install jupyterlab-chart-editor --no-build
# Build extensions (must be done to activate extensions since --no-build is used above)
jupyter lab build
# Unset NODE_OPTIONS environment variable
# (OS X/Linux)
unset NODE_OPTIONS
For details, refer to the install page on the official website.
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import seaborn as sns
import plotly.express as px
from plotly.subplots import make_subplots
import plotly.figure_factory as ff
import plotly.graph_objects as go
import plotly.io as pio
import japanize_matplotlib
import datetime
from matplotlib.dates import MonthLocator
from IPython.display import HTML
sns.set_style('darkgrid')
pio.templates.default = 'seaborn'
plt.rcParams['font.family'] = 'IPAexGothic'
%matplotlib inline
%config InlineBackend.figure_formats = {'png', 'retina'}
import matplotlib
import plotly
print(matplotlib.__version__) # 3.1.1
print(sns.__version__) # 0.9.0
print(plotly.__version__) # 4.2.1
#Use iris dataset
iris = sns.load_dataset('iris')
iris.head()
 | sepal_length | sepal_width | petal_length | petal_width | species |
---|---|---|---|---|---|
0 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
1 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
2 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
3 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
4 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |
matplotlib+seaborn
fig, axes = plt.subplots(ncols=2, figsize=(16, 4))
sns.distplot(iris['sepal_length'], bins=np.arange(4,8,0.25), kde=False, label='all', ax=axes[0])
sns.distplot(iris.query('species=="setosa"')['sepal_length'], kde=False,
             bins=np.arange(4,8,0.25), label='setosa', hist_kws={'alpha':0.3}, ax=axes[1])
sns.distplot(iris.query('species=="versicolor"')['sepal_length'], kde=False,
             bins=np.arange(4,8,0.25), label='versicolor', hist_kws={'alpha':0.3}, ax=axes[1])
sns.distplot(iris.query('species=="virginica"')['sepal_length'], kde=False,
             bins=np.arange(4,8,0.25), label='virginica', hist_kws={'alpha':0.3}, ax=axes[1])
axes[0].set_title("Histogram of sepal length (cm)")
axes[1].set_title("Histogram of sepal length for each variety (cm)")
axes[1].legend()
plt.show()
Plotly Express
# px.histogram has no argument for the bin width, but adjusting nbins gives a reasonable width.
# Plotly Express returns a fig.
# To place figures in subplots, take the traces out of each fig.data and add them to the subplot figure.
# With marginal specified, however, the marginal plot ends up in the wrong position, so I am leaving that for now.
fig = px.histogram(iris, x='sepal_length', color='species',
                   nbins=19, range_x=[4,8], width=600, height=350,
                   opacity=0.4, marginal='box')
# Specifying range_y when drawing the histogram breaks the position of the marginal box plot.
fig.update_layout(barmode='overlay')
fig.update_yaxes(range=[0,20],row=1, col=1)
#Save as html
# fig.write_html('histogram_with_boxplot.html', auto_open=False)
fig.show()
Click here for a working version: https://uchidehiroki.github.io/plotly_folders/Basic_Charts/histogram_with_boxplot_px.html
Hovering the mouse over the chart pops up various information. Marginal plots are easy to add. Double-clicking a variety in the legend on the right shows only that variety.
plotly
Here I draw the same figure using the functions of plotly itself.
fig = make_subplots(rows=1, cols=2, subplot_titles=('Sepal length(cm)', 'Sepal length by variety(cm)'))
fig.add_trace(go.Histogram(x=iris['sepal_length'], xbins=dict(start=4,end=8,size=0.25), hovertemplate="%{x}cm: %{y} samples", name="All varieties"), row=1, col=1)
for species in ['setosa', 'versicolor', 'virginica']:
    fig.add_trace(go.Histogram(x=iris.query(f'species=="{species}"')['sepal_length'],
                               xbins=dict(start=4,end=8,size=0.25), hovertemplate="%{x}cm: %{y} samples", name=species), row=1, col=2)
fig.update_layout(barmode='overlay', height=400, width=900)
fig.update_traces(opacity=0.3, row=1, col=2)
fig.update_xaxes(tickvals=np.arange(4,8,0.5), title_text='sepal_length')
fig.update_yaxes(title_text='frequency')
fig.write_html('../output/histogram_with_boxplot_plotly.html', auto_open=False)
fig.show()
https://uchidehiroki.github.io/plotly_folders/Basic_Charts/histogram_with_boxplot_plotly.html
matplotlib+seaborn
fig, ax = plt.subplots(figsize=(8,3))
sns.boxplot(x='sepal_length', y='species', data=iris, order=('setosa', 'versicolor', 'virginica'), ax=ax)
ax.set_title('Box plot of sepal length for each variety(cm)')
plt.show()
Plotly Express
fig = px.box(iris, y='species', x='sepal_length', color='species', orientation='h',
             category_orders={'species': ['setosa', 'versicolor', 'virginica']},
             title='Box plot of sepal length for each variety(cm)', width=600, height=400)
fig.update_layout(showlegend=False)
fig.write_html('../output/boxplot_px.html', auto_open=False)
fig.show()
https://uchidehiroki.github.io/plotly_folders/Basic_Charts/boxplot_px.html
For example, hovering the cursor over an outlier point pops up its information. It is not pretty, though, that each variety's box is offset from its horizontal category line ...
plotly
fig = go.Figure()
for species in iris['species'].unique():
    fig.add_trace(go.Box(x=iris.query(f'species=="{species}"')['sepal_length'], name=species))
fig.update_layout(height=300, width=600, showlegend=False, title_text='Box plot of sepal length for each variety(cm)')
fig.update_xaxes(title_text='sepal_length')
fig.write_html('../output/boxplot_plotly.html', auto_open=False)
fig.show()
https://uchidehiroki.github.io/plotly_folders/Basic_Charts/boxplot_plotly.html
Here each box sits on its own category line.
matplotlib + seaborn
fig, ax = plt.subplots(figsize=(8,3))
sns.barplot(x='sepal_length', y='species', order=['setosa', 'versicolor', 'virginica'], ci='sd', data=iris, ax=ax)
ax.set_title('Average length of sepals for each variety(cm)')
plt.show()
Plotly Express
# Unlike seaborn, Plotly Express does not calculate the mean or standard deviation for you.
agg_iris = iris.groupby(['species'])[['sepal_length']].agg(['mean', 'std', 'count']).reset_index()
agg_iris.columns = ['species', 'sepal_length', 'std', 'count']
fig = px.bar(agg_iris, x='sepal_length', y='species', color='species', category_orders={'species': ['setosa', 'versicolor', 'virginica']},
             error_x='std', orientation='h', hover_data=['count'], height=300, width=600, title='Average length of sepals for each variety(cm)')
fig.update_layout(showlegend=False)
fig.write_html('../output/barplot_px.html', auto_open=False)
fig.show()
https://uchidehiroki.github.io/plotly_folders/Basic_Charts/barplot_px.html
Facets are useful when there are several axes to compare.
tips = px.data.tips()
fig = px.histogram(tips, x="sex", y="tip", histfunc="avg", color="smoker", barmode="group",
                   facet_row="time", facet_col="day", category_orders={"day": ["Thur", "Fri", "Sat", "Sun"],
                                                                       "time": ["Lunch", "Dinner"]},
                   height=400, width=800)
fig.write_html('../output/boxplot_with_facet_px.html', auto_open=False)
fig.show()
https://uchidehiroki.github.io/plotly_folders/Basic_Charts/boxplot_with_facet_px.html
Preprocessing
iris['sepal_length_rank'] = pd.cut(iris['sepal_length'], [4,5,7,8], labels=['short', 'medium', 'long'])
agg_iris = iris.groupby(['sepal_length_rank','species']).size().reset_index()
agg_iris.columns = ['sepal_length_rank', 'species', 'count']
pivot_iris = agg_iris.pivot(index='sepal_length_rank', columns='species', values='count')
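The `pd.cut` call in the preprocessing above uses right-closed bins by default, so a length of exactly 5 counts as 'short' and exactly 7 as 'medium'. A small sketch of that boundary behavior, using made-up lengths (these values are illustrative and not taken from the iris data):

```python
import pandas as pd

# Hypothetical sample lengths chosen to sit on and around the bin edges
lengths = pd.Series([4.3, 5.0, 5.1, 7.0, 7.9])

# Bins are (4, 5], (5, 7], (7, 8] because right=True is the default
ranks = pd.cut(lengths, [4, 5, 7, 8], labels=["short", "medium", "long"])
print(ranks.tolist())  # ['short', 'short', 'medium', 'medium', 'long']
```

Values at or below the lowest edge (4 here) would fall outside every bin and become NaN unless `include_lowest=True` is passed.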
matplotlib + ~~seaborn~~ pandas plot
# seaborn has no function equivalent to stacked=True, so this is written with pandas plot
fig, ax = plt.subplots(figsize=(8,3))
pivot_iris.plot.barh(y=pivot_iris.columns, stacked=True, ax=ax)
ax.set_title('Frequency and variety breakdown of sepal length by rank')
ax.set_xlabel('frequency')
ax.set_ylabel('Sepal length rank')
plt.show()
Plotly Express
fig = px.bar(agg_iris, y='sepal_length_rank', x='count', color='species', orientation='h', barmode='relative',
             height=300, width=600, title='Frequency and variety breakdown of sepal length by rank')
fig.update_xaxes(title_text='frequency')
fig.update_yaxes(title_text='Sepal length rank')
fig.write_html('../output/stacked_barplot_px.html', auto_open=False)
fig
https://uchidehiroki.github.io/plotly_folders/Basic_Charts/stacked_barplot_px.html
Clicking the legend entries lets you choose which elements are stacked.
matplotlib + seaborn
#When you want to see the correlation between two variables
fig, ax = plt.subplots(figsize=(6,4))
sns.scatterplot(x='sepal_length', y='petal_length', hue='species', data=iris, ax=ax)
ax.legend(loc='lower right')
ax.set_title('Scatter plot of sepal length and petal length(cm)')
ax.set_xlabel('Sepal length(cm)')
ax.set_ylabel('Petal length(cm)')
plt.show()
#When you want to see the correlation of multiple variables
g = sns.pairplot(iris, hue='species')
plt.subplots_adjust(top=0.95)
g.fig.suptitle('Scatter plot matrix for each variety')
g.fig.set_size_inches(8,6)
Plotly Express
fig = px.scatter(px.data.iris(), x='sepal_length', y='petal_length', color='species', symbol='species',
                 marginal_x='box', marginal_y='histogram', trendline='ols',
                 hover_data=['species_id'], width=800, height=600, title='Scatter plot of sepal length and petal length(cm)')
# Set the ranges on the scatter panel only (row=1, col=1) instead of inside px.scatter;
# otherwise they are also applied to the marginal plots, which then become invisible.
fig.update_xaxes(title_text='Sepal length(cm)', range=[4,8], row=1, col=1)
fig.update_yaxes(title_text='Petal length(cm)', range=[0.5,8], row=1, col=1)
fig.write_html('../output/scatterplot_px.html', auto_open=False)
fig.show()
https://uchidehiroki.github.io/plotly_folders/Basic_Charts/scatterplot_px.html
fig = px.scatter_matrix(iris, dimensions=['sepal_length','sepal_width','petal_length','petal_width'],
                        color='species', size_max=1, title='Scatter plot matrix for each variety', width=800,height=600)
fig.write_html('../output/scattermatrix_px.html', auto_open=False)
fig
https://uchidehiroki.github.io/plotly_folders/Basic_Charts/scattermatrix_px.html
If you use the lasso tool in the upper right to enclose some points, only the corresponding data is highlighted in every panel. Turning on "Toggle Spike Lines" draws guide lines from a point to the x and y axes.
If there are too many variables, parallel coordinates are a better choice than a scatter plot matrix.
# The axes can be reordered interactively, which is amazing.
# Each axis can also be filtered, and the displayed range can be adjusted by clicking on the axis values.
# Rather than checking each correlation in a scatter plot matrix, parallel coordinates seem better because several axes can be considered at once.
fig = px.parallel_coordinates(px.data.iris(),
                              color='species_id',
                              dimensions=['sepal_length','sepal_width','petal_length','petal_width', 'species_id'],
                              color_continuous_scale=px.colors.diverging.Portland,
                              height=400, width=800)
fig.write_html('../output/parallel_coordinates_px.html', auto_open=False)
fig.show()
https://uchidehiroki.github.io/plotly_folders/Basic_Charts/parallel_coordinates_px.html
You can reorder the axes. If you select a range of values on an axis, the plot is filtered down to the data that passes through that range.
# When the axes are categorical variables, you can draw it like this
fig = px.parallel_categories(iris, dimensions=['species', 'sepal_length_rank'], color='sepal_length',
                             labels={'species': 'Iris variety', 'sepal_length_rank': 'Sepal length rank'},
                             height=400, width=600)
fig.write_html('../output/parallel_categories_px.html', auto_open=False)
fig.show()
https://uchidehiroki.github.io/plotly_folders/Basic_Charts/parallel_categories_px.html
Preprocessing
flights = sns.load_dataset("flights")
#Cross tabulation
crossed_flights = pd.pivot_table(flights, values="passengers", index="month", columns="year", aggfunc=np.mean)
# pd.crosstab(flights["month"], flights["year"], values=flights["passengers"], aggfunc=np.mean) also works
matplotlib+seaborn
#Heat map
fig, ax = plt.subplots(figsize=(8,6))
sns.heatmap(crossed_flights,annot=True, cmap="Oranges", fmt='.5g', ax=ax)
ax.set_title('Passenger heat map');
For some reason the numbers spill out of the frame ... It did not happen before, so it is a mystery. (This looks like the known matplotlib 3.1.1 regression that clips the top and bottom rows of seaborn heatmaps.)
Plotly Express
Plotly Express had no corresponding function, so I use the functions of plotly itself.
plotly
fig = go.Figure()
fig.add_trace(go.Heatmap(z=crossed_flights, x=crossed_flights.columns, y=crossed_flights.index,
                         hovertemplate='%{x}-%{y}: %{z} passengers', colorscale='Oranges'))
fig.update_layout(height=400, width=600, title_text='Passenger heat map(With hover and without annotate)')
fig.write_html('../output/heatmap_px.html', auto_open=False)
fig.show()
https://uchidehiroki.github.io/plotly_folders/Basic_Charts/heatmap_px.html
#The heatmap is easier to see if hover is turned off
fig = ff.create_annotated_heatmap(z=crossed_flights.values, x=list(crossed_flights.columns),
                                  y=list(crossed_flights.index), colorscale='Oranges',
                                  hoverinfo='none')
fig.update_layout(height=400, width=600, showlegend=False, title_text='Passenger heat map(Without hover With annotate)')
fig.write_html('../output/heatmap_with_annotate_px.html', auto_open=False)
fig
https://uchidehiroki.github.io/plotly_folders/Basic_Charts/heatmap_with_annotate_px.html
Preprocessing
#Prepare a datetime type column
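datetimes = []
for row in flights.itertuples():
    # row[0] is the index, row[1] the year, row[2] the month name
    # '%B' expects full month names such as 'January'
    datetimes.append(datetime.datetime.strptime(f'{row[1]}-{row[2]}', '%Y-%B'))
flights['datetime'] = datetimes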
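The same column can also be built without an explicit loop. The sketch below is one possible alternative, not the author's method: it uses `pd.to_datetime` on a small hand-made frame (the two rows are illustrative stand-ins for the flights data) and assumes the month column holds full English month names.

```python
import pandas as pd

# Hypothetical miniature of the flights data, with full month names assumed
flights_small = pd.DataFrame({
    "year": [1949, 1949],
    "month": ["January", "February"],
    "passengers": [112, 118],
})

# Join year and month into strings such as '1949-January', then parse them in one call
flights_small["datetime"] = pd.to_datetime(
    flights_small["year"].astype(str) + "-" + flights_small["month"].astype(str),
    format="%Y-%B",
)
print(flights_small["datetime"].tolist())
```

If the month names in your copy of the dataset are abbreviated ('Jan', 'Feb', ...), the format would need `%b` instead of `%B`.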
matplotlib+seaborn
fig, ax = plt.subplots(figsize=(12, 4))
sns.lineplot(x='datetime', y='passengers', data=flights, ax=ax)
ax.xaxis.set_major_locator(MonthLocator(interval=6))
ax.tick_params(labelrotation=45)
ax.set_xlim(datetime.datetime(1949,1,1,0,0,0),datetime.datetime(1961,1,1,0,0,0))
ax.set_title('Line graph of changes in the number of passengers(Monthly)')
plt.show()
Plotly Express
fig = px.line(flights, x='datetime', y='passengers',
              height=400, width=800, title='Line graph of changes in the number of passengers(Monthly)')
fig.update_layout(xaxis_range=['1949-01-01', '1961-01-01'],  # datetime.datetime objects can also be used
                  xaxis_rangeslider_visible=True)
fig.update_xaxes(tickformat='%Y-%m', tickangle=45)
fig.write_html('../output/lineplot_px.html', auto_open=False)
fig.show()
https://uchidehiroki.github.io/plotly_folders/Basic_Charts/lineplot_px.html
The slider at the bottom lets you view the line graph over any range you choose. When several lines are drawn, you can pick out afterwards only the ones you want to compare.
I looked through https://plot.ly/python/plotly-express/ and picked out some interesting figures.
Polar coordinates
A bar graph only lets you compare the lengths of a single variable, but a polar chart seems useful when you want to compare scores on several axes, such as test results.
melted_iris = iris.melt(id_vars=['species'], value_vars=['sepal_length', 'sepal_width', 'petal_length', 'petal_width'],
                        var_name='variable', value_name='value')
melted_iris = melted_iris.groupby(['species', 'variable']).mean().reset_index()
fig = px.line_polar(melted_iris, r='value', theta='variable', color='species', line_close=True,
                    height=500, width=500, title='Length and width of sepals and petals for each iris variety')
fig.write_html('../output/line_polar_px.html', auto_open=False)
fig.show()
https://uchidehiroki.github.io/plotly_folders/Basic_Charts/line_polar_px.html
The shape of the setosa flowers in particular seems to differ from the others.
fig = make_subplots(rows=1, cols=2, subplot_titles=('Sepal length(cm)', 'Sepal length by variety(cm)'))
fig.add_trace(px.density_contour(iris, x="sepal_width", y="sepal_length").data[0], row=1, col=1)
fig2 = px.density_contour(iris, x="sepal_width", y="sepal_length", color='species')
# Copy every trace of fig2 into the right-hand subplot of fig
for trace in fig2.data:
    fig.add_trace(trace, row=1, col=2)
fig.write_html('../output/densityplot_px.html', auto_open=False)
fig.show()
https://uchidehiroki.github.io/plotly_folders/Basic_Charts/densityplot_px.html
Cool. Hovering with the mouse shows the density level that each contour line represents.
gapminder = px.data.gapminder()
fig = px.scatter(gapminder, x="gdpPercap", y="lifeExp", animation_frame="year", animation_group="country",
                 size="pop", color="continent", hover_name="country", facet_col="continent",
                 log_x=True, size_max=45, range_x=[100,100000], range_y=[25,90])
fig.write_html('../output/scatter_animation_px.html', auto_open=False)
fig.show()
https://uchidehiroki.github.io/plotly_folders/Basic_Charts/scatter_animation_px.html
This one is best appreciated by opening the URL and actually playing the animation.
Here are the results of comparing Plotly Express with seaborn (matplotlib), based entirely on my own subjective judgment.
◎: Recommended ○: No particular problem △: Unsatisfactory ×: Not implemented
Graph | Plotly Express | matplotlib + seaborn | Reason |
---|---|---|---|
Histogram | ◎ | ○ | seaborn is enough, but being able to select and view only the varieties you want is an advantage. Not having to repeat the same code for each variety is also a plus. |
Box plot | ○ | ○ | Selecting a specific variety or adding information on hover brings little extra here. With Plotly Express the figure is slightly misaligned. If that bothers you, you can use plotly itself. |
Bar graph | ○ (※◎) | ◎ | seaborn calculates the mean and standard deviation for you, but with Plotly Express they must be computed beforehand. (※ If you want a bar graph over two or more axes, Plotly Express with its facets is the only choice.) |
Stacked bar graph | ◎ | × | Plotly Express lets you change the stacked elements afterwards. seaborn has no stacked bar graph feature. |
Scatter plot | ◎ | ○ | You can add a regression line for each group. Hovering shows meta information such as the slope of the regression line. Marginal plots are easy to add. You can zoom in and out on specific parts of the scatter plot. Hovering can show meta information such as the ID of each data point. In a scatter plot matrix, selecting data in one panel highlights the same data in all other panels. When the data is grouped, specific groups can be selected and drawn. |
Parallel coordinates | ◎ | ? | Axes can be reordered and filtered individually, and categorical variables are supported too. Parallel coordinates show their true power when you experiment after drawing. |
Heat map | ○ | △ | If you do not need extra information on hover, seaborn should be enough, but in my environment the seaborn figure came out broken. |
Line graph | ◎ | ○ | Spike lines let you read the coordinates intuitively even at points far from the x-axis or y-axis. The slider lets you draw any range. When there are many lines (for example, population trends for 47 prefectures), you can select and plot only the prefectures you want to compare. |
An overwhelming victory for Plotly Express.
There were also many other figures, such as radar charts, density plots, and animations, that look interesting if used well. On top of that, mastering sliders would make things very cool. https://plot.ly/python/sliders/ https://uchidehiroki.github.io/plotly_folders/Basic_Charts/histogram_with_slider.html
From kazutan.R
Compared with image files it takes more effort, but it seems easiest to reference the html files after placing them locally or hosting them online. Hosting online is the better choice, especially when sharing with several people.
Compared with png and jpeg image files, displaying html files takes a little trick.
**When displaying offline**
You can draw the figure by specifying the path of the html file.
**When displaying online**
The plotly way is to use chart_studio: host the html file on plotly's servers and reference the published URL. However, free accounts cannot issue secret links or host files privately (that is where plotly makes its money), and a free account can publicly host only up to 100 html files in total.
So I use GitHub Pages instead.
Place the html files in a GitHub repository, select master branch under Settings → GitHub Pages → Source, and press the Save button.
After that, just access username.github.io/path/to/html/files.
It's easy to manage html files because it's github.
Official site: https://pages.github.com/
Reference: https://www.tam-tam.co.jp/tipsnote/html_css/post11245.html
It seems that GitHub Pages can be published even from a private repository. If you analyze data that you do not want to make public, it seems good to keep the figure html files in a private repository and share only the URL.
#You can refer to the html file and embed it
HTML(filename='../output/histogram_with_boxplot_px.html')
#You can also refer to the static html you put online
HTML(url='https://uchidehiroki.github.io/plotly_folders/Basic_Charts/barplot_px.html')
I used to be a seaborn devotee, but I no longer need it much. It feels a little lonely, but I will keep up with the times. I hope the number of Plotly Express users grows. This ended up long, so thank you for reading to the end.