[Python] How to use the graph creation library Altair

Overview

In this article, I draw various graphs with Altair, a graph creation library for Python. A distinguishing feature of Altair is that it takes its input data as a pandas DataFrame.

Test data

This article uses the Titanic passenger dataset published on Kaggle. The data format is as follows.

train.csv


PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S
2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",female,38,1,0,PC 17599,71.2833,C85,C
3,1,3,"Heikkinen, Miss. Laina",female,26,0,0,STON/O2. 3101282,7.925,,S
4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35,1,0,113803,53.1,C123,S
5,0,3,"Allen, Mr. William Henry",male,35,0,0,373450,8.05,,S
6,0,3,"Moran, Mr. James",male,,0,0,330877,8.4583,,Q

The data is read with pandas at the top of each code example below.

Install Altair

Altair can be installed with pip.

Terminal


pip install altair

Environment

Scatter plot

Altair is well suited to creating scatter plots. Appending :O to a field name makes Altair treat even numerical data as ordinal (ordered categorical) data.

altair_demo.py


import os
import altair as alt
import pandas as pd

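# Build the path to train.csv in the current working directory
cwd = os.getcwd()
path = ['train.csv']
file = os.path.join(cwd, *path)

# Read the CSV, using the first column (PassengerId) as the index
df = pd.read_table(file, sep=',', index_col=0, header=0)

# Age vs. Fare, one panel per Survived value;
# color shows Sex and point size shows Pclass (as ordinal data)
scatter_plot = alt.Chart(df).mark_circle().encode(
    x=alt.X('Age'),
    y=alt.Y('Fare'),
    column=alt.Column('Survived:O'),
    color=alt.Color('Sex', sort=['male', 'female']),
    tooltip=['Age', 'Fare', 'Name'],
    size=alt.Size('Pclass:O')
).properties(
    width=600,
    height=500
).interactive()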

scatter_plot.show()


Linear regression line

To draw a line segment, put the coordinates of its start and end points in a DataFrame and connect them with mark_line. The intercept and slope of the linear regression are determined with scikit-learn.

altair_demo.py


import os
import altair as alt
import pandas as pd
from sklearn.linear_model import LinearRegression

cwd = os.getcwd()
path = ['train.csv']
file = os.path.join(cwd, *path)

df = pd.read_table(file, sep=',', index_col=0 ,header=0)

# Delete the rows containing missing values
linear_df = df.dropna(subset=['Age', 'Fare'], how='any', axis=0)

# Create a linear regression model
# (X must be 2-D; y is kept 1-D so that coef_[0] and intercept_ are scalars)
linear = LinearRegression().fit(
    linear_df['Age'].values.reshape(-1, 1),
    linear_df['Fare'].values
)

# Slope and intercept of the fitted line
a = linear.coef_[0]
b = linear.intercept_

# Range of the x-axis over which to draw the line
x_min = df['Age'].min()
x_max = df['Age'].max()

# Create a data frame holding the two end points of the line
linear_points = pd.DataFrame({
    'Age': [x_min, x_max],
    'Fare': [a*x_min+b, a*x_max+b],
}).astype(float)

linear_line = alt.Chart(linear_points).mark_line(color='steelblue').encode(
    x=alt.X('Age'),
    y=alt.Y('Fare')
).properties(
    width=500,
    height=500
).interactive()

linear_line.show()

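For reference, the slope and intercept of a one-variable linear regression are those of the ordinary least-squares line, which can also be computed by hand. The following is a minimal sketch using only the standard library; the helper `fit_line` and the sample values are hypothetical and chosen so that the points lie exactly on y = 2x + 1.

```python
from statistics import mean


def fit_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of co-deviations of x and y, and sum of squared deviations of x.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Made-up points lying exactly on y = 2x + 1.
ages = [0.0, 10.0, 20.0, 30.0]
fares = [1.0, 21.0, 41.0, 61.0]

a, b = fit_line(ages, fares)

# Two end points are enough to draw the regression line as a segment.
endpoints = [(x, a * x + b) for x in (min(ages), max(ages))]
print(a, b, endpoints)
```

The two tuples in `endpoints` play the same role as the two rows of the DataFrame that is passed to mark_line.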

Overlaying charts

The regression line can also be displayed on top of the scatter plot by combining the two charts with the + operator.

altair_demo.py


import os
import altair as alt
import pandas as pd

cwd = os.getcwd()
path = ['train.csv']
file = os.path.join(cwd, *path)

df = pd.read_table(file, sep=',', index_col=0 ,header=0)

scatter_plot = alt.Chart(df).mark_circle(size=50).encode(
    x=alt.X('Age'),
    y=alt.Y('Fare'),
).properties(
    width=500,
    height=500
).interactive()

# Build linear_line exactly as in the previous section (omitted here)

(scatter_plot + linear_line).show()


Box plot

altair_demo.py


import os
import altair as alt
import pandas as pd

cwd = os.getcwd()
path = ['train.csv']
file = os.path.join(cwd, *path)

df = pd.read_table(file, sep=',', index_col=0 ,header=0)

boxplot = alt.Chart(df.dropna(subset=['Embarked'], how='any', axis=0)).mark_boxplot().encode(
    x=alt.X('Survived:O'),
    y=alt.Y('Fare'),
    column=alt.Column('Embarked', sort=['S','Q','C']),
    color=alt.Color('Sex', sort=['male', 'female'])
).properties(
    width=600,
    height=500
).interactive()

boxplot.show()


Histogram

Setting the Y-axis to count() counts the records in each bin. The bins are configured through the bin argument of alt.X().

altair_demo.py


import os
import altair as alt
import pandas as pd

cwd = os.getcwd()
path = ['train.csv']
file = os.path.join(cwd, *path)

df = pd.read_table(file, sep=',', index_col=0 ,header=0)

histogram = alt.Chart(df).mark_bar().encode(
    x=alt.X("Age", bin=alt.Bin(step=10, extent=[0, 90])),
    y=alt.Y('count()'),
    column=alt.Column('Survived:O'),
    color=alt.Color('Sex', sort=['male', 'female']),
    opacity=alt.Opacity('Sex', sort=['male', 'female'])
).properties(
    width=600,
    height=500
).interactive()

histogram.show()

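To make the combination of binning and count() concrete, the following sketch reproduces the idea in plain Python for a step of 10 over the range 0 to 90. The helper `bin_counts` and the sample ages are hypothetical, and the handling of values exactly on a bin edge is an assumption of this sketch that may differ from what Altair renders.

```python
from collections import Counter


def bin_counts(values, step=10, lower=0, upper=90):
    """Count values per [start, start + step) bin inside [lower, upper)."""
    counts = Counter()
    for v in values:
        if v is None or not (lower <= v < upper):
            continue  # skip missing or out-of-range values
        start = lower + int((v - lower) // step) * step
        counts[start] += 1
    return dict(sorted(counts.items()))


# Made-up ages, including one missing value.
ages = [22, 38, 26, 35, 35, None, 54, 2]
counts = bin_counts(ages)
print(counts)
```

Each key is the lower edge of a bin and each value is the number of records in it, which corresponds to the height of one bar in the histogram.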

How to save the figure

A chart can be saved as HTML with its save() method. Altair can write HTML on its own; the following package adds support for further output formats.

Terminal


pip install altair_saver

Adding .interactive() lets the viewer pan and zoom the chart. This interactivity is preserved in the saved HTML file.

altair_demo.py


import os
import altair as alt
import pandas as pd

cwd = os.getcwd()
path = ['train.csv']
file = os.path.join(cwd, *path)

df = pd.read_table(file, sep=',', index_col=0 ,header=0)

boxplot = alt.Chart(df.dropna(subset=['Embarked'], how='any', axis=0)).mark_boxplot().encode(
    x=alt.X('Survived:O'),
    y=alt.Y('Fare')).interactive()

boxplot.save(fp='boxplot.html')

If you want to save in a format other than .html, you can refer to here.

Application example

By combining with Streamlit, you can create various data analysis applications.
