Recommending Altair! Data Visualization with Python

About this article

This article introduces **Altair**, a data visualization library for Python. Since there is little information about it in Japanese, I am writing this to spread the word. If you are comfortable with English, the official page is the easiest place to look.

Installation

Altair installs easily with pip. `vega_datasets` is used later in this article, so install it at the same time.

pip install altair vega_datasets

I used Google Colaboratory, where everything worked out of the box without any installation.

Setup

Import the library and load the dataset as follows.

import altair as alt
from vega_datasets import data
iris = data.iris()

Altair works well with Pandas, and `iris` is a Pandas DataFrame.

Basic visualization

The code below assumes the chart is displayed directly in Jupyter or a similar environment. To write the chart to HTML instead, append `.save("filename.html")`. `.interactive()` is optional, but adding it lets you pan and zoom the chart. **The charts in this article are static images, so if you want to interact with them, please see the interactive version linked here.**

Example ① Scatter plot

Specify the fields for the x-axis and y-axis as follows. You can also use the `alt.X()` form shown in the comments, which is what you will need for more complex visualizations.

alt.Chart(iris).mark_point().encode(
    x="sepalLength", # alt.X("sepalLength"),
    y="sepalWidth", # alt.Y("sepalWidth"),
    color="species"
).interactive()

Example ② Bar graph

The key is to take the average for each species with `average()`. Aggregations other than the average are also available, and you can check the list here.

alt.Chart(iris).mark_bar().encode(
    x="average(sepalLength)", # alt.X("sepalLength", aggregate="average"),
    y="species", # alt.Y("species"),
).interactive()

That covers the basics. If you replace the `mark_xxxxx` part with `mark_line`, you can draw a line chart just as easily. If you get stuck, you can usually find a solution by looking for a similar chart in the Gallery on the official page.

Tips (more may be added)

Showing information on mouseover

The fields passed to the `tooltip` argument are displayed when you hover over a point.

alt.Chart(iris).mark_point().encode(
    x="sepalLength",
    y="sepalWidth",
    color="species",
    tooltip=["sepalLength", "sepalWidth", "petalLength", "petalWidth", "species"]
).interactive()

When you don't want the axis to start at 0

By default, axes for quantitative data include 0. Passing `zero=False` to the scale explicitly tells Altair that the axis does not need to include 0.

alt.Chart(iris).mark_point().encode(
    alt.X("sepalLength", scale=alt.Scale(zero=False)),
    alt.Y("sepalWidth", scale=alt.Scale(zero=False)),
    color="species"
).interactive()

Data type specification

Nominal (categorical) variables are often stored as integers. In that case, declare the column as nominal explicitly, as in `species_int:N`. Ordinal data uses `:O` and quantitative data uses `:Q`. Details can be found in the official documentation.

# Convert species to integer codes (setosa: 0, versicolor: 1, virginica: 2)
iris["species_int"] = [["setosa", "versicolor", "virginica"].index(x) for x in iris["species"]]

# Correct example: declare species_int as nominal with :N
alt.Chart(iris).mark_point().encode(
    x="sepalLength",
    y="sepalWidth",
    color="species_int:N"
).interactive()

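Without `:N`, the integer column is treated as quantitative, so the points are colored along a continuous gradient instead of with distinct colors per category.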

Handling MaxRowsError

Altair raises an error when the data exceeds 5,000 rows. Based on the information linked here, run the following to lift the limit.

alt.data_transformers.disable_max_rows()

Finally

Altair lets you describe charts concisely, which makes it convenient for exploratory analysis. If there is one drawback, it is that while HTML output is easy, PNG output seems to be a little more difficult. I hope this helps!
