[PYTHON] Data visualization with pandas

Others have already written about this, but pandas has built-in data visualization. It is a thin wrapper around matplotlib, yet it shortens the code for basic graphs considerably.

Data visualization with Pandas

The iris visualizations introduced there can be written with about the same amount of code as in R.

Python for R Users: [Differences between Python and R (data visualization / graph creation)](http://pythondatascience.plavox.info/python%E3%81%A8r%E3%81%AE%E9%81%95%E3%81%84/%E3%83%87%E3%83%BC%E3%82%BF%E5%8F%AF%E8%A6%96%E5%8C%96%E3%83%BB%E3%82%B0%E3%83%A9%E3%83%95%E4%BD%9C%E6%88%90/)

Needing several imports is simply how Python works, but I feel that the productivity of data analysis with Python is quite high now. RStudio and dplyr have become popular and made R more productive, but I think Python has also changed a lot over the past 5 years as Jupyter and pandas became popular. (It is completely different from the days when numpy + matplotlib was the main toolset.)

Required packages, loading data

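import seaborn as sns
import pandas as pd
# Load the iris sample dataset as a pandas DataFrame
iris=sns.load_dataset("iris")
# Line magic (single %): show plots inline in the Jupyter notebook
%matplotlib inline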

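seaborn is imported to make the plots look nicer. It also ships with sample datasets, so iris can be loaded from it. Note that in seaborn 0.8 and later the style is no longer applied on import, so call sns.set() if the plots come out in matplotlib's default style.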

Scatter plot

iris.plot.scatter(x="sepal_length",y="sepal_width")

scatter.png

Box plot

iris.sepal_length.plot.box()

box.png

Histogram

iris.sepal_length.hist()

hist.png

Line chart

iris.sepal_length.plot.line()

line.png

Pie chart

pd.crosstab(iris.species,columns="species").plot.pie(y="species")

pie.png

With the default settings this has a few problems:

- The default figure is wider than it is tall, so the pie is squashed.
- The labels overlap.
- The seaborn color palette is not applied, because seaborn has no pie chart.

It is a little awkward to change settings just for this, but with a few extra settings it looks fine.

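from matplotlib import pylab
# Remember the default figure size so it can be restored later
default_size=pylab.rcParams["figure.figsize"]
# Use a square figure so the pie is not squashed
pylab.rcParams["figure.figsize"]=12,12
# Pass seaborn's palette explicitly as the wedge colors
pd.crosstab(iris.species,columns="species").plot.pie(y="species",colors=sns.color_palette())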

pie2.png

After changing the figure size, restore the default.

pylab.rcParams["figure.figsize"]=default_size
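As an alternative to changing and restoring the global setting, the figure size can probably be passed just for one plot. The following is a minimal sketch, not the approach used in the original slides: it assumes a hypothetical `species` Series standing in for `iris.species` (so it runs without seaborn or a network connection) and uses `value_counts()` instead of `pd.crosstab`.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs outside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for iris.species (50 rows per species).
species = pd.Series(
    ["setosa"] * 50 + ["versicolor"] * 50 + ["virginica"] * 50,
    name="species",
)

size_before = list(plt.rcParams["figure.figsize"])

# value_counts() gives one count per species; figsize applies to this figure only.
counts = species.value_counts()
ax = counts.plot.pie(figsize=(6, 6))
ax.set_ylabel("")  # drop the y-axis label that pandas adds to the pie

size_after = list(plt.rcParams["figure.figsize"])
fig_size = tuple(ax.figure.get_size_inches())
```

Because the global `rcParams` entry is never touched, there is nothing to put back afterwards. A palette can still be passed with `colors=`, as in the code above.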

Bar chart

iris.sepal_length.plot.bar()

bar.png

Perhaps because the pandas bar chart assumes a categorical variable, it does not thin out the axis labels by default, so all 150 tick labels are drawn on top of each other.

If you call matplotlib directly, the tick labels are thinned out. (Because seaborn has been imported, the colors follow seaborn's style.)

from matplotlib import pyplot as plt
plt.bar(iris.index,iris.sepal_length)

bar2.png
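If you would rather stay with the pandas bar chart, the tick labels can also be thinned out by hand on the Axes object that pandas returns. This is only a sketch under assumptions: `values` is a hypothetical stand-in for `iris.sepal_length` with a default integer index, and the step of 10 is an arbitrary choice.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs outside Jupyter
import numpy as np
import pandas as pd

# Hypothetical stand-in for iris.sepal_length: 150 values, default integer index.
values = pd.Series(np.linspace(4.3, 7.9, 150), name="sepal_length")

ax = values.plot.bar()

# pandas draws one tick per bar at positions 0..n-1; keep every 10th one.
step = 10
positions = list(range(0, len(values), step))
labels = [str(values.index[p]) for p in positions]

# Set positions and labels together so each label stays attached to its own bar.
ax.set_xticks(positions)
ax.set_xticklabels(labels, rotation=0)
```

The labels are set explicitly because pandas installs a fixed list of tick labels; changing only the tick positions would leave the remaining ticks with the wrong labels.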

Summary

The original slides explain things with pandas in the first half, but the visualizations in the second half do not use pandas, so the code there is longer than it needs to be. If you want to do something complicated you have to work with the matplotlib API directly, but basic charts can be written simply with the pandas API.
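The two approaches can also be mixed. The sketch below lets matplotlib build the layout while pandas draws the basic charts into it. The DataFrame is a hypothetical stand-in that only borrows the iris column names (random values, not the real data), and the titles and figure size are arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs outside Jupyter
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in with the same column names as the iris data.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "sepal_length": rng.normal(5.8, 0.8, 150),
    "sepal_width": rng.normal(3.0, 0.4, 150),
})

# matplotlib creates the layout; pandas draws into the axes it is given.
fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(10, 4))
df.plot.scatter(x="sepal_length", y="sepal_width", ax=ax_left)
df["sepal_length"].plot.hist(bins=20, ax=ax_right)

# Fine-tuning goes through the matplotlib API on the same axes.
ax_left.set_title("Sepal length vs. width")
ax_right.set_xlabel("sepal_length")
fig.tight_layout()
```

Since every pandas plotting call accepts `ax=` and returns a matplotlib Axes, the short pandas calls cover the basic drawing and matplotlib is only needed for the adjustments.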
