[PYTHON] Data visualization method using matplotlib (+ pandas) (4)

Continuing from the previous installment, this article covers more data visualization with matplotlib and pandas.

Visualize external data

This time, let's use external data for a more practical example. First, download tips.csv from the pydata-book repository, which also serves as the reference for this article.

pydata-book/ch08/tips.csv https://github.com/pydata/pydata-book/blob/master/ch08/tips.csv

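import numpy as np
from pandas import Series, crosstab, read_csv
import matplotlib.pyplot as plt

tips = read_csv('tips.csv')

# Cross-tabulate the CSV data: day of the week versus party size.
# Use tips['size'] rather than tips.size, which is the DataFrame's element count.
party_counts = crosstab(tips['day'], tips['size'])
print(party_counts)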
# =>
# size  1   2   3   4  5  6
# day                      
# Fri   1  16   1   1  0  0
# Sat   2  53  18  13  1  0
# Sun   0  39  15  18  3  1
# Thur  1  48   4   5  1  3

# Normalize each row so that it sums to 1
party_counts = party_counts.div(party_counts.sum(axis=1), axis=0)
print(party_counts)
# =>
# [4 rows x 6 columns]
# size         1         2         3         4         5         6
# day
# Fri   0.052632  0.842105  0.052632  0.052632  0.000000  0.000000
# Sat   0.022989  0.609195  0.206897  0.149425  0.011494  0.000000
# Sun   0.000000  0.513158  0.197368  0.236842  0.039474  0.013158
# Thur  0.016129  0.774194  0.064516  0.080645  0.016129  0.048387

# Plot as a stacked bar chart
party_counts.plot(kind='bar', stacked=True)
# Save before plt.show(); saving after the window closes can produce a blank image
plt.savefig("image.png")
plt.show()

image.png

From this graph, we can see that party sizes increase on weekends (Saturday and Sunday). There are almost no solo customers on Sundays, and the proportion of groups of 3 to 4 people, presumably families, is clearly higher.
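The row normalization used above can be checked on a small table. The following is a minimal sketch with made-up data (the `day` and `size` values are invented for illustration and are not taken from tips.csv). It compares the manual `div` approach with the `normalize='index'` option of `crosstab`, which recent pandas versions provide and which should give the same proportions.

```python
import pandas as pd

# Hypothetical mini data set, invented for illustration only
df = pd.DataFrame({
    "day": ["Fri", "Fri", "Sat", "Sat", "Sat", "Sun"],
    "size": [2, 2, 2, 3, 4, 4],
})

# Use df["size"]: attribute access (df.size) returns the element count instead
counts = pd.crosstab(df["day"], df["size"])

# Manual normalization: divide every row by its row total
manual = counts.div(counts.sum(axis=1), axis=0)

# The same normalization done by crosstab itself
builtin = pd.crosstab(df["day"], df["size"], normalize="index")

print(manual)
print(builtin)
```

Either form works for the stacked bar chart. The manual version keeps the raw counts around, which is handy if you also want to report them.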

Histogram and fitting

A histogram is a bar chart that shows how often values occur after they have been grouped into discrete bins. Let's plot the ratio of the tip to the total bill as a histogram.

Earlier, using Gaussian fitting as an example, I explained how to fit a continuous probability distribution such as the normal distribution to data. A plot of a kernel density estimate is called a KDE plot. Specifying kind='kde' in plot draws a density plot using the standard kernel density estimate, which is built from a mixture of normal distributions.

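fig = plt.figure()
ax1 = fig.add_subplot(2,1,1)
ax2 = fig.add_subplot(2,1,2)

# Ratio of the tip to the total bill
tips['tip_pct'] = tips['tip'] / tips['total_bill']
result = tips['tip_pct']

# Lower panel: kernel density estimate (kind='kde' requires SciPy)
result.plot(kind='kde', ax=ax2)
# Upper panel: histogram with 50 bins
ax1.hist(result, bins=50, alpha=0.6)

plt.savefig("image2.png")
plt.show()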

image2.png

Plotting the kernel density estimate on top of a normalized histogram gives a result much like a fit. This is a common technique.
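To make the idea of a "mixture of normal distributions" concrete, here is a minimal NumPy sketch of a one-dimensional Gaussian kernel density estimate. It is not the code pandas runs (pandas delegates to SciPy); the function name `gaussian_kde_1d`, the synthetic sample, and the fixed seed are assumptions made for illustration. The default bandwidth follows Scott's rule of thumb.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth=None):
    """Evaluate a Gaussian kernel density estimate of `samples` on `grid`."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    n = samples.size
    if bandwidth is None:
        # Scott's rule of thumb for one dimension (an assumed default)
        bandwidth = samples.std(ddof=1) * n ** (-1.0 / 5.0)
    # One standard normal kernel per sample, centred on that sample
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)
    # Average the kernels and rescale so the curve integrates to 1
    return kernels.sum(axis=1) / (n * bandwidth)

rng = np.random.default_rng(0)          # fixed seed, arbitrary choice
data = rng.normal(0.0, 1.0, size=200)   # synthetic sample, not tips.csv
grid = np.linspace(-6.0, 6.0, 601)
density = gaussian_kde_1d(data, grid)

# Riemann-sum check: the estimated density should have area close to 1
area = density.sum() * (grid[1] - grid[0])
print(area)
```

Each data point contributes one small bell curve, and the estimate is their average. A smaller bandwidth follows the data more closely, while a larger one gives a smoother curve.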

Let's try this on data drawn from two different normal distributions, N(0,1) and N(10,4). The second parameter is the variance, so N(10,4) has a standard deviation of 2.

fig = plt.figure()
ax = fig.add_subplot(1,1,1)

# First component: mean 0, standard deviation 1
comp1 = np.random.normal(0,1,size=200) # N(0,1)
# Second component: mean 10, standard deviation 2 (variance 4)
comp2 = np.random.normal(10,2,size=200) # N(10,4)

#Combine two normal distributions into one series
values = Series(np.concatenate([comp1, comp2]))

print( values )
# =>
# 0    -0.305123
# 1    -1.663493
# 2     0.845320
# 3     1.217024
# 4    -0.597437
# 5     0.559524
# 6     0.849613
# 7    -0.916863
# 8     2.705579
# 9     1.397815
# 10   -1.135680
# 11    0.322982
# 12    0.568366
# 13    0.567607
# 14    0.360048
# ...
# 385    15.695692
# 386     8.868396
# 387     8.625446
# 388     5.793579
# 389     8.169981
# 390     8.434327
# 391    10.305067
# 392    11.032880
# 393     8.319812
# 394     9.026077
# 395     9.534395
# 396     4.498352
# 397    12.557349
# 398     7.365278
# 399    11.065254
# Length: 400, dtype: float64

# Draw a normalized histogram (density=True replaces the removed normed=True)
values.hist(bins=100, alpha=0.3, color='b', density=True)
# Overlay the kernel density estimate
values.plot(kind='kde', style='r--')

plt.savefig("image3.png")
plt.show()

image3.png
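The KDE curve lines up with the histogram only because the histogram is normalized (`normed=True` in older matplotlib, `density=True` in current releases and in NumPy). The sketch below, which uses its own synthetic data and an arbitrary fixed seed, checks with `np.histogram` that the bar areas of a normalized histogram sum to 1, the same scale as a probability density.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, arbitrary choice
values = np.concatenate([
    rng.normal(0, 1, size=200),    # N(0, 1)
    rng.normal(10, 2, size=200),   # N(10, 4): standard deviation 2
])

# density=True divides each count by (number of samples * bin width)
heights, edges = np.histogram(values, bins=100, density=True)
widths = np.diff(edges)

# Sum of bar areas, which should be 1 up to floating-point error
total_area = (heights * widths).sum()
print(total_area)
```

Without normalization the bar heights are raw counts, so a density curve drawn on the same axes would look almost flat next to them.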

Reference

Introduction to Data Analysis with Python: Data Processing Using NumPy and pandas http://www.oreilly.co.jp/books/9784873116556/
