[PYTHON] Histogram with matplotlib

I made histograms of the scores in 5 subjects and the total score of a mock exam with matplotlib.

・ Matplotlib

・ Histogram (plt.hist)

・ Graph output with a for statement

・ Color coding histogram bars with patches

Histogram of practice test

・ Targets are Japanese, math, English, social studies, science, and total points.

・ 100 points each for Japanese, math, English, social studies, and science

・ CSV: https://drive.google.com/file/d/1EzctLYN5-UvkmkOgZ7usPgtsQn7bdq5y/view?usp=sharing

・ The total score is out of 500 points.

20200916000204.png


Load the libraries


import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

Create a data frame and name columns 1 to 6. (If the CSV is in the same directory as the Python .ipynb notebook, the file name alone, such as "~~~.csv", is enough.)

df = pd.read_csv("honmachi.csv", names=['National language','Math','English','society','Science','total'])
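
To see what names= does, here is a minimal sketch that reads a small inline CSV instead of honmachi.csv. The three rows of scores are made up for illustration, and the sketch assumes, as the call above implies, that the file has no header row.

```python
import io
import pandas as pd

# Hypothetical stand-in for honmachi.csv: no header row, six numeric columns.
csv_text = """60,70,80,50,90,350
45,55,65,75,85,325
30,40,50,60,70,250
"""

columns = ['National language', 'Math', 'English', 'society', 'Science', 'total']

# names= assigns the column labels; because names is given, pandas treats
# the first line of the file as data rather than as a header.
df = pd.read_csv(io.StringIO(csv_text), names=columns)

print(df.shape)
print(df['total'].tolist())
```

If the real file did have a header row, that row would be read as data here, so it is worth checking the first rows after loading.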

Check that the data was loaded correctly. (df.head() shows the first five rows.)

df.head()

20200916000914.png

I'm not going to analyze the data this time, but describe() gives an overview of it.

df.describe()

20200916001311.png

First, have **matplotlib** draw a histogram of df['National language'] with the default settings.

plt.hist(df['National language'])
plt.title('National language')
plt.xlabel('score')
plt.ylabel('Number of people')
plt.show()

20200916002301.png

The default is not quite right. Given the nature of test scores, the following settings should be easier to read:

・ Range from 0 to 100 points: **range=(0, 100)**

・ 10 bars: **bins=10**

So, pass range and bins as arguments to hist() in **matplotlib**.

# Added inside hist(): range and bins
plt.hist(df['National language'], range=(0,100), bins=10)
plt.title('National language')
plt.xlabel('score')
plt.ylabel('Number of people')
plt.show()

20200916003812.png
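
To check what range=(0, 100) and bins=10 do to the counts, the same binning can be reproduced with np.histogram, which plt.hist uses internally to bin the data. This is a minimal sketch; the 15 scores are invented for illustration and are not the data from honmachi.csv.

```python
import numpy as np

# Made-up scores for 15 students (not the data from honmachi.csv).
scores = np.array([35, 42, 48, 50, 55, 61, 67, 70, 73, 78, 82, 85, 90, 95, 100])

# Same binning as plt.hist(..., range=(0, 100), bins=10).
counts, edges = np.histogram(scores, bins=10, range=(0, 100))

print(edges)   # 11 edges: 0, 10, ..., 100
print(counts)  # one count per 10-point bin

# Every bin is half-open [left, right) except the last one, which also
# includes its right edge, so a score of 100 lands in the 90-100 bin.
```

A score of exactly 50 therefore falls in the 50-60 bar, not the 40-50 bar, which matters later when the lower half of the bars is colored.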

Next are the axes.

・ X-axis: scores run from 0 to 100, so **plt.xlim(0, 100)**

・ Y-axis: if the height changes from subject to subject, the graphs are hard to compare. There are 15 people this time, so for the time being I set the upper limit to 8 people with **plt.ylim(0, 8)**. Because the limit is specified here, it can be raised here if a bar ever exceeds 8 people.

plt.hist(df['National language'], range=(0,100), bins=10)
# Added: fix the axis ranges
plt.xlim(0,100)
plt.ylim(0,8)
plt.title('National language')
plt.xlabel('score')
plt.ylabel('Number of people')
plt.show()

20200916005618.png

The prototype looks like this.
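
Rather than picking the upper limit of 8 by eye, one option is to derive a shared y-axis limit from the data. The sketch below is an assumption about how that could be done, not part of the original workflow; the helper name common_ylim and the sample scores are hypothetical.

```python
import numpy as np
import pandas as pd

def common_ylim(df, columns, score_range=(0, 100), bins=10):
    """Return the tallest bar across the given columns, plus one for headroom."""
    tallest = 0
    for col in columns:
        counts, _ = np.histogram(df[col], bins=bins, range=score_range)
        tallest = max(tallest, int(counts.max()))
    return tallest + 1

# Made-up scores for illustration only.
sample = pd.DataFrame({
    'National language': [55, 58, 52, 71, 90],
    'Math': [30, 45, 62, 64, 88],
})

top = common_ylim(sample, ['National language', 'Math'])
print(top)  # pass this to plt.ylim(0, top) for every subject
```

Using the same limit for every subject keeps the bar heights comparable, which is the reason for fixing plt.ylim in the first place.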


Now adjust a few finer design points.

**1. I want grid lines to read the scale**

**2. Try changing the color of the bars below half the maximum score**


1. Draw grid lines so the number of people is easy to read: plt.grid(True)
plt.hist(df['National language'], range=(0,100), bins=10)
plt.xlim(0,100)
plt.ylim(0,8)
# Added: grid lines
plt.grid(True)
plt.title('National language')
plt.xlabel('score')
plt.ylabel('Number of people')
plt.show()

20200916011819.png

**2. Change the color of the bars below half the maximum score.**

**I had a hard time with this.** My first idea was to split the settings inside plt.hist():

if (49 points or less):   range=(0,50), bins=5
else (50 points or more):   range=(51,100), bins=5

Even so, color coding this way looked difficult.

Would I have to split each subject in the data frame into below 50 points and 50 points or above every time?

However, in this case the bars always appear at fixed positions, so can I **color-code the bars** themselves? In other words, I want to make the **1st to 5th bars red**. For this, I used the return value of hist().

Reference: n, bins, patches = hist(○○)

・ n: the count in each bin (Y-axis value data)

・ bins: the edges of the bins (X-axis value data)

・ patches: list of patches (a patch is the **object for each bar in the histogram**)
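
The three return values can be inspected directly. This is a minimal sketch with made-up scores; the Agg backend line is only there so that it runs without opening a window.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle

# Made-up scores for illustration only.
scores = [35, 42, 48, 50, 55, 61, 67, 70, 73, 78, 82, 85, 90, 95, 100]

n, bins, patches = plt.hist(scores, range=(0, 100), bins=10)

print(len(n))        # number of bars
print(len(bins))     # number of bin edges: one more than the bars
print(len(patches))  # one patch per bar
print(isinstance(patches[0], Rectangle))  # each patch is a Rectangle
plt.close()
```

Because patches has one entry per bar, in left-to-right order, indexing it is enough to reach an individual bar.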

I want to color-code the 1st to 5th of these **patches**.

# Set the facecolor (bar color) of the first patch (bar) to red
patches[0].set_facecolor('red')

This needs to be repeated for the 1st through 5th bars, so I used a for statement.

for i in range(0, 5):
    patches[i].set_facecolor('red')

Now that the color coding is ready, add this for statement to the plotting code.

plt.hist(df['National language'], range=(0,100), bins=10)
plt.xlim(0,100)
plt.ylim(0,8)
plt.grid(True)
plt.title('National language')
plt.xlabel('score')
plt.ylabel('Number of people')
# Added: color the first five bars red
for i in range(0, 5):
    patches[i].set_facecolor('red')
plt.show()

20200916020839.png

This raises an error saying that patches is not defined. Do I need to define **patches** somewhere? I borrowed the earlier form **n, bins, patches = hist()** and it worked.

# Changed: capture the return values of hist()
n, bins, patches = plt.hist(df['National language'], range=(0,100), bins=10)
plt.xlim(0,100)
plt.ylim(0,8)
plt.grid(True)
plt.title('National language')
plt.xlabel('score')
plt.ylabel('Number of people')
for i in range(0, 5):
    patches[i].set_facecolor('red')
plt.show()

20200916021239.png

Complete.
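
As a variation on the fixed range(0, 5) loop, the bars can also be chosen by their left edge, so the rule "below 50 points" stays correct even if the number of bins changes. This is only a sketch of that idea with made-up scores, not the method used in this article.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
from matplotlib.colors import to_rgba

# Made-up scores for illustration only.
scores = [35, 42, 48, 50, 55, 61, 67, 70, 73, 78, 82, 85, 90, 95, 100]

n, bins, patches = plt.hist(scores, range=(0, 100), bins=10)

# bins[:-1] holds the left edge of each bar; pair it with the bar itself.
for left_edge, patch in zip(bins[:-1], patches):
    if left_edge < 50:
        patch.set_facecolor('red')

# Count how many bars ended up red.
red_bars = sum(1 for p in patches if p.get_facecolor() == to_rgba('red'))
print(red_bars)
plt.close()
```

With 10 bins over 0 to 100 this colors the same five bars as the index-based loop.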

Solid blood red looks ominous, so I soften it by adjusting the transparency (alpha). **alpha=0.5** is passed as an additional argument to hist().

# alpha is also added inside hist()
n, bins, patches = plt.hist(df['National language'], range=(0,100), bins=10, alpha=0.5)
plt.xlim(0,100)
plt.ylim(0,8)
plt.grid(True)
plt.title('National language')
plt.xlabel('score')
plt.ylabel('Number of people')
for i in range(0, 5):
    patches[i].set_facecolor('red')
plt.show()

20200916022138.png
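
To confirm that alpha passed to hist() reaches every bar, including one that is recolored afterwards, the patches can be inspected directly. This is a minimal sketch with made-up scores; the Agg backend line is only there so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt

# Made-up scores for illustration only.
scores = [35, 42, 48, 50, 55, 61, 67, 70, 73, 78, 82, 85, 90, 95, 100]

n, bins, patches = plt.hist(scores, range=(0, 100), bins=10, alpha=0.5)
patches[0].set_facecolor('red')

print(patches[0].get_alpha())      # alpha that was set through hist()
print(patches[0].get_facecolor())  # RGBA tuple of the recolored bar
plt.close()
```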


After that, use a for statement to process all the subjects at once.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("honmachi.csv", names=['National language','Math','English','society','Science','total'])
# Use a variable called subject and process one subject at a time.
for subject in ['National language','Math','English','society','Science']:
    # The column selected by df[subject] changes with each subject.
    n, bins, patches = plt.hist(df[subject], range=(0,100), bins=10, alpha=0.5)
    plt.xlim(0,100)
    plt.ylim(0,8)
    plt.grid(True)
    # Passing subject to title() changes the title automatically.
    plt.title(subject)
    plt.xlabel('score')
    plt.ylabel('Number of people')
    for i in range(0, 5):
        patches[i].set_facecolor('red')
    plt.show()

20200916024212.png

With this, all 5 graphs came out at once.

The rest is the total score, which is out of 500 points. Use the total column of the data frame and change the following:

・ **range=(0,500)**

・ **plt.xlim(0,500)**

Then you're done.
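
Since the subject plots and the total plot differ only in the maximum score, the shared steps could be wrapped in one function. The sketch below is one possible way to do that; the function name plot_scores, its parameters, and the sample totals are hypothetical and not part of the original code.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt

def plot_scores(scores, title, max_score, ymax=8):
    """Draw one histogram; the first five of ten bars are colored red."""
    fig, ax = plt.subplots()
    n, bins, patches = ax.hist(scores, range=(0, max_score), bins=10, alpha=0.5)
    ax.set_xlim(0, max_score)
    ax.set_ylim(0, ymax)
    ax.grid(True)
    ax.set_title(title)
    ax.set_xlabel('score')
    ax.set_ylabel('Number of people')
    # With 10 bins, bars 1 to 5 cover the lower half of the score range.
    for i in range(0, 5):
        patches[i].set_facecolor('red')
    return fig, n, bins

# Made-up total scores for illustration only.
fig, n, bins = plot_scores([180, 240, 260, 310, 455], 'total', max_score=500)
print(bins[0], bins[-1])  # the bin edges span the full score range
plt.close(fig)
```

Assuming df is loaded as above, calling plot_scores(df[subject], subject, max_score=100) inside the subject loop and plot_scores(df['total'], 'total', max_score=500) afterwards would cover both cases.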

Finally, here is all the code used in this article, without comments.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("honmachi.csv", names=['National language','Math','English','society','Science','total'])

for subject in ['National language','Math','English','society','Science']:
    n, bins, patches = plt.hist(df[subject], range=(0,100), bins=10, alpha=0.5)
    plt.xlim(0,100)
    plt.ylim(0,8)
    plt.grid(True)
    plt.title(subject)
    plt.xlabel('score')
    plt.ylabel('Number of people')
    for i in range(0, 5):
        patches[i].set_facecolor('red')
    plt.show()

n, bins, patches = plt.hist(df['total'], range=(0,500), bins=10, alpha=0.5)
plt.xlim(0,500)
plt.ylim(0,8)
plt.grid(True)
plt.title('total')
plt.xlabel('score')
plt.ylabel('Number of people')
for i in range(0, 5):
    patches[i].set_facecolor('red')
plt.show()

If you only need the output from Python, this seems to be fine. For realistic, general-purpose use, though, I think it would ultimately have to be implemented as something that works over a network.
