Let's use Python to represent the frequency of binary data contained in a data frame in a single bar graph.

In checking the contents of the data There are times when you want to represent how much binary data is included in a single bar graph.

Was it a bad way to find it? I haven't come up with a concise method to represent it with a single bar graph, so I will include the output as well.

This time, I am creating a bar graph that includes not only binary data but also 6-value data.

environment

I used Google Colab. The version of the library used is as follows.

Library version
python 3.6.9
pandas 1.1.4
seaborn 0.11.0
matplotlib 3.2.2

Library import

Before using the above module, import it.

%matplotlib inline
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt

Data set import

This time, we will use the tips of the learning dataset included in seaborn. This dataset has This includes the total amount paid for dinner and lunch, the amount of tips included, and the gender of the person who paid.

#Data frame import
tips = sns.load_dataset('tips')

#Check the first 5 lines
display(tips.head())
total_bill tip sex smoker day time size
0 16.99 1.01 Female No Sun Dinner 2
1 10.34 1.66 Male No Sun Dinner 3
2 21.01 3.50 Male No Sun Dinner 3
3 23.68 3.31 Male No Sun Dinner 2
4 24.59 3.61 Female No Sun Dinner 4

If you see the above output, the import is successful.

Data processing

Create a bar graph using the following four columns. To create a graph, you need to convert qualitative values to quantitative values.

Column name Overview policy
sex sex(Male/Female) Male -> 0, Female -> 1
smoker smoking(No/Yes) No -> 0, Yes -> 1
time Meal time(Lunch/Dinner) Lunch -> 0, Dinner -> 1
size Number of people(1 ~ 6) Use as it is

I think there are many ways to do this, It was carried out as follows.

#Quantify sex(Male1 -> 0, Female -> 1)
tips.sex = tips.sex.replace("Male", 0).replace("Female", 1)

#Quantify smoker(No -> 0, Yes -> 1)
tips.smoker = tips.smoker.replace("No", 0).replace("Yes", 1)

#Quantify time(Lunch -> 0, Dinner -> 1)
tips.time = tips.time.replace("Lunch", 0).replace("Dinner", 1)

#Check the first 5 lines
display(tips.head())
total_bill tip sex smoker day time size
0 16.99 1.01 1 0 Sun 1 2
1 10.34 1.66 0 0 Sun 1 3
2 21.01 3.50 0 0 Sun 1 3
3 23.68 3.31 0 0 Sun 1 2
4 24.59 3.61 1 0 Sun 1 4

In this way, you can see that it has been replaced with 0s and 1s.

Graphing the frequency

Now, the main subject. Define the column names to be included in the bar graph in label in list format. Then get the unique value for each column and its number. In this state, index is the column name and column name is the value, so replacement is being executed.

#Definition of label list to be stored in bar graph
label = ["sex", "smoker", "time", "size"]

#Get a unique value for each label
tips_ = [tips[l].value_counts() for l in label]

#Convert to data frame and swap index and column
tips_ = pd.DataFrame(tips_).transpose()

#Graph display
tips_.plot.bar()
plt.grid()
plt.title("Frequency of values in each label")
plt.ylabel("counts")
plt.xlabel("value")
plt.show()

image.png

I was able to create the graph I wanted to find in this way.

Summary

I created a new data frame using value_counts and output a bar chart of the frequency.

Referenced

Recommended Posts

Let's use Python to represent the frequency of binary data contained in a data frame in a single bar graph.
Let's use the open data of "Mamebus" in Python
How to use the __call__ method in a Python class
How to quickly count the frequency of appearance of characters from a character string in Python?
I made a program in Python that changes the 1-minute data of FX to an arbitrary time frame (1 hour frame, etc.)
How to determine the existence of a selenium element in Python
[Introduction to Python] How to use the in operator in a for statement?
How to check the memory size of a dictionary in Python
How to send a visualization image of data created in Python to Typetalk
I made a program to check the size of a file in Python
I tried to display the altitude value of DTM in a graph
A super beginner who does not know the basics of Python tried to graph the realized profit and loss data of Rakuten Securities in Python
Various ways to read the last line of a csv file in Python
How to pass the execution result of a shell command in a list in Python
How to use the C library in Python
Draw a graph of a quadratic function in Python
Get the caller of a function in Python
Make a copy of the list in Python
Summary of how to use MNIST in Python
Use PIL in Python to extract only the data you want from Exif
Paste a link to the data point of the graph created by jupyterlab & matplotlib
The story of reading HSPICE data in Python
Change the data frame of pandas purchase data (id x product) to a dictionary
Output in the form of a python array
Basic data frame operations written by beginners in a week of learning Python
A well-prepared record of data analysis in Python
How to get a list of files in the same directory with python
[Introduction to Python] How to get the index of data with a for statement
Use Pandas to write only the specified lines of the data frame to an excel file
I want to use a python data source in Re: Dash to get query results
I created a stacked bar graph with matplotlib in Python and added a data label
What to do if the progress bar is not displayed in tqdm of python
Find out the maximum number of characters in multi-line text stored in a data frame
How to check in Python if one of the elements of a list is in another list
Let's make a leap in the manufacturing industry by utilizing the Web in addition to Python
I tried to graph the packages installed in Python
Summary of tools needed to analyze data in Python
How to get the number of digits in Python
I want to write in Python! (2) Let's write a test
[Python] How to use the graph creation library Altair
Let's use the Python version of the Confluence API module.
Not being aware of the contents of the data in python
How to use the model learned in Lobe in Python
A reminder about the implementation of recommendations in Python
To do the equivalent of Ruby's ObjectSpace._id2ref in Python
I want to use the R dataset in python
Python Note: The mystery of assigning a variable to a variable
[C / C ++] Pass the value calculated in C / C ++ to a python function to execute the process, and use that value in C / C ++.
Use hash to lighten collision detection of about 1000 balls in Python (related to the new coronavirus)
Get the value of a specific key up to the specified index in the dictionary list in Python
[OCI] Python script to get the IP address of a compute instance in Cloud Shell
[Python] Programming to find the number of a in a character string that repeats a specified number of times.
What seems to be a template of the standard input part of the competition pro in python3
I tried to create a Python script to get the value of a cell in Microsoft Excel
I made a function to crop the image of python openCV, so please use it.
How to plot the distribution of bacterial composition from Qiime2 analysis data in a box plot
I wrote a doctest in "I tried to simulate the probability of a bingo game with Python"
How to pass the execution result of a shell command in a list in Python (non-blocking version)
Find out the apparent width of a string in python
Have the equation graph of the linear function drawn in Python
Various ways to calculate the similarity between data in python