Create a decision tree from scratch with Python and understand it (3. Data analysis library: pandas)

**Create and understand decision trees from scratch in Python** 1. Overview - 2. Python Program Basics - 3. Data Analysis Library Pandas

This part explains how to use the pandas library, which is used later to build the decision tree.

3.1 Library import

#Import pandas and make it available in the program under the name pd.
import pandas as pd

3.2 DataFrame, Series

pandas works with two main types, DataFrame and Series. Think of the data as an Excel-style table in which each row is one record and each column is one attribute of that record. A DataFrame represents the whole table, and a Series represents a single row (or a single column) of it.
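The relationship between the two types can be checked with a minimal sketch. The three-record table below is a hypothetical example made up for illustration; it is not the golf data used in the rest of this part.

```python
import pandas as pd

# Hypothetical three-record table, used only for illustration.
table = pd.DataFrame({
    "weather": ["Fine", "Cloudy", "rain"],
    "golf": ["×", "○", "○"],
})

row = table.iloc[0]        # one record (row) taken out as a Series
column = table["weather"]  # one attribute (column) taken out as a Series

print(type(table).__name__)   # DataFrame
print(type(row).__name__)     # Series
print(type(column).__name__)  # Series
print(row["golf"])            # ×
```

Both a row and a column come back as a Series; only the whole table is a DataFrame.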

3.3 DataFrame generation

Read an Excel file with read_excel and write one with [ExcelWriter](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.ExcelWriter.html).

#Upload the Excel file to the same location as this ipynb file.
df0 = pd.read_excel("data_golf.xlsx")

#Display the DataFrame as an HTML table.
from IPython.display import HTML
html = "<div style='font-family:\"Meiryo\";'>"+df0.to_html()+"</div>"
HTML(html)


#Save to an Excel file (the with statement automatically closes f when the block ends)
with pd.ExcelWriter("data_golf2.xlsx") as f:
    df0.to_excel(f)

Generating from a dictionary (associative array): a dictionary organizes the data by column. Pass it to DataFrame.

#Generate from a dictionary: the data is grouped by column.
d = {
    "weather":["Fine","Fine","Cloudy","rain","rain","rain","Cloudy","Fine","Fine","rain","Fine","Cloudy","Cloudy","rain"],
    "temperature":["Hot","Hot","Hot","Warm","Ryo","Ryo","Ryo","Warm","Ryo","Warm","Warm","Warm","Hot","Warm"],
    "Humidity":["High","High","High","High","usually","usually","usually","High","usually","usually","usually","High","usually","High"],
    "Wind":["Nothing","Yes","Nothing","Nothing","Nothing","Yes","Yes","Nothing","Nothing","Nothing","Yes","Yes","Nothing","Yes"],
    "golf":["×","×","○","○","○","×","○","×","○","○","○","○","○","×"],
}
df0 = pd.DataFrame(d)

Generating from an array: a nested list organizes the data by row. Pass it to DataFrame together with the column names.

#Generate from an array: the data is grouped by row.

d = [["Fine","Hot","High","Nothing","×"],
     ["Fine","Hot","High","Yes","×"],
     ["Cloudy","Hot","High","Nothing","○"],
     ["rain","Warm","High","Nothing","○"],
     ["rain","Ryo","usually","Nothing","○"],
     ["rain","Ryo","usually","Yes","×"],
     ["Cloudy","Ryo","usually","Yes","○"],
     ["Fine","Warm","High","Nothing","×"],
     ["Fine","Ryo","usually","Nothing","○"],
     ["rain","Warm","usually","Nothing","○"],
     ["Fine","Warm","usually","Yes","○"],
     ["Cloudy","Warm","High","Yes","○"],
     ["Cloudy","Hot","usually","Nothing","○"],
     ["rain","Warm","High","Yes","×"],
    ]
df0 = pd.DataFrame(d,columns=["weather","temperature","Humidity","Wind","golf"])
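As a quick check that the column-oriented and row-oriented forms describe the same table, here is a small sketch. It uses a hypothetical three-record, two-column table rather than the full golf data, and compares the results with `DataFrame.equals`.

```python
import pandas as pd

# Column-oriented: one list per column (dictionary form).
by_column = pd.DataFrame({
    "weather": ["Fine", "Cloudy", "rain"],
    "golf": ["×", "○", "○"],
})

# Row-oriented: one list per record, with the column names given separately.
by_row = pd.DataFrame(
    [["Fine", "×"], ["Cloudy", "○"], ["rain", "○"]],
    columns=["weather", "golf"],
)

# equals() compares shape, labels and values of the two tables.
same = by_column.equals(by_row)
print(same)  # True
```

Which form to use is mostly a matter of how the source data is already laid out.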

3.4 Getting table information

#Get table information.

#Number of rows and columns
print(df0.shape) #Output: (14, 5)

#Get the number of rows
print(df0.shape[0]) #Output: 14

#Get column name
print(df0.columns) #Output Index(['weather', 'temperature', 'Humidity', 'Wind', 'golf'], dtype='object')

#Get row name (The row name of df0 is an automatically assigned index)
print(df0.index) #Output RangeIndex(start=0, stop=14, step=1)

3.5 Getting values loc [iloc](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.iloc.html#pandas.DataFrame.iloc) [values](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.values.html#pandas.DataFrame.values)

#Get values

#Get a value by specifying the row and the column.
#Get the Humidity of row 1 (the second record).
print(df0.loc[1,"Humidity"]) #Output: High

#Specify multiple rows and columns as lists to get several values at once.
#Get the weather and golf values of rows 1, 2 and 4 together. The result is also a DataFrame.
df = df0.loc[[1,2,4],["weather","golf"]]
print(df)
#Output
#   weather  golf
#1     Fine     ×
#2   Cloudy     ○
#4     rain     ○
print(type(df)) #Output: <class 'pandas.core.frame.DataFrame'>

#Slices (the syntax for extracting part of an array) can also be used to specify rows and columns.
#Get all columns of rows 1 to 4. loc selects by name (label), so 1:4 includes 4.
df = df0.loc[1:4,:]
print(df)
#Output
#   weather  temperature  Humidity     Wind  golf
#1     Fine          Hot      High      Yes     ×
#2   Cloudy          Hot      High  Nothing     ○
#3     rain         Warm      High  Nothing     ○
#4     rain          Ryo   usually  Nothing     ○

#iloc selects rows and columns by position. Positions are counted from 0.
#Get every column except the last one (golf) for rows 1 to 3. iloc selects by position, so 1:4 does not include 4.
df = df0.iloc[1:4,:-1]
print(df)
#Output
#   weather  temperature  Humidity     Wind
#1     Fine          Hot      High      Yes
#2   Cloudy          Hot      High  Nothing
#3     rain         Warm      High  Nothing

#Get a value from a single row (Series)
#Get the data in the first row. s is a Series.
s = df0.iloc[0,:]
#As with a dictionary, s["column name"] returns the value.
print(s["weather"]) #Output: Fine

#Get all values as an array (numpy.ndarray).
print(df0.values)
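The difference between selecting by name (loc) and by position (iloc) is easier to see when the row names are not numbers. The following is a minimal sketch; the table and its row names "a" to "d" are hypothetical and exist only for this illustration.

```python
import pandas as pd

# Hypothetical table whose rows are named "a".."d" instead of 0, 1, 2, ...
table = pd.DataFrame(
    {"weather": ["Fine", "Cloudy", "rain", "rain"],
     "golf": ["×", "○", "○", "×"]},
    index=["a", "b", "c", "d"],
)

by_label = table.loc["b":"c", :]   # names: the end name "c" is included
by_position = table.iloc[1:3, :]   # positions: the end position 3 is excluded

print(by_label.index.tolist())       # ['b', 'c']
print(by_position.index.tolist())    # ['b', 'c']
print(by_label.equals(by_position))  # True
```

With the default RangeIndex of df0 the names happen to equal the positions, which is why `loc[1:4]` and `iloc[1:4]` look so similar but return a different number of rows.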

3.6 Looping over the data iterrows [iteritems](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.iteritems.html)

#Loop over the data and look at it one piece at a time.

#Loop over rows. Look at the data one row at a time.
for i,row in df0.iterrows():
    #i is the row name (row index), row is a Series
    print(i,row)

#Loop over columns. Look at the data vertically, one column at a time.
#items() is used here because iteritems() was removed in pandas 2.0.
for i,col in df0.items():
    #i is the column name, col is a Series
    print(i,col)

3.7 Frequency value_counts

#Frequency (how many times each value appears)

#Get all the data in the weather column. s is a Series.
s = df0.loc[:,"weather"]

#Get which values appear and how many times.
print(s.value_counts())
#Output
#Fine      5
#rain      5
#Cloudy    4
#Name: weather, dtype: int64
#(pandas 2.0 and later print "Name: count" here.)

#For example, get the number of Fine days.
print(s.value_counts()["Fine"]) #Output: 5

3.8 Extracting Specific Data query

#Extract specific data

#Get the records where the weather is Fine
print(df0.query("weather=='Fine'"))
#Output
#    weather  temperature  Humidity     Wind  golf
#0      Fine          Hot      High  Nothing     ×
#1      Fine          Hot      High      Yes     ×
#7      Fine         Warm      High  Nothing     ×
#8      Fine          Ryo   usually  Nothing     ○
#10     Fine         Warm   usually      Yes     ○

#Get the records where the weather is Fine and golf was played
print(df0.query("weather=='Fine' and golf=='○'"))
#Output
#    weather  temperature  Humidity     Wind  golf
#8      Fine          Ryo   usually  Nothing     ○
#10     Fine         Warm   usually      Yes     ○

#Get the records where the weather is Fine or golf was played
print(df0.query("weather=='Fine' or golf=='○'"))
#Output
#    weather  temperature  Humidity     Wind  golf
#0      Fine          Hot      High  Nothing     ×
#1      Fine          Hot      High      Yes     ×
#2    Cloudy          Hot      High  Nothing     ○
#3      rain         Warm      High  Nothing     ○
#4      rain          Ryo   usually  Nothing     ○
#6    Cloudy          Ryo   usually      Yes     ○
#7      Fine         Warm      High  Nothing     ×
#8      Fine          Ryo   usually  Nothing     ○
#9      rain         Warm   usually  Nothing     ○
#10     Fine         Warm   usually      Yes     ○
#11   Cloudy         Warm      High      Yes     ○
#12   Cloudy          Hot   usually  Nothing     ○
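query and value_counts can be combined, for example to tally the golf results for each weather value. The sketch below is only one possible way to do this and is not necessarily how later parts of the series proceed. It rebuilds just the weather and golf columns so that it runs on its own, and the names `value`, `subset` and `counts` are chosen here for illustration. Inside a query string, `@value` refers to the Python variable `value`.

```python
import pandas as pd

# Only the weather and golf columns of the golf data, so the block is self-contained.
df = pd.DataFrame({
    "weather": ["Fine", "Fine", "Cloudy", "rain", "rain", "rain", "Cloudy",
                "Fine", "Fine", "rain", "Fine", "Cloudy", "Cloudy", "rain"],
    "golf": ["×", "×", "○", "○", "○", "×", "○",
             "×", "○", "○", "○", "○", "○", "×"],
})

# For each weather value, count how often golf was played (○) or not (×).
counts = {}
for value in df["weather"].unique():
    subset = df.query("weather == @value")  # records with this weather
    counts[value] = subset["golf"].value_counts().to_dict()

print(counts["Fine"])    # 3 records with ×, 2 with ○
print(counts["Cloudy"])  # 4 records with ○
print(counts["rain"])    # 3 records with ○, 2 with ×
```

The per-value counts show, for instance, that every Cloudy record ends with golf being played.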
