Summary of the basic flow of machine learning with Python

I recently learned about machine learning, so I've summarized the steps to take when implementing it in Python.

Data preprocessing

In machine learning, you first need to load the data and understand how it is distributed. This section walks through that procedure.

Data reading

To load the data, read the CSV file with the read_csv function of pandas.

# Load the libraries
import pandas as pd
import numpy as np

# Load hoge.csv from the current directory
df = pd.read_csv("./hoge.csv")
# Show only the first 5 rows
df.head()

Confirmation of read data

Machine learning has a reputation for producing something useful as long as you feed it data, but in reality you need to look closely at the data first. For example: are there missing values, is the variation too large, and are the variables correlated?
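As a rough sketch of the first of those checks, the following counts missing values per column. The small DataFrame is made-up data standing in for hoge.csv, and the column names x1, x2, and y are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the data loaded from hoge.csv
df_demo = pd.DataFrame({
    "x1": [1.0, 2.0, np.nan, 4.0],
    "x2": [10.0, np.nan, np.nan, 40.0],
    "y": [0.5, 1.5, 2.5, 3.5],
})

# Number of missing values in each column
missing_counts = df_demo.isnull().sum()
print(missing_counts)

# Rows that contain at least one missing value
rows_with_missing = df_demo[df_demo.isnull().any(axis=1)]
print(rows_with_missing)
```

Whether to drop or fill such rows depends on the data, so this sketch only locates them.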

Basic statistics

Running the following code shows the number of data points, mean, standard deviation, minimum, maximum, and so on all at once.

#Calculation of statistics
df.describe() 

The basic statistics are displayed together in a single table, like this: [Image: 基本統計量.PNG (basic statistics)]

Confirmation of distribution

However, numbers such as the standard deviation and mean are hard to interpret on their own; a graph is easier for humans to understand. So let's plot the distribution.

%matplotlib inline
# Load seaborn, a library for drawing graphs
import seaborn as sns

# Check the distribution (here, of the data in column x1)
# histplot requires seaborn 0.11 or later
sns.histplot(df["x1"], kde=True)

This is nice data, because it appears to follow a normal distribution. [Image: x6の分布.PNG (distribution of x6)]
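If you want a number to go with the visual impression, one option is to look at skewness and excess kurtosis, both of which are close to 0 for normally distributed data. This is only a sketch: the series below is randomly generated as a stand-in for df["x1"], and the mean and scale are arbitrary assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df["x1"]: 10,000 samples from a normal distribution
rng = np.random.default_rng(0)
x1 = pd.Series(rng.normal(loc=50.0, scale=5.0, size=10_000), name="x1")

# Skewness measures asymmetry; excess kurtosis measures tail heaviness.
# Both are near 0 when the data is roughly normal.
skewness = x1.skew()
excess_kurtosis = x1.kurt()
print(skewness, excess_kurtosis)
```

Values far from 0 suggest the column is skewed or heavy-tailed, which is worth knowing before fitting a linear model.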

Confirmation of correlation coefficient

If the data shows no correlation at all, there is little point in training on it, so check the correlation coefficients. The correlation coefficient lies between -1 and +1; the closer its absolute value is to 1, the stronger the correlation, and a value near 0 means almost no linear correlation.

# Calculate the correlation coefficients between all pairs of columns
df.corr()
# Check the relationships between columns visually with scatter plots
sns.pairplot(df)
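When there are many columns, the full correlation matrix is hard to read. A minimal sketch of one way to narrow it down is to rank the input columns by the absolute value of their correlation with the output column. The data here is synthetic, and the column names x1, x2, and y are assumptions.

```python
import numpy as np
import pandas as pd

# Synthetic data: y depends strongly on x1 and not at all on x2
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
df_demo = pd.DataFrame({
    "x1": x1,
    "x2": x2,
    "y": 3.0 * x1 + rng.normal(scale=0.1, size=n),
})

# Correlation of each input column with y, excluding y itself
corr_with_y = df_demo.corr()["y"].drop("y")

# Rank by absolute value, since -0.9 is as informative as +0.9
ranked = corr_with_y.abs().sort_values(ascending=False)
print(ranked)
```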

Separation of input variables and output variables

What we will actually build is a model of the form $y = w_0 x_0 + w_1 x_1 + w_2 x_2 + \cdots$, so we need to split the data into the output variable y and the input variables x. For this I use the iloc indexer of pandas.

# df.iloc[row, column] extracts the value at that row and column position
example = df.iloc[1, 3]  # e.g. 100 in this data
# Take all rows and every column except the last one (input variables X)
X = df.iloc[:, :-1]
# Writing the last column number explicitly also works, but is less general
# X = df.iloc[:, :last_column_number]

# Take out the last column as y
y = df.iloc[:, -1]
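To make the slicing concrete, here is a small sketch on a made-up three-column DataFrame (the names x1, x2, and y are assumptions). It also shows an alternative that selects by column name, which does not rely on the output being the last column.

```python
import pandas as pd

# Hypothetical data: two input columns followed by the output column
df_demo = pd.DataFrame({
    "x1": [1, 2, 3],
    "x2": [4, 5, 6],
    "y": [7, 8, 9],
})

# All rows, every column except the last
X_demo = df_demo.iloc[:, :-1]
# All rows, last column only (a Series)
y_demo = df_demo.iloc[:, -1]

print(X_demo.shape)  # (3, 2)
print(y_demo.shape)  # (3,)

# Alternative: drop the output column by name
X_by_name = df_demo.drop(columns="y")
```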

Machine learning using preprocessed data

With the work above done, we are ready to actually perform machine learning. From here on we train a model, using scikit-learn, a machine learning library.

Divide into training data and verification data

The purpose of machine learning is to learn from data and then make predictions when unknown data comes in. In other words, predictions must be checked on data that was not used for training. If you feed the training data back in and predict on it, the answers will naturally look accurate, because the model learned from exactly that data.

Therefore, it is necessary to separate training data (train) and verification data (test) before training.

from sklearn.model_selection import train_test_split

#Separation of training data and verification data
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.4,random_state=1)

test_size specifies the fraction of the data held out as verification data; here it is set so that training : verification = 6 : 4. random_state is fixed to keep the split reproducible.
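A quick way to confirm the 6 : 4 ratio and the effect of random_state is to split a tiny array and look at the shapes. This is a sketch on made-up data; the array contents and the names X_demo and y_demo are arbitrary.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# 10 hypothetical samples with 2 features each
X_demo = np.arange(20).reshape(10, 2)
y_demo = np.arange(10)

X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.4, random_state=1
)
print(X_train.shape, X_test.shape)  # (6, 2) (4, 2)

# Splitting again with the same random_state selects the same rows
X_train2, X_test2, y_train2, y_test2 = train_test_split(
    X_demo, y_demo, test_size=0.4, random_state=1
)
same_split = np.array_equal(X_test, X_test2)
print(same_split)
```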

Model construction / verification

With scikit-learn, you can build and verify a model with just the following code. The model used this time is multiple regression analysis.

# Import the library
from sklearn.linear_model import LinearRegression

# Declare the model (LinearRegression performs multiple regression analysis)
model = LinearRegression()

# Train the model (adjust the parameters) on the training data
model.fit(X_train, y_train)

# Check the parameters (the coefficients w)
model.coef_

# Coefficient of determination (prediction accuracy) on the verification data; at most 1, and closer to 1 is better
model.score(X_test, y_test)

# Calculate a predicted value
x = X_test.iloc[0, :]  # Take out the first row of X_test
y_pred = model.predict([x])
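To illustrate why the score should be measured on held-out data, the sketch below fits a linear regression on synthetic data and reports the coefficient of determination on both the training and the verification split. The true weights, noise level, and sample count are assumptions chosen only for the demonstration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data generated from known weights plus a little noise
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
true_w = np.array([2.0, -1.0, 0.5])
y_demo = X_demo @ true_w + rng.normal(scale=0.1, size=200)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.4, random_state=1
)

model_demo = LinearRegression()
model_demo.fit(X_tr, y_tr)

# R^2 on data the model has seen vs. data it has not seen
train_r2 = model_demo.score(X_tr, y_tr)
test_r2 = model_demo.score(X_te, y_te)
print(train_r2, test_r2)

# The learned coefficients should be close to true_w
print(model_demo.coef_, model_demo.intercept_)
```

On real data the verification score is usually lower than the training score; a large gap between the two is a sign of overfitting.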

Save / load model

You can save the model with the following code.

# Import joblib (installed as a dependency of scikit-learn)
import joblib

# Save the model as hoge.pkl
joblib.dump(model, "hoge.pkl")

Load the model with the following code.

# Load hoge.pkl
model_new = joblib.load("hoge.pkl")

# Display the predicted value from the loaded model
model_new.predict([x])[0]
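As a self-contained check that saving and loading preserves the model, the following sketch trains a tiny regression, writes it to a temporary directory, loads it back, and compares predictions. The data and the file name model_demo.pkl are made up for illustration.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data lying exactly on y = 2x + 1
X_demo = np.array([[0.0], [1.0], [2.0], [3.0]])
y_demo = np.array([1.0, 3.0, 5.0, 7.0])
model_demo = LinearRegression().fit(X_demo, y_demo)

# Save to a temporary directory, then load the file back
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model_demo.pkl")
    joblib.dump(model_demo, path)
    model_loaded = joblib.load(path)

# The loaded model should predict exactly like the original
same_predictions = np.allclose(
    model_demo.predict(X_demo), model_loaded.predict(X_demo)
)
print(same_predictions)
```

Note that a saved model is generally only safe to load with the same scikit-learn version that created it.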

That is the basic flow of machine learning. This time the model was multiple regression analysis, but the basic flow is the same when you want to use, for example, logistic regression or an SVM.
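As a sketch of that last point, the same split, fit, and score steps can be run with LogisticRegression and SVC by swapping only the model class. The classification data below is generated with make_classification, and all of its parameters are assumptions for the demonstration. For classifiers, score returns accuracy rather than the coefficient of determination.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic two-class data with well-separated classes
X_demo, y_demo = make_classification(
    n_samples=300,
    n_features=5,
    n_informative=3,
    n_redundant=0,
    n_clusters_per_class=1,
    class_sep=2.0,
    random_state=0,
)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.4, random_state=1
)

# Same flow for each model: declare, fit on training data, score on held-out data
scores = {}
for name, clf in [
    ("logistic", LogisticRegression(max_iter=1000)),
    ("svm", SVC()),
]:
    clf.fit(X_tr, y_tr)
    scores[name] = clf.score(X_te, y_te)

print(scores)
```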
