Python: Supervised Learning (Regression)

What is supervised learning?

Types of machine learning

Machine learning is divided into three main areas.

1. Supervised learning: the machine learns from accumulated (labeled) data in order to predict or classify new and future data. Examples include stock price forecasting and image identification.

2. Unsupervised learning: the machine discovers the structure and relationships in accumulated data on its own. It is used, for example, to analyze customer trends in retail stores and in Google's image recognition.

3. Reinforcement learning: as in unsupervised learning, no correct answers are given in advance, but the learner is given rewards and goals and learns to act so as to maximize its reward. It is often used for competitive game-playing AI, such as for Go.

Of these, supervised learning can be broadly divided into two methods: regression and classification.
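To make the distinction concrete, here is a minimal sketch on synthetic data: a regressor outputs continuous numbers, while a classifier outputs discrete labels. Using `LogisticRegression` as the classifier is an assumption chosen for illustration; the datasets come from scikit-learn's `make_regression` and `make_classification` generators.

```python
from sklearn.datasets import make_classification, make_regression
from sklearn.linear_model import LinearRegression, LogisticRegression

# Regression: the target is a continuous number.
X_r, y_r = make_regression(n_samples=50, n_features=1, noise=5.0, random_state=0)
reg = LinearRegression().fit(X_r, y_r)
print(reg.predict(X_r[:3]))  # arbitrary real values

# Classification: the target is a discrete class label.
# (LogisticRegression is an illustrative choice of classifier.)
X_c, y_c = make_classification(n_samples=50, n_features=4, random_state=0)
clf = LogisticRegression().fit(X_c, y_c)
print(clf.predict(X_c[:3]))  # labels drawn from {0, 1}
```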

Machine learning with scikit-learn

We will use scikit-learn, a Python library for machine learning.


# Import the required modules.
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Next, load the data you want to train on into four variables:
# train_X, test_X, train_y, test_y.
# (Placeholder: the original data-loading code is not shown here, so this
# sketch generates a synthetic dataset instead; substitute your own data.)
X, y = make_regression(n_samples=100, n_features=1, noise=5.0, random_state=42)
train_X, test_X, train_y, test_y = train_test_split(X, y, random_state=42)

# Build a learner.
# A learner is an object that trains according to a learning model (learning method).
# scikit-learn's LinearRegression learns from data and returns predictions.
# LinearRegression is covered in detail in the following sections.
model = LinearRegression()

# Train the learner on the teacher data (existing labeled data used for learning).
model.fit(train_X, train_y)

# Make predictions on test data prepared separately from the teacher data.
pred_y = model.predict(test_X)

# Compute the coefficient of determination to check the learner's performance.
score = model.score(test_X, test_y)
print(score)
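The four variables train_X, test_X, train_y, test_y are commonly produced with scikit-learn's `train_test_split`. A small sketch on ten hypothetical samples shows how the rows are divided: with no `test_size` given, 25% of the samples (rounded up) are held out for testing, and X and y rows stay aligned.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Ten hypothetical samples with two features each; row i is [2i, 2i + 1].
X = np.arange(20).reshape(10, 2)
y = np.arange(10)  # label i belongs to row i

# With no test_size given, 25% of the samples (rounded up) go to the test set.
train_X, test_X, train_y, test_y = train_test_split(X, y, random_state=0)
print(train_X.shape, test_X.shape)  # (7, 2) (3, 2)
```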

Linear regression

What is linear regression?

Regression analysis is an approach that estimates the data you want to predict from its relationship with data you already know. In short, when the value being predicted is a number, we call it "regression".

A simple example: if you travel at 100 km/h, regression lets you predict how many kilometers you have covered after a given time (distance = 100 × time, so 100 km after one hour). Here, 100 is the coefficient of the time data.
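The speed example can be reproduced with `LinearRegression`: fitting hypothetical (time, distance) pairs generated from distance = 100 × time recovers a coefficient of about 100. The data points are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical measurements: hours traveled and kilometers covered at 100 km/h.
hours = np.array([[0.5], [1.0], [1.5], [2.0], [3.0]])
km = 100 * hours.ravel()

model = LinearRegression().fit(hours, km)
print(model.coef_[0])              # ≈ 100, the coefficient of the time data
print(model.predict([[1.0]])[0])   # ≈ 100 km after one hour
```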

In linear regression, looking at the coefficient of each variable used for prediction tells you how much that variable contributes to the value you want to predict (provided the variables are measured on comparable scales).

Knowing how much each variable contributes is useful in practice: for example, by building a formula that relates shoppers' purchases to various factors, you can work out which measures to take to maximize profit.
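A minimal sketch of reading contributions from `coef_`, assuming two hypothetical, standardized factors (think "advertising" and "discount"; the names and the true weights 3.0 and 0.5 are invented for illustration). Because both factors share the same scale, the larger fitted coefficient marks the larger contribution.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Two hypothetical factors on the same (standardized) scale.
X = rng.normal(size=(200, 2))
# Assumed ground truth: factor 0 contributes far more than factor 1.
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=200)

model = LinearRegression().fit(X, y)
print(model.coef_)  # close to [3.0, 0.5]
```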

Coefficient of determination

The coefficient of determination is an index of how well the values predicted by linear regression match the actual data. It also indicates how far you can trust the coefficient (magnitude of contribution) of each variable.

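For example, if the predictions are consistently far off, such as predicting a score of 70 when the actual score is 20, the coefficient of determination will be close to 0 (or even below it). If they are consistently close, such as predicting 70 when the actual score is 71, it will approach 1.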

The coefficient of determination normally takes a value from 0 to 1, and the larger the value, the more accurate the function. (On data the model was not fitted to, such as test data, it can even be negative, meaning the model does worse than always predicting the average.) A value of about 0.8 or more can be regarded as good accuracy. However, a value below 0.8 does not mean the function is useless.

If the coefficient of determination is reasonably large (the threshold varies from person to person, but roughly 0.4 or more), the contribution of each variable can be trusted to some extent.
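The usual definition is $R^2 = 1 - \sum_i (y_i - \hat{y}_i)^2 / \sum_i (y_i - \bar{y})^2$, which is what scikit-learn's `r2_score` (and `LinearRegression.score`) computes. The helper `r_squared` below is a hypothetical name for a from-scratch sketch; the scores are made-up values echoing the 70/20/71 example, and the second case shows how the value can drop below 0.

```python
import numpy as np
from sklearn.metrics import r2_score


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares) / (total sum of squares)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


y_true = [20, 50, 71, 90]   # hypothetical actual scores
good = [22, 48, 70, 91]     # predictions close to the actual scores
bad = [70, 70, 70, 70]      # always predicting 70

print(r_squared(y_true, good), r2_score(y_true, good))  # both close to 1
print(r_squared(y_true, bad))  # negative: worse than predicting the mean
```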

Linear simple regression

Linear simple regression is regression analysis that predicts one variable (e.g., amount of water) from a single other variable (e.g., time). It is often used to investigate relationships between data and rarely to make predictions.

Here, let $y$ be the data you want to predict and $x$ be the data used for prediction.

Assuming the relationship $y = ax + b$, we estimate $a$ and $b$.

There are various ways to estimate $a$ and $b$; here we use the least squares method, which determines $a$ and $b$ so that the sum of the squared differences between the actual $y$ values and the estimated values $ax + b$ is minimized.
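Written out, the least squares problem for simple regression and its standard closed-form solution (with $\bar{x}$ and $\bar{y}$ the means of the data) are:

```latex
\min_{a,\,b} \sum_{i=1}^{n} \bigl(y_i - (a x_i + b)\bigr)^2
\quad\Longrightarrow\quad
a = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
b = \bar{y} - a\,\bar{x}
```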

In the figure below, $a$ and $b$ are chosen so that the sum of the squared vertical distances between the line and the orange data points is minimized. In this way, we draw the straight line that best fits the existing data and infer future data from that line.

[Figure: straight line fitted by least squares to the orange data points]

The errors are squared so that positive and negative errors do not cancel each other out. For example, if you simply add errors of +2 and -2, the total is 0, as if there were no error at all.
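A small sketch of both ideas: the closed-form least squares formulas computed with NumPy (checked against `np.polyfit`, which returns `[slope, intercept]` for degree 1), and the +2/-2 cancellation. The function name `least_squares_line` and the data points are hypothetical.

```python
import numpy as np


def least_squares_line(x, y):
    """Return (a, b) minimizing sum((y - (a*x + b))**2) via the closed-form formulas."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    a = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    b = y_mean - a * x_mean
    return a, b


# Hypothetical data points lying roughly on a line.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = least_squares_line(x, y)
print(a, b)
print(np.polyfit(x, y, 1))  # [slope, intercept]: should match (a, b)

# Why square the errors: raw errors of +2 and -2 cancel out.
errors = np.array([2.0, -2.0])
print(errors.sum())         # 0.0
print((errors ** 2).sum())  # 8.0
```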

To actually perform regression analysis, it is convenient to use the LinearRegression model in scikit-learn's linear_model module.

from sklearn.linear_model import LinearRegression
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

# Generate synthetic data for regression.
X, y = make_regression(n_samples=100, n_features=1, n_targets=1, noise=5.0, random_state=42)

train_X, test_X, train_y, test_y = train_test_split(X, y, random_state=42)

model = LinearRegression()

model.fit(train_X, train_y)

#Output of coefficient of determination
print(model.score(test_X, test_y))
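Beyond the score, the fitted line itself can be inspected: `coef_` holds the estimated $a$ and `intercept_` the estimated $b$. This sketch repeats the setup above and passes `coef=True` to `make_regression`, which also returns the true coefficient used to generate the data, so the estimate can be compared with it.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# coef=True makes make_regression also return the true coefficient.
X, y, true_coef = make_regression(
    n_samples=100, n_features=1, n_targets=1, noise=5.0, coef=True, random_state=42
)
train_X, test_X, train_y, test_y = train_test_split(X, y, random_state=42)

model = LinearRegression().fit(train_X, train_y)
print("estimated a:", model.coef_[0], "true a:", float(true_coef))
# make_regression uses bias=0.0 by default, so b should be near 0.
print("estimated b:", model.intercept_)
```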

Linear multiple regression

Linear multiple regression is regression analysis that predicts one variable (e.g., a restaurant's overall rating) from multiple variables (e.g., a food taste score and a customer service score). Its results, especially the estimated coefficients, are most reliable when the variables used for prediction are only weakly related to one another.
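Why weakly related predictors matter can be seen in a small simulation (an illustrative assumption, not taken from the source): when the second predictor is almost a copy of the first, the individual fitted coefficients swing widely from one dataset to the next. The helper `coef_spread` is a hypothetical name.

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def coef_spread(correlated, n_trials=20):
    """Standard deviation of the fitted coefficients across repeated synthetic datasets."""
    coefs = []
    for seed in range(n_trials):
        rng = np.random.default_rng(seed)
        x0 = rng.normal(size=100)
        if correlated:
            x1 = x0 + rng.normal(scale=0.01, size=100)  # almost a copy of x0
        else:
            x1 = rng.normal(size=100)  # unrelated to x0
        y = x0 + x1 + rng.normal(scale=1.0, size=100)
        X = np.column_stack([x0, x1])
        coefs.append(LinearRegression().fit(X, y).coef_)
    return np.std(coefs, axis=0)


ind = coef_spread(correlated=False)
cor = coef_spread(correlated=True)
print("independent predictors:", ind)  # small spread
print("correlated predictors: ", cor)  # much larger spread
```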

Again, the least squares method is used to estimate the relationship between the data to be predicted and the data used for prediction. In multiple regression, the data used for prediction are $x_0, x_1, x_2, \ldots$, and we assume the following relationship:

$$y = \beta_0 x_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \epsilon$$

We estimate $\beta_0, \beta_1, \beta_2, \ldots$ and $\epsilon$.

You can see that there are more $x$ terms than in simple regression.

Linear multiple regression can also be performed with the LinearRegression model in scikit-learn's linear_model module. It automatically determines the values of $\beta_0, \beta_1, \beta_2, \ldots$ and $\epsilon$ that best fit the existing data and uses them for prediction.
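As a sketch of what "best fit" means here, the same least squares solution can be computed directly with NumPy's `np.linalg.lstsq` by appending a column of ones for the constant term (LinearRegression stores that term in `intercept_`). The data and true weights below are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Hypothetical data: three predictors, assumed weights and a constant of 4.0.
X = rng.normal(size=(50, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + 4.0 + rng.normal(scale=0.1, size=50)

# Least squares by hand: the column of ones estimates the constant term.
X_design = np.column_stack([X, np.ones(len(X))])
beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)

model = LinearRegression().fit(X, y)
print(beta[:-1], beta[-1])            # coefficients and constant from lstsq
print(model.coef_, model.intercept_)  # same values from LinearRegression
```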

from sklearn.linear_model import LinearRegression
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

# n_features=10 generates 10 explanatory variables (x).
# n_informative=3 means only 3 of them actually influence y.
X, y = make_regression(n_samples=100, n_features=10, n_informative=3, n_targets=1, noise=5.0, random_state=42)
train_X, test_X, train_y, test_y = train_test_split(X, y, random_state=42)

model = LinearRegression()
model.fit(train_X, train_y)
print(model.score(test_X, test_y))
# Calling model.predict(test_X) returns predictions for test_X.
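To see how the estimated $\beta$ values relate to `n_informative=3`, this sketch regenerates the same data with `coef=True` (so `make_regression` also returns the true coefficients) and fits on all samples for simplicity. Only three true coefficients are non-zero, and the fitted `coef_` should sit close to them, with the remaining seven near 0.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

X, y, true_coef = make_regression(
    n_samples=100, n_features=10, n_informative=3, n_targets=1,
    noise=5.0, coef=True, random_state=42,
)

model = LinearRegression().fit(X, y)
print(np.round(true_coef, 1))     # only 3 entries are non-zero
print(np.round(model.coef_, 1))   # estimates: near the true values, others near 0
print(np.flatnonzero(true_coef))  # indices of the informative features
```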
