A first simple regression analysis in Python

Explanations of what simple regression analysis means are easy to find, but I think understanding deepens when you actually write a program yourself, so I will try it in Python.

For the time being, simple regression analysis can be described as follows.

  1. One objective variable (y) is predicted from one explanatory variable (x).
  2. Their relationship is expressed as a linear equation y = ax + b.
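
Before reaching for a library, it may help to see where a and b can come from. The usual criterion is ordinary least squares, which is also what scikit-learn's LinearRegression fits. The following is only a minimal sketch: the function name fit_line and the four data points are made up for illustration and are not taken from sample.csv.

```python
def fit_line(xs, ys):
    """Return (a, b) of the least-squares line y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = sum of co-deviations of x and y / sum of squared deviations of x
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    # the fitted line always passes through (mean_x, mean_y)
    b = mean_y - a * mean_x
    return a, b


# Made-up points that lie exactly on y = 2x + 1
a, b = fit_line([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
print(a, b)  # 2.0 1.0
```

Because the points sit exactly on a line, the fit recovers the slope 2 and intercept 1; with noisy data such as heights and weights, the same formulas give the line with the smallest sum of squared errors.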

The test environment is Jupyter Notebook (I don't even remember when I installed it). The versions used are as follows.

The version of the notebook server is: 6.0.0
Python 3.7.3 (default, Mar 27 2019, 22:11:17) 
[GCC 7.3.0]

pandas

pandas is a library for transforming and analyzing data. I will use it to read the data. The version used is as follows.

import pandas as pd
print(pd.__version__)
# 0.24.2

Reading the data

This time, I will use height (x) and weight (y) data for 48 people (sample.csv).

sample.csv


x,y
152,57
173,78
172,83
178,58
166,63
175,66
158,66
163,74
157,64
165,68
176,68
165,60
147,63
153,63
146,47
156,49
145,59
181,66
160,74
140,55
152,55
165,56
170,65
159,51
151,52
167,51
177,82
155,63
159,45
170,66
154,56
163,60
161,70
165,70
150,57
158,53
163,67
186,69
168,68
170,74
155,60
159,49
170,87
163,50
166,58
161,69
159,60
171,71

Read the sample.csv file and output the first 3 rows. It can be read as follows.

df = pd.read_csv('sample.csv')
df.head(3)
     x   y
0  152  57
1  173  78
2  172  83

Reading the data with pandas.read_csv returns an object of a type called DataFrame.

Store each column in the variables x and y.
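
To get a feel for what a DataFrame holds, it can be inspected directly. The sketch below reads three rows copied from sample.csv out of an in-memory string (so it runs without the file); the variable names csv_text and df_small are my own.

```python
import io

import pandas as pd

# Three rows copied from sample.csv, read from a string instead of a file
csv_text = "x,y\n152,57\n173,78\n172,83\n"
df_small = pd.read_csv(io.StringIO(csv_text))

print(type(df_small).__name__)  # DataFrame
print(df_small.shape)           # (3, 2): 3 rows, 2 columns
print(list(df_small.columns))   # ['x', 'y']
print(df_small.dtypes)          # both columns are parsed as integers
```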

x = df.x
y = df.y

matplotlib

matplotlib is a graph drawing library. The version used is as follows.

import matplotlib
matplotlib.__version__
# '3.1.0'

Graph drawing

import matplotlib.pyplot as plt
plt.plot(x, y)
plt.show()

matplotlib1.PNG

The graph was drawn with all the points connected by a line. I expected a graph showing only the points, so I'll change it as follows.

import matplotlib.pyplot as plt
plt.plot(x, y, 'o')
plt.show()

matplotlib2.PNG
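
As an alternative to passing 'o' to plt.plot, a scatter plot with labeled axes could look like the sketch below. The axis label text, the six sample rows, and the Agg backend are my assumptions for making the example run anywhere; in a notebook you would drop the matplotlib.use line and call plt.show() instead of saving to a buffer.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import matplotlib.pyplot as plt

# First six rows of sample.csv
heights = [152, 173, 172, 178, 166, 175]
weights = [57, 78, 83, 58, 63, 66]

fig, ax = plt.subplots()
ax.scatter(heights, weights)   # points only, no connecting line
ax.set_xlabel("height (x)")
ax.set_ylabel("weight (y)")

# Render to an in-memory PNG instead of opening a window
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
```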

scikit-learn

scikit-learn is a machine learning library built on NumPy and SciPy, the Python packages for scientific computing. The version used is as follows.

import sklearn
print(sklearn.__version__)
# 0.21.2

It seems that simple regression analysis can be performed easily with scikit-learn's LinearRegression.

Fitting the data

Instantiate a linear regression model (LinearRegression) and fit it to the data.

from sklearn.linear_model import LinearRegression
model = LinearRegression()
model.fit(x, y)
# ValueError: Expected 2D array, got 1D array instead:

Oops, I got an error. It seems a 1D array was passed where a 2D array is required. Let's change how x and y are stored and fit again.

x = df[['x']]
y = df[['y']]
from sklearn.linear_model import LinearRegression
model = LinearRegression()
model.fit(x, y)
# LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)

This time the data was accepted. With the original x and y (the Series taken from df.x and df.y), fitting should also work if you reshape them as follows.

# .values converts to a NumPy ndarray; reshape(-1, 1) reshapes it to n rows and 1 column
model.fit(x.values.reshape(-1,1), y.values.reshape(-1,1))
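
The difference between the call that failed and the calls that worked comes down to the shape of the data. As a small sketch (three rows copied from sample.csv; the variable names are my own), the three ways of taking the x column compare like this:

```python
import io

import pandas as pd

df_demo = pd.read_csv(io.StringIO("x,y\n152,57\n173,78\n172,83\n"))

series_x = df_demo.x                        # pandas Series: 1D
frame_x = df_demo[["x"]]                    # single-column DataFrame: 2D
array_x = df_demo.x.values.reshape(-1, 1)   # NumPy ndarray: 2D

print(series_x.shape)  # (3,)
print(frame_x.shape)   # (3, 1)
print(array_x.shape)   # (3, 1)
```

fit expects the explanatory variables as a 2D structure of n samples by n features, so the last two forms are accepted while the 1D Series is rejected.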

Prediction

Let's make predictions.

plt.plot(x, y, 'o')
plt.plot(x, model.predict(x), linestyle="solid")
plt.show()

matplotlib3.PNG

Predicting the objective variable (y) from the explanatory variable (x) draws a line that rises to the right (weight increases as height increases).
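
The fitted model can also predict weights for heights that are not in the data. The sketch below is self-contained, so it fits on only the first six rows of sample.csv; its numbers will therefore differ from the model fitted on all 48 rows. The names demo_model and new_heights are my own.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# First six rows of sample.csv, used only to keep this sketch self-contained
heights = np.array([152, 173, 172, 178, 166, 175], dtype=float).reshape(-1, 1)
weights = np.array([57, 78, 83, 58, 63, 66], dtype=float)

demo_model = LinearRegression().fit(heights, weights)

# Heights to predict for, again as a 2D array (n rows, 1 column)
new_heights = np.array([[160.0], [180.0]])
predicted = demo_model.predict(new_heights)
print(predicted)  # one predicted weight per input height
```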

It seems that the coef_ and intercept_ attributes hold the "slope" and "intercept" of this straight line, respectively, so if you output them, you can get the equation of the straight line.

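# y was passed as a 2D array, so coef_ has shape (1, 1) and intercept_ has shape (1,); index out the scalars
print('y = %.2fx + %.2f' % (model.coef_[0][0], model.intercept_[0]))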
# y = 0.52x + -20.94

From the above, once a (slope) and b (intercept) are known, y (weight) can be predicted from x (height); in other words, a simple regression analysis has been carried out.
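
One way to gain confidence in the slope and intercept is to compare them with an independent least-squares fit. The sketch below uses np.polyfit with degree 1 as that cross-check; it is my own addition, and it again uses only the first six rows of sample.csv so that it runs on its own.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# First six rows of sample.csv
heights = np.array([152, 173, 172, 178, 166, 175], dtype=float)
weights = np.array([57, 78, 83, 58, 63, 66], dtype=float)

# y is 1D here, so coef_ has shape (1,) and intercept_ is a plain float
check_model = LinearRegression().fit(heights.reshape(-1, 1), weights)
a_sklearn = check_model.coef_[0]
b_sklearn = check_model.intercept_

# For degree 1, np.polyfit returns [slope, intercept]
a_numpy, b_numpy = np.polyfit(heights, weights, 1)

print(np.isclose(a_sklearn, a_numpy), np.isclose(b_sklearn, b_numpy))
```

Both routines minimize the same sum of squared errors, so the two pairs of coefficients should agree up to floating-point rounding.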
