[PYTHON] I decided to do a simple regression analysis manually (Part 1)

Introduction

To deepen our understanding of the theory behind regression analysis, let's build a model by hand, without using sklearn, the library that is normally a powerful tool for regression analysis.

What is regression analysis?

Regression analysis predicts the value of an objective variable (output data) from explanatory variables (input data). To work through the theory, this article considers the simplest case, with a single explanatory variable (simple regression).

Preparation in Python

▼ Prepare data

import numpy as np
import pandas as pd
from pandas import DataFrame

data_age = np.array([20,20,28,38,33,34,22,37,
                     26,21,22,39,31,29,38,35,
                     32,27,30])

data_salary = np.array([410,500,480,710,630,600,430,
                        690,500,410,490,800,550,550,
                        700,700,650,540,600])

data = DataFrame({'age':data_age,
                  'income':data_salary})

Data relationship

The following graph can be obtained from the above data set.

[Figure: plot of income against age]
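The plotting code for this figure is not shown in the article. A minimal sketch that should produce a comparable scatter plot is given below. It assumes matplotlib is available, draws off-screen, and saves to age_income.png, a hypothetical file name chosen only for this example.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: draw off-screen and save to a file
import matplotlib.pyplot as plt
import numpy as np

# Same data as in the "Prepare data" step
data_age = np.array([20,20,28,38,33,34,22,37,
                     26,21,22,39,31,29,38,35,
                     32,27,30])
data_salary = np.array([410,500,480,710,630,600,430,
                        690,500,410,490,800,550,550,
                        700,700,650,540,600])

# One point per person: age on the x axis, income on the y axis
fig, ax = plt.subplots(figsize=(6, 4))
points = ax.scatter(data_age, data_salary)
ax.set_xlabel("age")
ax.set_ylabel("income")
fig.savefig("age_income.png")  # hypothetical output file name
```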

From this graph, I will try to express the relationship between age and income with a linear formula. (This forces the relationship into a linear expression; in reality it would be more complicated.)
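One way to check how reasonable a straight line is for this data is the correlation coefficient between age and income. The sketch below uses np.corrcoef on the same arrays. It is an added illustration, not a step in the original procedure.

```python
import numpy as np

data_age = np.array([20,20,28,38,33,34,22,37,
                     26,21,22,39,31,29,38,35,
                     32,27,30])
data_salary = np.array([410,500,480,710,630,600,430,
                        690,500,410,490,800,550,550,
                        700,700,650,540,600])

# np.corrcoef returns the 2x2 correlation matrix;
# the off-diagonal entry is the correlation between the two arrays
r = np.corrcoef(data_age, data_salary)[0, 1]
print(f"correlation between age and income: {r:.3f}")
```

A value close to 1 suggests that a linear expression is a reasonable first approximation for this data set.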

For now, let the age be x and the predicted income be y, and consider expressing the relationship as y = ax + b. Because there are many data points, the values of a and b cannot be determined by simply solving an equation. To find the most plausible values for a and b, we use the idea of mean squared error. Specifically, for each data point, take the difference between the predicted income y = ax + b and the actual income:

y - 410 = 20a + b - 410
y - 500 = 20a + b - 500
y - 480 = 28a + b - 480
...

Each difference is squared, the squares are summed over all N data points, and the sum is divided by N. The resulting average is the mean squared error Q(a, b).

Q(a,b) = \frac{1}{N}\sum_{k=0}^{N-1}(ax_k + b - y_k)^2

Here x_k and y_k are the age and the actual income of the k-th data point. We will fit a line to the relationship between age and income by finding the a and b that minimize the mean squared error Q(a, b).
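As a small worked example of the mean squared error described above, the sketch below evaluates the differences and Q(a, b) for a single candidate line. The values a = 15 and b = 100 are assumptions chosen only for illustration; they are not the fitted result.

```python
import numpy as np

data_age = np.array([20,20,28,38,33,34,22,37,
                     26,21,22,39,31,29,38,35,
                     32,27,30])
data_salary = np.array([410,500,480,710,630,600,430,
                        690,500,410,490,800,550,550,
                        700,700,650,540,600])

a, b = 15.0, 100.0  # arbitrary candidate values (assumption)

# Predicted income for every data point: y = a*x + b
predicted = a * data_age + b

# Difference between prediction and actual income,
# e.g. 20a + b - 410 for the first data point
diff = predicted - data_salary

# Mean of the squared differences over all N data points
Q = np.mean(diff ** 2)

print(diff[:3])  # differences for the first three data points
print(Q)
```

Trying other values of a and b and watching how Q changes is exactly what the next section does over a whole grid of candidates.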

Mean squared error graph

First, to get a rough picture of where Q(a, b) is smallest, plug various values into a and b and plot the resulting mean squared error.

import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import axes3d

# Prepare candidate values for a and b
a = np.linspace(-200,200,100)
b = np.linspace(-500,500,100)

# Build every combination of a and b by expanding each
# into a two-dimensional array (needed to draw a curved surface)
A,B = np.meshgrid(a,b)

#Function for calculating Q
def calc_Q(x,y,a,b):
    result = (a * x + b - y)**2
    return np.mean(result)

# Array for Q, with the same shape as A and B (initialized with 0)
Q = np.zeros(A.shape)

# Calculate Q for all combinations of a and b
# meshgrid returns A[j,k] = a[k] and B[j,k] = b[j], so Q is indexed the same way
for j in range(len(b)):
    for k in range(len(a)):
        Q[j,k] = calc_Q(data_age,data_salary,a[k],b[j])

# Draw the 3D graph
fig = plt.figure(figsize=[10,10])
ax = fig.add_subplot(111,projection="3d")
ax.view_init(45,10)
ax.set_xlabel("a",size=14,color="blue")
ax.set_ylabel("b",size=14,color="blue")
ax.set_zlabel("Q",size=14,color="blue")
ax.plot_surface(A,B,Q,color="red")
plt.show()

[Figure: 3D surface of Q over a and b]

From the graph, Q(a, b) is smallest along a narrow valley at small positive a (roughly a = 0 to 40 over the plotted range of b), and there appears to be exactly one minimum. In the next part, we will therefore use the steepest descent method (which searches for a minimum by following the slope) to find the values of a and b at which Q(a, b) is smallest.
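To read the approximate location of the minimum off the grid numerically rather than by eye, one option is the sketch below. It recomputes Q with NumPy broadcasting (an alternative to the double loop, not the article's code), picks the smallest grid value with np.argmin, and compares it with the least-squares line from np.polyfit as a cross-check. The names a_grid, b_grid, slope and intercept are introduced here only for illustration.

```python
import numpy as np

data_age = np.array([20,20,28,38,33,34,22,37,
                     26,21,22,39,31,29,38,35,
                     32,27,30])
data_salary = np.array([410,500,480,710,630,600,430,
                        690,500,410,490,800,550,550,
                        700,700,650,540,600])

a = np.linspace(-200, 200, 100)
b = np.linspace(-500, 500, 100)
A, B = np.meshgrid(a, b)  # A[j,k] = a[k], B[j,k] = b[j]

# Broadcast over the data: the intermediate array has shape
# (len(b), len(a), N); averaging over the last axis gives Q on the grid
Q = np.mean((A[..., None] * data_age + B[..., None] - data_salary) ** 2,
            axis=-1)

# Grid point with the smallest mean squared error
j, k = np.unravel_index(np.argmin(Q), Q.shape)
a_grid, b_grid = A[j, k], B[j, k]

# Cross-check: degree-1 least-squares fit returns [slope, intercept]
slope, intercept = np.polyfit(data_age, data_salary, 1)

print("grid minimum:", a_grid, b_grid, Q[j, k])
print("polyfit     :", slope, intercept)
```

The grid is coarse, so the grid minimum is only a rough estimate of the true minimum. Narrowing it down is the job of the steepest descent method.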

To be continued.

Reference

Introduction to Python Numerical Calculation https://python.atelierkobato.com/mse/
