[PYTHON] What is Logistic Regression Analysis?

What is Logistic Regression Analysis?

- A model that makes predictions by **calculating a probability** from several explanatory variables.

- **A type of generalized linear model**.

- Although its name contains "regression", it is mostly used for **classification**.

What is a generalized linear model?

- **A linear model that can be used even when the response variable follows a probability distribution other than the normal distribution.**

For example: **〇 Weight = β0 + β1 × Height** (weight is a variable that follows a normal distribution)

**✖ Clothing size = β0 + β1 × Height** (clothing size is clearly not a variable that follows a normal distribution)

The response variable and the linear predictor must cover the same range of values

What if the response variable is the number of ice creams sold?

**Number of ice creams sold = β0 + β1 × Temperature** (left side: response variable, right side: linear predictor)

The number of ice creams sold can never be negative, but the right side can become negative depending on the temperature.

**Therefore!!** Introduce a **link function (log function)** that will be the savior.

**log(number of ice creams sold) = β0 + β1 × Temperature** (strictly speaking, the log is taken of the expected number sold)
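As a small sketch of why the log link helps, the snippet below evaluates the linear predictor for a few temperatures and then undoes the log with `exp`. The coefficients `beta0` and `beta1` and the temperature values are made up for illustration and are not fitted to any data.

```python
import numpy as np

# Hypothetical coefficients, chosen only for illustration (not fitted).
beta0, beta1 = 1.0, 0.1

# A few temperatures, including very cold ones.
temperature = np.array([-30.0, -10.0, 0.0, 10.0, 30.0])

# Linear predictor: this can be negative when the temperature is low.
eta = beta0 + beta1 * temperature

# log(expected sales) = eta  <=>  expected sales = exp(eta), which is always positive.
expected_sales = np.exp(eta)

print(eta)
print(expected_sales)
```

Whatever value the right side takes, the number on the left side stays positive once the log is inverted.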

What if the response variable is probability (pass rate)?

**✖ Test pass/fail (1, 0) = β0 + β1 × Study time** The right-hand side clearly does not take only the values 1 and 0.

**✖ Test pass rate = β0 + β1 × Study time** However, this is still insufficient. The pass rate must lie between 0 and 1, but the right side is not limited to that range.

Therefore!! Introduce a **link function (logit function)** that will be the savior.

**log(p / (1 - p)) = β0 + β1 × Study time** Solving this for p gives

**p = 1 / {1 + exp(-(β0 + β1 × Study time))}** With this formula, the right side always stays between 0 and 1.
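The relationship between the logit function and the formula for p can be checked numerically. This is a minimal sketch. The helper names `logit` and `sigmoid`, the coefficients, and the study times are assumptions made for this example.

```python
import numpy as np

def logit(p):
    """Link function: maps a probability in (0, 1) to the whole real line."""
    return np.log(p / (1 - p))

def sigmoid(z):
    """Inverse of the logit: maps any real number to a value in (0, 1)."""
    return 1 / (1 + np.exp(-z))

# Hypothetical coefficients, chosen only for illustration (not fitted).
beta0, beta1 = -4.0, 1.0

study_time = np.array([0.0, 2.0, 4.0, 6.0, 8.0])

# Linear predictor: ranges over negative and positive values.
z = beta0 + beta1 * study_time

# Pass probability: always strictly between 0 and 1.
p = sigmoid(z)

print(z)
print(p)
print(logit(p))  # recovers z, up to floating-point error
```

Applying `logit` to `p` gives back the linear predictor, which is exactly what the equation log(p / (1 - p)) = β0 + β1 × Study time says.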

The goal is to optimize the **parameters β0 and β1** of this equation.


How do you define "optimal"?

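Consider the **likelihood function**. Let t_n be the actual result of the n-th person (1 = pass, 0 = fail). The predicted value of the n-th person is

**p_n = 1 / {1 + exp(-(β0 + β1 × Study time_n))}**

The likelihood is the product, over all people, of the probability the model assigns to each observed result, and the optimal parameters are the ones that maximize it:

**L(β0, β1) = Π_n p_n^t_n × (1 - p_n)^(1 - t_n)**

A product of many probabilities is awkward to differentiate, and maximization is not what the gradient descent method does.

【Solution】

① Eliminate the multiplication by taking the **logarithm** (the product becomes a sum).

② Put a **minus** sign in front so that the **gradient descent method** can be used (the gradient descent method is suited to finding a minimum).

**E(β0, β1) = -Σ_n {t_n log p_n + (1 - t_n) log(1 - p_n)}**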

The above equation is called the **cross entropy error function**.
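To make the cross entropy error concrete, here is a minimal sketch that assumes the standard definition: minus the sum of t × log(p) + (1 - t) × log(1 - p) over all samples, where t is the actual result (1 or 0) and p is the predicted probability. The function name `cross_entropy`, the clipping constant `eps`, and the sample values are assumptions for this example.

```python
import numpy as np

def cross_entropy(t, p, eps=1e-12):
    """Negative log-likelihood of 0/1 results t under predicted probabilities p."""
    # Clip to avoid log(0); eps is an arbitrary small constant.
    p = np.clip(p, eps, 1 - eps)
    return -np.sum(t * np.log(p) + (1 - t) * np.log(1 - p))

# Made-up results (1 = pass, 0 = fail) and two sets of made-up predictions.
t = np.array([1, 0, 1, 1])
p_good = np.array([0.9, 0.1, 0.8, 0.7])  # mostly agrees with t
p_bad = np.array([0.4, 0.6, 0.3, 0.2])   # mostly disagrees with t

# Likelihood: product of the probabilities assigned to the observed results.
likelihood_good = np.prod(np.where(t == 1, p_good, 1 - p_good))

print(cross_entropy(t, p_good))
print(cross_entropy(t, p_bad))
print(-np.log(likelihood_good))  # same value as cross_entropy(t, p_good)
```

Predictions that agree with the observed results give a smaller error, and the error equals minus the log of the likelihood, so minimizing one is the same as maximizing the other.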

The optimal values of the parameters are obtained by differentiating this function with respect to **β0 and β1** and applying the **gradient descent method**!
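The idea can be sketched in a few lines of NumPy. For the cross entropy error, the derivative with respect to β0 is the sum of (p - t) and the derivative with respect to β1 is the sum of (p - t) × x. The sketch below uses the mean instead of the sum so the step size does not depend on the sample count. The function `fit_logistic_gd`, the learning rate, the iteration count, and the synthetic data are all assumptions for illustration, and this is not how scikit-learn fits the model internally.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cross_entropy(t, p, eps=1e-12):
    p = np.clip(p, eps, 1 - eps)
    return -np.sum(t * np.log(p) + (1 - t) * np.log(1 - p))

def fit_logistic_gd(x, t, lr=0.05, n_iter=20000):
    """Plain gradient descent on the (mean) cross entropy error."""
    beta0, beta1 = 0.0, 0.0
    for _ in range(n_iter):
        p = sigmoid(beta0 + beta1 * x)
        grad0 = np.mean(p - t)        # derivative with respect to beta0
        grad1 = np.mean((p - t) * x)  # derivative with respect to beta1
        beta0 -= lr * grad0
        beta1 -= lr * grad1
    return beta0, beta1

# Synthetic data: pass/fail generated from made-up "true" coefficients.
rng = np.random.default_rng(0)
study_time = rng.uniform(0, 10, size=200)
true_p = sigmoid(-5.0 + 1.0 * study_time)
passed = (rng.uniform(size=200) < true_p).astype(float)

beta0, beta1 = fit_logistic_gd(study_time, passed)

loss_start = cross_entropy(passed, sigmoid(np.zeros_like(study_time)))
loss_fitted = cross_entropy(passed, sigmoid(beta0 + beta1 * study_time))

print(beta0, beta1)
print(loss_start, loss_fitted)
```

The error after fitting is lower than the error at the starting point β0 = β1 = 0, and the fitted β1 is positive, meaning more study time raises the predicted pass probability.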

Experiment

**This time, I would like to run the analysis using a dataset from the sklearn library.**

import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df["target"] = iris.target

# Of targets 0-2, keep only targets 1 and 2 (rows 50 onward) so the problem is binary.
# Column 2 (petal length) is the single explanatory variable.
X = iris.data[50:, 2].reshape(-1, 1)
y = iris.target[50:]

scaler = StandardScaler()  # Standardization
X_scaled = scaler.fit_transform(X)

X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, random_state=0)

log_reg = LogisticRegression().fit(X_train, y_train)

print(log_reg.coef_)       # Regression coefficient (β1)
print(log_reg.intercept_)  # Intercept (β0)

print(log_reg.score(X_train, y_train))  # Mean accuracy on the training data
print(log_reg.score(X_test, y_test))    # Mean accuracy on the test data
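As a check that the fitted model really is the formula p = 1 / {1 + exp(-(β0 + β1 × x))}, the sketch below rebuilds the same model and compares a hand-computed probability with `predict_proba`. It assumes the default `LogisticRegression` settings and reads column 1 of `predict_proba` as the probability of the second class (target 2). The variable names `p_manual` and `p_sklearn` are made up for this example.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

iris = load_iris()
X = iris.data[50:, 2].reshape(-1, 1)  # petal length, targets 1 and 2 only
y = iris.target[50:]

X_scaled = StandardScaler().fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, random_state=0)

log_reg = LogisticRegression().fit(X_train, y_train)

# Linear predictor from the fitted intercept (β0) and coefficient (β1).
z = log_reg.intercept_[0] + log_reg.coef_[0, 0] * X_test[:, 0]

# Probability of the second class, computed by hand with the sigmoid.
p_manual = 1 / (1 + np.exp(-z))

# The same probability as reported by scikit-learn.
p_sklearn = log_reg.predict_proba(X_test)[:, 1]

print(log_reg.classes_)
print(np.allclose(p_manual, p_sklearn))
```

If the two arrays agree, `coef_` and `intercept_` can be read directly as β1 and β0 of the equation derived earlier, on the standardized scale of the explanatory variable.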
