As of November 2019, I am learning Python at PyQ. I tend to take in too many inputs, so I am posting what I learned as output.
Some parts may be hard to follow, but I will do my best.
Logistic regression
The main subject starts here. In a word ... divide two classes of data with a line!
More precisely, it is a method that fits a sigmoid curve (S-shaped curve) instead of a straight line. "Logistic" sounds impressive, but in the end it draws a line. The objective variable is binary (1 or 0, ○ or ×, and so on).
Because the errors do not follow a normal distribution, it is apparently not a linear model.
Logistic curves can apparently be handled with logistic equations, but for now, when you hear "logistic regression", remember that it divides the data in two with a straight line.
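For intuition, here is a minimal sketch of the sigmoid function itself; it squashes any input into the range (0, 1), which is why it suits a binary objective variable:

```python
import numpy as np

# Sigmoid (logistic) function: the S-shaped curve behind logistic regression
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(0), sigmoid(5), sigmoid(-5))  # 0.5, ~0.993, ~0.007
```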
Even linearly non-separable data can be separated by logistic regression if you add a third axis that takes the value x1 * x2 (sketched below). A technique called SVM (Support Vector Machine) handles such classifications automatically.
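Here is a rough sketch of that x1 * x2 trick, using synthetic XOR-like data (not from the original article):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# XOR-like data: not linearly separable in the original 2D space
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(int)

# Add a third axis that takes the value x1 * x2
X3 = np.hstack([X, (X[:, 0] * X[:, 1]).reshape(-1, 1)])

print(LogisticRegression().fit(X, y).score(X, y))    # around 0.5 in 2D
print(LogisticRegression().fit(X3, y).score(X3, y))  # close to 1.0 with the extra axis
```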
```python
# X is the feature matrix and y the binary labels (prepared beforehand)
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, cross_val_score

# Split the data into training and evaluation (test) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Logistic regression model; C=0.01, 0.1, 1, 10, 100 were tried
lr = LogisticRegression(C=0.01, random_state=0)

# 10-fold cross-validation
scores = cross_val_score(lr, X, y, cv=10)
print("Accuracy", np.mean(scores), "standard deviation +/-", np.std(scores))
```
- C is specified as an argument of LogisticRegression. C is a parameter for regularization. Regularization adjusts the model to prevent overfitting: an overly complex model fits the training data well but not the test data. The smaller C is, the stronger the regularization (a quick comparison of several C values is sketched after this list).
- cross_val_score is a validation method that splits the data and repeats training and evaluation, so the result is not biased by one particular split. Since cv=10, the score above is the average over 10 such runs.
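As a small sketch of the C comparison mentioned in the code comment (X and y are assumed to be the same feature matrix and labels as above):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Smaller C = stronger regularization; compare by 10-fold cross-validation
for C in [0.01, 0.1, 1, 10, 100]:
    lr = LogisticRegression(C=C, random_state=0)
    scores = cross_val_score(lr, X, y, cv=10)
    print(f"C={C}: accuracy {np.mean(scores):.3f} +/- {np.std(scores):.3f}")
```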
Decision tree
In a word ... the analysis result is produced by a series of if statements. The official site shows a diagram with a tree-like structure (figure omitted here). A decision tree is a machine learning method that uses a tree structure for classification and regression.
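For intuition only, a fitted tree behaves like nested if statements; the thresholds below are illustrative ones in the style of the classic iris dataset, not output from the code that follows:

```python
# A decision tree's prediction is just a chain of if statements
def predict_species(petal_length, petal_width):
    if petal_length <= 2.45:
        return "setosa"
    elif petal_width <= 1.75:
        return "versicolor"
    else:
        return "virginica"

print(predict_species(1.4, 0.2))  # "setosa"
```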
```python
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeClassifier, plot_tree

# Create and fit the decision tree classifier
tree = DecisionTreeClassifier()
tree.fit(X, y)

# Accuracy on the data the tree was trained on (no separate test set here)
tree.score(X, y)

# Plot the fitted tree (plot_tree takes only the fitted estimator)
plot_tree(tree)
plt.show()
```
- Hyperparameters: values specified on the learning model to control the learning method, model complexity, and other calculation settings. In a decision tree this means max_depth, which limits the depth of the tree: the smaller it is, the simpler the decision boundary, which lowers accuracy on the validation data but reduces the risk of overfitting.
- Overfitting: the data contains both information useful for estimation and information that is not (such as noise). A model that fits the noise applies well to the training data but much less well to the test data.
- Ensemble learning: a general term for methods that build a stronger model by combining multiple algorithms.
- Random forest: a combination of several decision trees whose outputs are combined by majority vote (see the sketch below).
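A minimal sketch of max_depth and the random forest, assuming the same X and y as in the decision tree code above:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

# max_depth=3: a shallower, simpler tree that is less prone to overfitting
shallow_tree = DecisionTreeClassifier(max_depth=3).fit(X, y)

# Random forest: 100 decision trees combined by majority vote
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

print(shallow_tree.score(X, y), forest.score(X, y))
```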
SVM (Support Vector Machine)
In a word ... it can be used for both classification and regression! In Python the same algorithm provides two classes: SVC (Support Vector Classification) for classification and SVR (Support Vector Regression) for regression.
With a linear SVM, classification and regression use the boundary types shown in the table below.
Feature dimension | Boundary type
---|---
2D | Straight line
3D | Plane
n dimensions | Hyperplane
```python
# Choose SVC (classification) from the SVM module
from sklearn.svm import SVC

# Build the model and train it on the training data
svm = SVC(C=1.0, kernel='linear')
svm.fit(X_train, y_train)

# Accuracy on the test data
svm.score(X_test, y_test)

# Plot the learned regions (plot_regions is a plotting helper from the
# learning environment, not part of scikit-learn)
plot_regions(svm, X, y)
```
SVC() creates the support vector machine classifier, which is stored in the variable svm. The parameters C and kernel are the penalty strength and the type of support vector machine, respectively. C is the magnitude of the penalty applied when determining the boundary: the larger C, the greater the penalty for misclassified points. kernel='linear' means a linear support vector machine is used.
Hard margin: assumes the data is linearly separable and allows no misclassification when drawing the boundary. Soft margin: tolerates some misclassification, making the SVM wiser (more flexible) than the hard margin.
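The regression counterpart, SVR, is used in the same way; here is a minimal sketch on synthetic one-dimensional data (not from the original article):

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic linear data with a little noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 5, size=(100, 1))
y = 2 * X.ravel() + rng.normal(0, 0.2, size=100)

# SVR with a linear kernel regresses instead of classifying
svr = SVR(kernel='linear', C=1.0)
svr.fit(X, y)
print(svr.score(X, y))  # coefficient of determination, close to 1.0
```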
References:
https://kenyu-life.com/2019/02/11/support_vector_machine/
https://qiita.com/t-yotsu/items/1a6bf3a4f3242eae7857