As of November 2019, I am learning Python at PyQ. I tend to take in too many inputs, so I am posting what I learned as output.
Some parts may be hard to follow, but I will do my best.
Logistic regression
The main subject starts here. In a word ... divide two classes of data with a line!
More precisely, it is a method that fits a sigmoid curve (S-shaped curve) instead of a straight line. "Logistic" sounds impressive, but in the end it draws a line. The objective variable is binary (1 or 0, ○ or ×, and so on).
Because the errors do not follow a normal distribution, it is apparently not a linear model.
Logistic curves can apparently be handled with logistic equations, but for now, when you hear "logistic regression", remember that it divides the data in two with a straight line.
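For intuition, here is a minimal sketch of the sigmoid function itself; it squashes any input into the range (0, 1), which is why it suits a binary objective variable:

```python
import numpy as np

# Sigmoid (logistic) function: the S-shaped curve behind logistic regression
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(0), sigmoid(5), sigmoid(-5))  # 0.5, ~0.993, ~0.007
```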
Even linearly non-separable data can be separated by logistic regression if you add a third axis that takes the value x1 * x2 (sketched below). A technique called SVM (Support Vector Machine) handles such classifications automatically.
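Here is a rough sketch of that x1 * x2 trick, using synthetic XOR-like data (not from the original article):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# XOR-like data: not linearly separable in the original 2D space
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(int)

# Add a third axis that takes the value x1 * x2
X3 = np.hstack([X, (X[:, 0] * X[:, 1]).reshape(-1, 1)])

print(LogisticRegression().fit(X, y).score(X, y))    # around 0.5 in 2D
print(LogisticRegression().fit(X3, y).score(X3, y))  # close to 1.0 with the extra axis
```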
```python
# X is the feature matrix and y the binary labels (prepared beforehand)
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, cross_val_score

# Split the data into training and evaluation (test) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Logistic regression model; C=0.01, 0.1, 1, 10, 100 were tried
lr = LogisticRegression(C=0.01, random_state=0)

# 10-fold cross-validation
scores = cross_val_score(lr, X, y, cv=10)
print("Accuracy", np.mean(scores), "standard deviation +/-", np.std(scores))
```
- C is specified as an argument of LogisticRegression. C is a parameter for regularization. Regularization adjusts the model to prevent overfitting: an overly complex model fits the training data well but not the test data. The smaller C is, the stronger the regularization (a quick comparison of several C values is sketched after this list).
- cross_val_score is a validation method that splits the data and repeats training and evaluation, so the result is not biased by one particular split. Since cv=10, the score above is the average over 10 such runs.
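As a small sketch of the C comparison mentioned in the code comment (X and y are assumed to be the same feature matrix and labels as above):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Smaller C = stronger regularization; compare by 10-fold cross-validation
for C in [0.01, 0.1, 1, 10, 100]:
    lr = LogisticRegression(C=C, random_state=0)
    scores = cross_val_score(lr, X, y, cv=10)
    print(f"C={C}: accuracy {np.mean(scores):.3f} +/- {np.std(scores):.3f}")
```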
Decision tree
In a word ... the analysis result is produced by a series of if statements. The official site shows a diagram with a tree-like structure (figure omitted here). A decision tree is a machine learning method that uses a tree structure for classification and regression.
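For intuition only, a fitted tree behaves like nested if statements; the thresholds below are illustrative ones in the style of the classic iris dataset, not output from the code that follows:

```python
# A decision tree's prediction is just a chain of if statements
def predict_species(petal_length, petal_width):
    if petal_length <= 2.45:
        return "setosa"
    elif petal_width <= 1.75:
        return "versicolor"
    else:
        return "virginica"

print(predict_species(1.4, 0.2))  # "setosa"
```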
```python
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeClassifier, plot_tree

# Create and fit the decision tree classifier
tree = DecisionTreeClassifier()
tree.fit(X, y)

# Accuracy on the data the tree was trained on (no separate test set here)
tree.score(X, y)

# Plot the fitted tree (plot_tree takes only the fitted estimator)
plot_tree(tree)
plt.show()
```
- Hyperparameters: values specified on the learning model to control the learning method, model complexity, and other calculation settings. In a decision tree this means max_depth, which limits the depth of the tree: the smaller it is, the simpler the decision boundary, which lowers accuracy on the validation data but reduces the risk of overfitting.
- Overfitting: the data contains both information useful for estimation and information that is not (such as noise). A model that fits the noise applies well to the training data but much less well to the test data.
- Ensemble learning: a general term for methods that build a stronger model by combining multiple algorithms.
- Random forest: a combination of several decision trees whose outputs are combined by majority vote (see the sketch below).
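A minimal sketch of max_depth and the random forest, assuming the same X and y as in the decision tree code above:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

# max_depth=3: a shallower, simpler tree that is less prone to overfitting
shallow_tree = DecisionTreeClassifier(max_depth=3).fit(X, y)

# Random forest: 100 decision trees combined by majority vote
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

print(shallow_tree.score(X, y), forest.score(X, y))
```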
SVM (Support Vector Machine)
In a word ... it can be used for both classification and regression! In Python the same algorithm provides two classes: SVC (Support Vector Classification) for classification and SVR (Support Vector Regression) for regression.
With a linear SVM, classification and regression use the boundary types shown in the table below.
Feature dimension | Boundary type
---|---
2D | Straight line
3D | Plane
n dimensions | Hyperplane
```python
# Choose SVC (classification) from the SVM module
from sklearn.svm import SVC

# Build the model and train it on the training data
svm = SVC(C=1.0, kernel='linear')
svm.fit(X_train, y_train)

# Accuracy on the test data
svm.score(X_test, y_test)

# Plot the learned regions (plot_regions is a plotting helper from the
# learning environment, not part of scikit-learn)
plot_regions(svm, X, y)
```
SVC() creates the support vector machine classifier, which is stored in the variable svm. The parameters C and kernel are the penalty strength and the type of support vector machine, respectively. C is the magnitude of the penalty applied when determining the boundary: the larger C, the greater the penalty for misclassified points. kernel='linear' means a linear support vector machine is used.
Hard margin: assumes the data is linearly separable and allows no misclassification when drawing the boundary. Soft margin: tolerates some misclassification, making the SVM wiser (more flexible) than the hard margin.
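The regression counterpart, SVR, is used in the same way; here is a minimal sketch on synthetic one-dimensional data (not from the original article):

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic linear data with a little noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 5, size=(100, 1))
y = 2 * X.ravel() + rng.normal(0, 0.2, size=100)

# SVR with a linear kernel regresses instead of classifying
svr = SVR(kernel='linear', C=1.0)
svr.fit(X, y)
print(svr.score(X, y))  # coefficient of determination, close to 1.0
```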
References:
https://kenyu-life.com/2019/02/11/support_vector_machine/
https://qiita.com/t-yotsu/items/1a6bf3a4f3242eae7857