Machine learning summary by Python beginners

Introduction

As of November 2019, I am learning Python on PyQ. I tend to take in far more than I put out, so I am posting this as an output of what I have learned.

There are parts I still find hard to understand myself, so some of the explanations may be unclear, but I will do my best.

Table of contents

  1. Logistic regression
  2. Decision tree
  3. Support vector machine

1. What is logistic regression?

This is where the main topic begins. In a word ... logistic regression separates two groups of data with a line!

Program basics

python


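import numpy as np
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.linear_model import LogisticRegression

# X is the feature matrix and y the target labels (assumed to be loaded beforehand)
# Split the data into training and evaluation (test) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Initialize the classifier (tried C = 0.01, 0.1, 1, 10, 100)
lr = LogisticRegression(C=0.01, random_state=0)

# Train on the training data, then score on the test data
lr.fit(X_train, y_train)
print("Test accuracy", lr.score(X_test, y_test))

# 10-fold cross-validation on the whole data set
scores = cross_val_score(lr, X, y, cv=10)
print("Accuracy", np.mean(scores), "standard deviation +/-", np.std(scores))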

Program description

  1. Split the data into training and test sets
  2. Initialize the classifier
  3. Train on the training data with lr.fit(...): learns from the feature matrix and the target variable
  4. Measure the score on the test data with lr.score(...): returns the accuracy, that is, the fraction of correct predictions, as a value from 0 to 1. Internally, lr.predict(X_test) is run and the result is compared with y_test to calculate the score.
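
The four steps above can be sketched end to end as follows. This is only a minimal sketch: it assumes synthetic two-class data from make_classification in place of the article's X and y (which are not shown), and C=1.0 is an arbitrary choice. It also computes the accuracy by hand from lr.predict to show that it matches lr.score.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Assumption: synthetic two-class data stands in for the article's X and y
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

# 1. Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# 2. Initialize the classifier
lr = LogisticRegression(C=1.0, random_state=0)

# 3. Train on the training data
lr.fit(X_train, y_train)

# 4. Score on the test data
score = lr.score(X_test, y_test)

# The same number computed by hand: the fraction of predictions equal to y_test
manual_accuracy = np.mean(lr.predict(X_test) == y_test)

print("lr.score:", score)
print("manual accuracy:", manual_accuracy)
```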

Supplement

- C is passed as an argument to LogisticRegression. C is the regularization parameter. Regularization adjusts the model's formula to prevent overfitting. Overfitting means the model is so complex that it fits the training data well but does not fit the test data. The smaller C is, the stronger the regularization.

- cross_val_score is a validation method that splits the data and repeats training and evaluation. By splitting the data into several parts and evaluating on each part in turn, you can check performance without depending on one particular split. Since cv=10 here, the result is the average over 10 such rounds.
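
One way to see the effect of C is to look at the size of the learned coefficients. The following is a rough sketch on synthetic data (an assumption, since the article's data is not shown); max_iter=1000 is only there to give the solver room to converge. With a smaller C, the regularization is stronger and the coefficients are pulled closer to zero.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Assumption: synthetic two-class data stands in for the article's X and y
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

coef_norms = {}
for C in [0.01, 1, 100]:
    lr = LogisticRegression(C=C, max_iter=1000, random_state=0)
    lr.fit(X, y)
    # Length of the coefficient vector: smaller means a more strongly regularized model
    coef_norms[C] = np.linalg.norm(lr.coef_)
    print("C =", C, "coefficient norm =", coef_norms[C])
```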
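
As a small illustration of what cv=10 returns, the sketch below runs cross_val_score on synthetic data (again an assumption in place of the article's X and y). The result is an array with one score per round, and the mean and standard deviation are taken from it.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Assumption: synthetic two-class data stands in for the article's X and y
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

lr = LogisticRegression(C=0.01, random_state=0)

# cv=10: the data is split into 10 parts; each part is used once for evaluation
scores = cross_val_score(lr, X, y, cv=10)

print("scores per round:", scores)
print("mean:", np.mean(scores), "std:", np.std(scores))
```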

Related words

Logistic function [figure: guuu.jpg]
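
For reference, the logistic function itself is short enough to write by hand. The sketch below (the function name logistic is my own choice) shows that it squeezes any real number into the range 0 to 1, which is why logistic regression can treat its output as a probability.

```python
import numpy as np

def logistic(z):
    """Logistic (sigmoid) function: 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

# Large negative inputs approach 0, zero maps to 0.5, large positive inputs approach 1
z = np.array([-5.0, 0.0, 5.0])
p = logistic(z)
print(p)
```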

2. What is a decision tree?

In a word ... the result is produced by a chain of if statements. The figure below is quoted from the official site.

[figure: decision tree.jpg]

You can see that it has a tree-like structure. A decision tree is a machine learning method that uses a tree structure for classification and regression.
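
To see the "chain of if statements" directly, scikit-learn can print a fitted tree as text rules. This is a small sketch that assumes the iris data set and max_depth=2 purely for illustration; neither comes from the article.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# Assumption: the iris data set is used only as a convenient example
iris = load_iris()

tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(iris.data, iris.target)

# export_text renders the fitted tree as nested threshold rules
rules = export_text(tree, feature_names=list(iris.feature_names))
print(rules)
```

Each indented line in the output is one threshold test on a feature, and each leaf line names the predicted class.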

Program basics

python


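import matplotlib.pyplot as plt
# Import the decision tree classifier and the tree plotting function from sklearn
from sklearn.tree import DecisionTreeClassifier, plot_tree

tree = DecisionTreeClassifier()
# Train the decision tree on the training data
tree.fit(X_train, y_train)
# Calculate the score on the test data
print(tree.score(X_test, y_test))
# Plot the fitted tree (sklearn's plot_tree takes the fitted model, not X and y)
plot_tree(tree)
plt.show()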

Related words

- Hyperparameters: Values set on a learning model to specify the learning method, the complexity of the processing, and calculation parameters. For a decision tree you specify max_depth, the maximum depth of the tree; the smaller it is, the simpler the decision boundary. This can lower accuracy on the training data but reduces the risk of overfitting.
- Overfitting: The data contains both information that is useful for estimation and information that is not (such as noise). A model that has also learned the noise fits the training data well but fits the test data less well.
- Ensemble learning: A general term for methods that try to build a stronger learning model by combining multiple models.
- Random forest: A method that combines several decision trees and decides by majority vote.
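
The terms above can be tried out side by side. The following is a rough sketch on synthetic data (an assumption, as are the specific settings such as max_depth=2 and n_estimators=100): it compares a shallow tree, a tree with no depth limit, and a random forest on training and test accuracy.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumption: synthetic two-class data, not the article's data
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# A shallow tree, a tree with no depth limit, and a forest of 100 trees
shallow = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X_train, y_train)
deep = DecisionTreeClassifier(max_depth=None, random_state=0).fit(X_train, y_train)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

results = {}
for name, model in [("max_depth=2", shallow), ("max_depth=None", deep), ("random forest", forest)]:
    results[name] = (model.score(X_train, y_train), model.score(X_test, y_test))
    print(name, "train:", results[name][0], "test:", results[name][1])
```

A large gap between training and test accuracy for the unlimited tree is the typical sign of overfitting; how large the gap is depends on the data.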

3. What is a support vector machine?

In a word ... it can be used for both classification and regression! In Python (scikit-learn), the same algorithm is available in two forms:

- SVC (Support Vector Classification): classification
- SVR (Support Vector Regression): regression
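
Since the code later in this section only covers SVC, here is a minimal sketch of the regression side with SVR. The data is made up for illustration (points close to the line y = 2x + 1 with a little noise). Note that for a regressor, score returns the coefficient of determination R^2 rather than an accuracy.

```python
import numpy as np
from sklearn.svm import SVR

# Assumption: made-up data scattered closely around the line y = 2x + 1
rng = np.random.RandomState(0)
X = np.linspace(0, 10, 50).reshape(-1, 1)
y = 2.0 * X.ravel() + 1.0 + rng.normal(scale=0.1, size=50)

# Same usage pattern as SVC: create the model, fit, then score or predict
svr = SVR(kernel="linear", C=1.0)
svr.fit(X, y)

r2 = svr.score(X, y)
prediction = svr.predict([[4.0]])
print("R^2:", r2)
print("prediction at x=4:", prediction)
```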

In a linear SVM, the data is classified or regressed using the following kind of boundary.

Feature dimension    Boundary type
2D                   Straight line
3D                   Plane
n dimensions         Hyperplane
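
For two features, the straight-line boundary can be read off a fitted linear SVC. The sketch below assumes synthetic 2D data from make_blobs; it takes the learned weights w and offset b, which define the line w[0] * x0 + w[1] * x1 + b = 0, and computes w . x + b by hand for comparison with decision_function.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Assumption: synthetic 2D data with two groups
X, y = make_blobs(n_samples=100, centers=2, random_state=0)

svm = SVC(C=1.0, kernel="linear").fit(X, y)

# For a linear kernel, coef_ and intercept_ hold the boundary's weights and offset
w = svm.coef_[0]
b = svm.intercept_[0]
print("boundary: %.3f * x0 + %.3f * x1 + %.3f = 0" % (w[0], w[1], b))

# w . x + b for every point; its sign tells which side of the boundary a point is on
manual = X @ w + b
```

With more features, w simply has more entries, and the same equation describes a plane or a hyperplane.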

Program basics

python


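# Import SVC, the SVM class for classification
from sklearn.svm import SVC
# Create the model and train it on the training data
svm = SVC(C=1.0, kernel='linear')
svm.fit(X_train, y_train)
# Calculate the score on the test data
print(svm.score(X_test, y_test))
# Plot the learned regions
# (plot_regions is not part of scikit-learn; it is assumed to be a plotting helper defined elsewhere)
plot_regions(svm, X, y)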

SVC() creates the support vector machine classifier, which is stored in a variable called svm. The arguments C and kernel specify the penalty contribution and the type of support vector machine, respectively. C is the weight of the penalty for misclassified points when the boundary is determined: the larger C is, the greater the penalty for misclassified points. kernel='linear' means that a linear support vector machine is used.
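
One rough way to observe the effect of C is to count the support vectors. The sketch below assumes synthetic, partly overlapping 2D data from make_blobs, and the two C values are arbitrary. A small C penalizes misclassified points only lightly, which usually leaves a wider margin with more support vectors inside it.

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Assumption: synthetic 2D data with two partly overlapping groups
X, y = make_blobs(n_samples=100, centers=2, cluster_std=2.0, random_state=0)

n_support = {}
for C in [0.01, 10]:
    svm = SVC(C=C, kernel="linear").fit(X, y)
    # n_support_ holds the number of support vectors for each class
    n_support[C] = int(svm.n_support_.sum())
    print("C =", C, "support vectors:", n_support[C], "train accuracy:", svm.score(X, y))
```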

Related words

Hard margin: An SVM that assumes the data can be separated linearly and allows no misclassified points; an easy starting point for linear separation
Soft margin: A smarter SVM than hard margin, which tolerates some misclassified points

Reference URL

https://kenyu-life.com/2019/02/11/support_vector_machine/

https://qiita.com/t-yotsu/items/1a6bf3a4f3242eae7857
