Machine Learning Starting with Python: Personal Memorandum, Part 2

Continuing from the previous session.

These are my notes on the first half of Chapter 2 of the book "Machine Learning Starting with Python". This article probably won't make much sense unless you are reading the book yourself, so buy a copy and let's study together!

Supervised learning

When learning anything, you first have to pick up the vocabulary of that discipline before you can talk or discuss it. As a beginner, I can't get down this road without climbing that big wall little by little. It gets a little dense, but I'll explain the terms I'm hearing for the first time as I go.

Classification and regression

Supervised machine learning can be roughly divided into two categories: classification and regression (which is itself a classification).

The iris classification done in Chapter 1 and deciding whether an email is spam are both examples of classification.

Regression, on the other hand, refers to machine learning that predicts continuous values. Examples include corn yield, annual income forecasts, and stock price movements.

Generalization, overfitting, underfitting

The goal of machine learning is to generalize. If a prediction model built from training data can accurately predict unknown data, the model is said to generalize. Loosely put: a model that has a good feel for the data is fine.

Building a model that uses more information (complexity) than necessary is called overfitting; the opposite is called underfitting. Starting from an underfit model, adding information gradually raises accuracy, but past a certain point the model starts to overfit and accuracy begins to drop. The sweet spot in between (the maximum of the accuracy curve) is where generalization performance is best, and that is the model we are after.

Supervised machine learning algorithm

Warning text
# Data set generation
X, y = mglearn.datasets.make_forge()

This time, the following warning appeared:

DeprecationWarning: Function make_blobs is deprecated; Please import make_blobs directly from scikit-learn
  warnings.warn(msg, category=DeprecationWarning)

The warning appears, but the code still runs, so I'll ignore it for now (make_forge presumably calls make_blobs internally, hence the message). Plenty more warnings show up when using mglearn after this, and I'll ignore them all.

University of Wisconsin

The University of Wisconsin is an excellent school that appears in the World University Rankings and has produced Nobel laureates. Read this book and you might start to feel on familiar terms with it: "Ah, that Wisconsin one." "Oh yeah, the breast cancer dataset!" That sort of thing.

zip function

See "How to use Python's zip function: get elements from multiple lists at once" for how to use zip.
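In short, zip pairs up the elements of multiple lists, and zip(*...) undoes the pairing (the lists here are just made-up examples):

```python
# zip pairs elements positionally across the input lists
pairs = list(zip([1, 2, 3], ["a", "b", "c"]))
print(pairs)  # [(1, 'a'), (2, 'b'), (3, 'c')]

# zip(*...) "unzips" back into separate tuples
nums, letters = zip(*pairs)
print(nums)     # (1, 2, 3)
print(letters)  # ('a', 'b', 'c')
```

The book uses this pattern to walk two lists in one loop, e.g. `for coef, name in zip(coefficients, feature_names):`.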

Feature engineering

A new feature formed as the product of two features is called an interaction.
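A minimal sketch of the idea (my own toy helper, not from the book): append the product of every pair of features as new interaction features.

```python
from itertools import combinations

def add_interactions(features):
    """Append the product of every pair of features as new features."""
    return list(features) + [a * b for a, b in combinations(features, 2)]

print(add_interactions([2.0, 3.0, 4.0]))  # [2.0, 3.0, 4.0, 6.0, 8.0, 12.0]
```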

k-nearest neighbor method

k-nearest neighbor classification

A method that looks at the k closest data points and decides the label by majority vote. Since the neighbors are not weighted by distance in this setting, the predicted label can differ from that of the single closest point. Sweeping over the two features lets you draw the border between classes; this border is called the decision boundary. The larger k is, the smoother the decision boundary; the smaller k is, the more complex the model. In the book, k was varied to find the number of neighbors giving the highest accuracy.
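A minimal from-scratch sketch of that voting rule (plain Python rather than the book's scikit-learn KNeighborsClassifier; the data points are made up for illustration):

```python
from collections import Counter

def knn_classify(X_train, y_train, x, k=3):
    """Label x by unweighted majority vote among its k nearest training points."""
    # (squared distance, label) for every training point, nearest first
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(p, x)), label)
        for p, label in zip(X_train, y_train)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# two well-separated toy clusters
X_train = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
y_train = [0, 0, 0, 1, 1, 1]

print(knn_classify(X_train, y_train, (0.5, 0.5), k=3))  # 0
print(knn_classify(X_train, y_train, (5.5, 5.5), k=3))  # 1
```

Finding the best k, as the book does, is just a loop over candidate values of k measuring test-set accuracy.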

k-nearest neighbor regression

The regression version of the k-nearest neighbor method. With k = 1 it adopts the value of the single closest point; with k = 3 or 9 it takes the average of the neighbors' values. Probably the simplest method anyone could think of.
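The same sketch for regression (again plain Python on made-up 1-D data, not the book's scikit-learn KNeighborsRegressor): average the target values of the k nearest points.

```python
def knn_regress(X_train, y_train, x, k=3):
    """Predict the mean target value of the k nearest training points."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(p, x)), y)
        for p, y in zip(X_train, y_train)
    )
    return sum(y for _, y in dists[:k]) / k

X_train = [(1.0,), (2.0,), (3.0,), (10.0,)]
y_train = [1.0, 2.0, 3.0, 10.0]

print(knn_regress(X_train, y_train, (2.8,), k=1))  # 3.0  (value of the closest point)
print(knn_regress(X_train, y_train, (2.8,), k=3))  # 2.0  (mean of the 3 closest)
```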

Advantages and disadvantages

Advantages: easy to understand, and fairly accurate without any tuning. Disadvantages: slow on large training sets, and performs poorly on sparse datasets. For these reasons it is rarely used in practice.

Linear model

Linear regression

This is the least squares method: it adopts the parameters that minimize the sum of squared errors.
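A minimal from-scratch sketch for one feature (the closed-form solution of simple linear regression, not the book's scikit-learn LinearRegression; the data is a made-up perfect line):

```python
def least_squares(xs, ys):
    """Fit y = w*x + b by minimizing the sum of squared errors (closed form)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    # slope = covariance / variance; intercept puts the line through the means
    w = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    b = my - w * mx
    return w, b

w, b = least_squares([0, 1, 2, 3], [1, 3, 5, 7])  # data lies on y = 2x + 1
print(w, b)  # 2.0 1.0
```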

Ridge regression

Suddenly things got unclear. L2 regularization isn't explained much in the book... Checking the page "Overfitting and L2 regularization" might have made it easier to picture. In short: we want to optimize the weights w not only for predictions on the training data but also under another constraint (← this is the part I'm unsure about): we want w to be smaller than what a pure fit to the training data (the least-squares result) would give. So a penalty, the sum of squares of w, is added to deliberately shrink it.

The smaller the penalty, the closer the result is to plain linear regression; the larger the penalty, the closer w gets to 0. In other words, if you want to generalize you can increase the penalty, but how much to increase it depends on the model.

Yup. I don't know how to say it myself.

Lasso instead penalizes the sum of absolute values of the coefficients, and in that case the coefficients tend to become exactly 0. The book says it is used when you want to reduce the number of variables. I'm still not sure about it, so I'll investigate in detail and write a separate article.
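The shrinkage effect of ridge can be seen in a tiny 1-D, no-intercept toy case of my own (not from the book): minimizing sum((y - w*x)^2) + alpha*w^2 has the closed form w = sum(x*y) / (sum(x^2) + alpha), so a larger alpha pushes w toward 0.

```python
def ridge_coef(xs, ys, alpha):
    """w minimizing sum((y - w*x)^2) + alpha * w^2 (1-D, no intercept)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + alpha)

xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]  # data lies on y = 2x
for alpha in (0.0, 1.0, 100.0):
    print(alpha, ridge_coef(xs, ys, alpha))
# alpha = 0   gives the least-squares slope 2.0;
# larger alpha shrinks the slope toward 0.
```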

Linear model for classification

The linear model for classification decides the class by whether the value of the function is greater or less than 0. Logistic regression has "regression" in its name, but it is actually a classification algorithm. The difference between LinearSVC and logistic regression wasn't clear to me.
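A minimal sketch of that decision rule (the weights w and intercept b here are made-up numbers, not fitted by any algorithm):

```python
def linear_classify(w, b, x):
    """Predict class 1 if w·x + b > 0, else class 0."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score > 0 else 0

w, b = [1.0, -1.0], 0.0  # decision boundary: x1 = x2
print(linear_classify(w, b, [2.0, 1.0]))  # 1
print(linear_classify(w, b, [1.0, 3.0]))  # 0
```

LinearSVC and logistic regression both produce a rule of exactly this shape; they differ only in how w and b are fitted.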

Multi-class classification by linear model

The one-vs.-rest approach was explained: train a binary classifier that separates one class from all the others, and do this for every class. I understand what it's saying, but just using scikit-learn doesn't seem to deepen my understanding... I know how to use it, so I suppose real understanding comes with practice.
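A sketch of the prediction side of one-vs.-rest, with three hypothetical per-class linear classifiers (the weights are invented for illustration): each classifier scores "its class vs. the rest", and the class whose classifier is most confident wins.

```python
def ovr_predict(classifiers, x):
    """Pick the class whose 'this class vs the rest' classifier scores highest."""
    scores = {
        label: sum(wi * xi for wi, xi in zip(w, x)) + b
        for label, (w, b) in classifiers.items()
    }
    return max(scores, key=scores.get)

# one (w, b) per class, invented for illustration
classifiers = {
    "a": ([1.0, 0.0], 0.0),
    "b": ([0.0, 1.0], 0.0),
    "c": ([-1.0, -1.0], 0.0),
}
print(ovr_predict(classifiers, [2.0, 0.5]))  # a
```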
