[PYTHON] [Reading Notes] Hands-on Machine Learning with Scikit-Learn, Keras, and TensorFlow Chapter 1

The Machine Learning Landscape

Why Use Machine Learning?

Writing a spam filter with traditional programming goes through the following steps.

  1. Notice that spam emails tend to use certain phrases, and that there are patterns in the sender's name and in how the email is composed.
  2. Write a program that detects these patterns and flags an email as spam when a certain number of them are found.
  3. Run the program and repeat steps 1 and 2 until it is good enough.
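
The hand-written approach in the steps above can be sketched roughly as follows. This is only an illustration: the phrases in SPAM_PATTERNS and the THRESHOLD value are hypothetical, not taken from the book.

```python
# Hand-written rules; the phrases and the threshold are hypothetical examples.
SPAM_PATTERNS = ["free", "credit card", "amazing", "4u"]
THRESHOLD = 2  # flag the mail once this many patterns match


def count_matches(text, patterns=SPAM_PATTERNS):
    """Count how many of the hand-written patterns occur in the text."""
    lowered = text.lower()
    return sum(1 for pattern in patterns if pattern in lowered)


def is_spam(text, threshold=THRESHOLD):
    """Flag the mail as spam when enough patterns are detected."""
    return count_matches(text) >= threshold


print(is_spam("Amazing offer: a FREE credit card 4U"))  # True
print(is_spam("Meeting moved to 3 pm tomorrow"))        # False
```

Every new trick used by spammers means another entry in the pattern list, which is why this kind of program keeps growing.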

A hand-written spam detection program is difficult to maintain because it accumulates many rules. A spam filter built with machine learning is much shorter, easier to maintain, and most likely more accurate.

If spammers learn which words a hand-written filter looks for, they can switch to different words to get around it, and you have to keep adding new rules. A machine learning-based spam filter picks up the new patterns automatically from the emails that users flag as spam.

Types of Machine Learning Systems

Machine learning systems are classified along three lines: supervised versus unsupervised learning, batch versus online learning, and instance-based versus model-based learning.

Supervised/Unsupervised Learning

Supervised learning
・ In supervised learning, the training data includes labels.
・ A typical supervised learning task is classification, for example a spam filter.
・ Another task is predicting a numeric value (regression), for example the price of a car.

Methods
・ k-nearest neighbors
・ Linear regression
・ Logistic regression
・ Support vector machines
・ Decision trees and random forests
・ Neural networks

Unsupervised learning
・ In unsupervised learning, the training data does not include labels.
・ Clustering can be used, for example, to find out what groups a blog's visitors fall into.

Methods (clustering)
・ K-means
・ DBSCAN
・ Hierarchical clustering
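
As a rough illustration of clustering, here is a minimal K-means sketch in plain Python. The visitor data (visits per week, minutes per visit) and the choice of starting centroids are made up for the example, and a real project would more likely use a library implementation.

```python
def kmeans(points, centroids, n_iter=10):
    """Plain K-means: assign each point to its nearest centroid, then move the centroids."""
    labels = []
    for _ in range(n_iter):
        # Assignment step: index of the closest centroid for every point.
        labels = [
            min(range(len(centroids)),
                key=lambda k: (p[0] - centroids[k][0]) ** 2 + (p[1] - centroids[k][1]) ** 2)
            for p in points
        ]
        # Update step: each centroid becomes the mean of its assigned points.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                new_centroids.append((sum(m[0] for m in members) / len(members),
                                      sum(m[1] for m in members) / len(members)))
            else:
                new_centroids.append(centroids[k])  # leave an empty cluster where it was
        centroids = new_centroids
    return centroids, labels


# Hypothetical blog visitors: (visits per week, minutes per visit).
visitors = [(1.0, 2.0), (1.5, 1.8), (1.0, 0.6), (8.0, 8.0), (9.0, 11.0), (8.0, 9.0)]
centroids, labels = kmeans(visitors, centroids=[visitors[0], visitors[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

No labels are given to the algorithm; the two groups come out of the data alone.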

・ Visualization aims to plot data in 2D or 3D space, and dimensionality reduction aims to simplify the data without losing too much information.

Methods (visualization and dimensionality reduction)
・ Principal Component Analysis (PCA)
・ Kernel PCA
・ Locally Linear Embedding (LLE)
・ t-Distributed Stochastic Neighbor Embedding (t-SNE)
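
To make dimensionality reduction concrete, the sketch below applies PCA to 2D points and keeps only the first principal component. It relies on the closed-form direction of the leading eigenvector of a 2x2 covariance matrix, so it only covers the 2D case; the sample points are invented.

```python
import math


def pca_2d_to_1d(points):
    """Project 2D points onto their first principal component."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]

    # Entries of the 2x2 covariance matrix.
    cxx = sum(x * x for x, _ in centered) / n
    cyy = sum(y * y for _, y in centered) / n
    cxy = sum(x * y for x, y in centered) / n

    # Closed-form angle of the leading eigenvector of a symmetric 2x2 matrix.
    theta = 0.5 * math.atan2(2 * cxy, cxx - cyy)
    axis = (math.cos(theta), math.sin(theta))

    # One coordinate per point: its position along the principal axis.
    projected = [x * axis[0] + y * axis[1] for x, y in centered]
    return axis, projected


line_points = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # points on y = 2x
axis, projected = pca_2d_to_1d(line_points)
print(axis)       # direction of the line y = 2x
print(projected)  # 1D coordinates along that direction
```

Because these points lie exactly on a line, the single remaining coordinate keeps all of the variance.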

・ Anomaly detection means, for example, detecting unusual use of a credit card. Novelty detection is a closely related task.

Methods (anomaly detection and novelty detection)
・ One-class SVM
・ Isolation Forest
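
The idea can be illustrated with something much simpler than the methods listed above. The sketch below is neither a One-class SVM nor an Isolation Forest: it is a plain "distance from the mean" rule, fitted on purchases assumed to be normal. The amounts and the 3-sigma cutoff are hypothetical.

```python
import statistics


def fit_normal_profile(amounts):
    """Summarise normal behaviour by its mean and standard deviation."""
    return statistics.mean(amounts), statistics.stdev(amounts)


def is_unusual(amount, profile, n_sigmas=3.0):
    """Flag a value lying more than n_sigmas standard deviations from the mean."""
    mean, std = profile
    return abs(amount - mean) > n_sigmas * std


usual_purchases = [12.0, 15.5, 9.9, 14.2, 11.3, 13.8, 10.4, 12.9]  # hypothetical amounts
profile = fit_normal_profile(usual_purchases)
print(is_unusual(13.0, profile))   # False
print(is_unusual(950.0, profile))  # True
```

Fitting only on clean examples and then checking new ones is closer to novelty detection than to anomaly detection on a contaminated dataset.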

・ Association rule learning aims to find interesting relationships between attributes in a large amount of data. For example, people who buy barbecue sauce and potato chips also tend to buy steak.

Methods (association rule learning)
・ Apriori
・ Eclat
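
The barbecue sauce example can be quantified with support and confidence, the two measures that association rule algorithms typically build on. This sketch only computes them for one rule on made-up shopping baskets; it is not an implementation of Apriori or Eclat.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated probability of the consequent given the antecedent.

    Assumes the antecedent appears in at least one transaction.
    """
    both = support(transactions, set(antecedent) | set(consequent))
    return both / support(transactions, antecedent)


# Hypothetical shopping baskets.
baskets = [
    {"barbecue sauce", "potato chips", "steak"},
    {"barbecue sauce", "potato chips", "steak", "beer"},
    {"barbecue sauce", "potato chips"},
    {"potato chips", "soda"},
    {"steak", "salad"},
]
rule_support = support(baskets, {"barbecue sauce", "potato chips", "steak"})
rule_confidence = confidence(baskets, {"barbecue sauce", "potato chips"}, {"steak"})
print(rule_support, rule_confidence)
```

Here two of the three baskets that contain both barbecue sauce and potato chips also contain steak, so the rule has a confidence of about 0.67.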

Semisupervised learning
・ An algorithm that handles a small amount of labeled data together with a large amount of unlabeled data is called semi-supervised learning. For example, a photo service can detect that the same face appears in several photos, and then needs only a few labels to name the people in them.

Reinforcement Learning
・ In reinforcement learning, an agent observes the environment, selects and performs actions, and receives rewards in return, and in this way learns a policy (a definition of how to behave in a given situation). For example, it is used for robots learning how to walk and in AlphaGo.

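Batch and Online Learning

Batch learning
・ A system that cannot learn incrementally, and has to be retrained from scratch on all the data, uses batch learning. This is also called offline learning.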

Online learning
・ In online learning, the system is trained incrementally by feeding it data sequentially.
・ Online learning is also used when the dataset is too large to handle all at once.
・ The name online learning is misleading, so it is better to think of it as incremental learning.
・ The problem is that if the system is fed bad data, its performance will decline. Therefore, the system needs to be monitored.
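
A minimal sketch of incremental learning, assuming a one-feature linear model updated by stochastic gradient descent. The class name, the learning rate, and the simulated stream following y = 2x + 1 are all hypothetical.

```python
class OnlineLinearRegressor:
    """Fits y = w * x + b one example at a time with stochastic gradient descent."""

    def __init__(self, learning_rate=0.05):
        self.w = 0.0
        self.b = 0.0
        self.learning_rate = learning_rate

    def predict(self, x):
        return self.w * x + self.b

    def learn_one(self, x, y):
        """Update the parameters from a single (x, y) example, which can then be discarded."""
        error = self.predict(x) - y
        self.w -= self.learning_rate * error * x
        self.b -= self.learning_rate * error


model = OnlineLinearRegressor()
for _ in range(300):                       # simulate a long stream of incoming examples
    for x in [0.0, 1.0, 2.0, 3.0, 4.0]:
        model.learn_one(x, 2.0 * x + 1.0)  # hypothetical stream following y = 2x + 1
print(round(model.w, 2), round(model.b, 2))
```

Each example changes the model a little, which is also why a run of bad data gradually degrades it. The learning rate controls how fast the model adapts.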

Instance-Based Versus Model-Based Learning

Instance-based learning
・ In instance-based learning, the system memorizes the training examples and generalizes to new cases by comparing them with the learned examples using a similarity measure. For example, a spam filter can flag emails that are very similar to known spam. (It sounded like unsupervised learning to me, but it also includes k-nearest neighbors.)
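
A small k-nearest neighbors classifier shows what "comparing with learned examples" looks like in code. The two features (number of links, number of words in capitals) and the sample emails are invented for the illustration.

```python
from collections import Counter


def knn_predict(training_examples, new_point, k=3):
    """Label a new point by majority vote among its k most similar training examples."""
    def squared_distance(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    nearest = sorted(training_examples,
                     key=lambda example: squared_distance(example[0], new_point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical features: (number of links, number of words in capitals).
emails = [
    ((0, 1), "ham"), ((1, 0), "ham"), ((1, 2), "ham"),
    ((7, 9), "spam"), ((8, 12), "spam"), ((6, 10), "spam"),
]
print(knn_predict(emails, (7, 11)))  # spam
print(knn_predict(emails, (1, 1)))   # ham
```

There is no training step beyond storing the examples; all the work happens at prediction time.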

Model-based learning
・ Building a model from the data and then using that model to make predictions is called model-based learning. For example, run a linear regression of each country's life satisfaction on its GDP per capita, based on the hypothesis that money makes people happy.
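
The life satisfaction example can be sketched with ordinary least squares on a single feature. The numbers below are made up to keep the arithmetic easy and are not real statistics.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up numbers: GDP per capita (thousand USD) and life satisfaction score.
gdp = [10.0, 20.0, 30.0, 40.0, 50.0]
satisfaction = [5.0, 5.5, 6.0, 6.5, 7.0]

intercept, slope = fit_line(gdp, satisfaction)
prediction = intercept + slope * 35.0  # predict for a country not in the data
print(intercept, slope, prediction)
```

Once the two parameters are learned, the training examples are no longer needed to make a prediction, which is the main contrast with instance-based learning.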

Main Challenges of Machine Learning

Important points in machine learning
・ A toddler can learn to recognize an apple when you simply point at one and say "apple," but machine learning cannot. It needs thousands or even millions of examples.
・ Even a very large dataset is not representative if the sampling is flawed.
・ If the training data is full of errors, outliers, and noise, it is harder for the system to detect the underlying patterns, so the data needs to be cleaned up first.
・ Feature engineering is important.
・ Overfitting is dangerous. To prevent it, the model is simplified or constrained (regularization), and the hyperparameters that control the amount of regularization are tuned.
・ Underfitting is the opposite of overfitting. It occurs when the model is too simple.

Testing and Validating
・ Split the data, 80% for training and 20% for testing, to check how well the model works on new cases.
・ If you are hesitating between two models (e.g. linear regression or k-nearest neighbors), train both and compare them.
・ How should the regularization hyperparameter be chosen? One way is to train 100 models with 100 different hyperparameter values.
・ Holdout validation: hold out part of the training set as a validation set and use it to choose among the candidates.
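
The splitting and holdout steps can be sketched as follows, assuming a one-feature k-nearest neighbors regressor whose hyperparameter k is chosen on a validation set. The data, the candidate values, and the helper names are hypothetical.

```python
import random


def split(data, test_ratio=0.2, seed=42):
    """Shuffle a copy of the data and hold out the last `test_ratio` share."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) - int(len(shuffled) * test_ratio)
    return shuffled[:cut], shuffled[cut:]


def knn_regress(train, x, k):
    """Predict y as the mean target of the k training points closest to x."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def rmse(train, evaluation, k):
    """Root mean square error of the k-NN regressor on an evaluation set."""
    errors = [(knn_regress(train, x, k) - y) ** 2 for x, y in evaluation]
    return (sum(errors) / len(errors)) ** 0.5


# Hypothetical noisy data around y = 3x.
rng = random.Random(0)
data = [(x / 10, 3 * x / 10 + rng.gauss(0, 0.5)) for x in range(100)]

train_full, test = split(data)         # 80% training, 20% test
train, validation = split(train_full)  # hold out part of the training set

candidates = [1, 3, 5, 9, 15]
best_k = min(candidates, key=lambda k: rmse(train, validation, k))
test_error = rmse(train_full, test, best_k)  # final check on data never used for tuning
print(best_k, round(test_error, 3))
```

The test set is touched only once, at the very end, so the reported error is not flattered by the hyperparameter search.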

Data Mismatch
・ Data that is easy to obtain may not be representative of the data the model will see in production.

End-to-End Machine Learning Project

Published datasets
— UC Irvine Machine Learning Repository
— Kaggle datasets
— Amazon’s AWS datasets
• Meta portals (they list open data repositories):
— http://dataportals.org/
— http://opendatamonitor.eu/
— http://quandl.com/
• Other pages listing many popular open data repositories:
— Wikipedia’s list of Machine Learning datasets
— Quora.com question
— Datasets subreddit

Evaluation functions for regression analysis
・ Root mean square error (RMSE)
・ Mean absolute error (MAE)
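
Two error measures commonly used for regression, the root mean square error (RMSE) and the mean absolute error (MAE), can be computed as in this small sketch. The target values and predictions are invented.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error: penalises large errors more strongly."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error: less sensitive to outliers than RMSE."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


actual = [3.0, 5.0, 2.0, 7.0]      # hypothetical target values
predicted = [2.0, 5.0, 4.0, 7.0]   # hypothetical model outputs
print(rmse(actual, predicted))     # about 1.118
print(mae(actual, predicted))      # 0.75
```

The single error of 2 pulls the RMSE above the MAE, which shows how squaring gives more weight to large mistakes.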

Data snooping bias
・ If you look through the test data before choosing a model, you may notice patterns in it that lead you to an overfitted hypothesis, so set the test set aside before deciding which algorithm to use.
