Python Machine Learning Programming, Chapter 1: "Giving Computers the Ability to Learn from Data" (Summary)
Introduction
--Machine learning
--The application and science of algorithms that make sense of data
--One of the most exciting fields in computer science
--This chapter covers the main concepts of machine learning and its different types
--Contents to be handled
--General concept
--Three types of learning and basic terms
--Components for system design
--Python setup
--Sample code
--python-machine-learning-book/code/ch01/ch01.ipynb
--The summary below does not reproduce the book's code or formulas.
1.1 "Intelligent machines" that turn data into knowledge
--Large amount of data
--Structured data
--Unstructured data
--Examples of everyday applications
--Email spam filters
--Character and voice recognition software
--Search engines
--Programs that play against Go players
1.2 The three types of machine learning
--Supervised learning
--Unsupervised learning
--Reinforcement learning
1.3 Predicting the future with "supervised learning"
--Goal
--Learn a model from training data so that it can make predictions about unseen or future data
--Supervised (labeled) data
--A set of samples whose desired output signals (labels) are already known
--For an email spam filter, the label is either "spam" or "not spam"
--Examples
--Classification: the output is a discrete class label
--Regression: the output signal is a continuous value
1.3.1 Classification for predicting class labels
--Purpose
--Predict class labels for new instances based on past observations
--Class labels are discrete, unordered values (group memberships)
--Binary classification
--Email spam filter
--Multi-class classification
--Handwritten character recognition
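The idea of predicting a class label from past observations can be sketched in a few lines. This is a minimal illustration, not code from the book: the toy feature values are hypothetical, and the nearest-neighbor rule is just one simple classifier chosen here because it needs no extra packages.

```python
import math

# Hypothetical toy training data: (feature vector, class label).
# The two features could be, e.g., counts of links and exclamation marks.
training_data = [
    ((0.0, 1.0), "not spam"),
    ((1.0, 0.0), "not spam"),
    ((1.0, 1.0), "not spam"),
    ((5.0, 6.0), "spam"),
    ((6.0, 5.0), "spam"),
    ((6.0, 7.0), "spam"),
]


def predict_label(features, labeled_samples):
    """Return the label of the training sample closest to `features`."""
    nearest = min(labeled_samples, key=lambda s: math.dist(features, s[0]))
    return nearest[1]


# Binary classification of two new, unlabeled instances.
print(predict_label((0.5, 0.5), training_data))
print(predict_label((5.5, 6.5), training_data))
```

The same function handles multi-class problems unchanged, since it only copies the label of the closest known sample.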
1.3.2 Regression for predicting continuous values
--Purpose
--Given a number of predictor (explanatory) variables and a continuous response variable, find the relationship between them so that the outcome can be predicted
--Linear regression
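As a rough sketch of linear regression with a single predictor, the least-squares slope and intercept can be computed in closed form. The data below are hypothetical and lie exactly on y = 2x + 1 so the result is easy to check; `fit_line` is an illustrative helper name, not something from the book.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variance (unnormalized; the 1/n factors cancel).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


xs = [1.0, 2.0, 3.0, 4.0]   # predictor variable
ys = [3.0, 5.0, 7.0, 9.0]   # continuous response variable
slope, intercept = fit_line(xs, ys)

# Predict the outcome for a new, unseen predictor value.
prediction = slope * 5.0 + intercept
print(slope, intercept, prediction)
```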
1.4 Reinforcement learning for solving interactive problems
--Goal
--Develop a system (agent) that improves its performance through interaction with the environment
--Can be regarded as a field related to supervised learning
--Information about the current state of the environment also includes a reward signal
--This feedback is not the correct label or value, but a measure of how good the action was, as quantified by a "reward" function
--Maximize the reward
--Trial-and-error approach
--Does not use a model
--Deliberative planning
--Uses a model
--Example
--Chess engine
--The reward is defined as winning or losing the game
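The trial-and-error approach can be illustrated with a very small agent that learns which of several actions gives the highest reward. This is only a sketch under simplifying assumptions: the environment has no state, each action returns a fixed (hypothetical) reward, and the epsilon-greedy rule is one common exploration strategy rather than the book's method.

```python
import random


def run_bandit(rewards, steps=200, epsilon=0.1, seed=0):
    """Learn action values by trial and error (epsilon-greedy)."""
    rng = random.Random(seed)
    n_actions = len(rewards)
    estimates = [0.0] * n_actions   # current estimate of each action's reward
    counts = [0] * n_actions        # how often each action was tried
    total_reward = 0.0
    for step in range(steps):
        if step < n_actions:
            action = step                      # try every action once first
        elif rng.random() < epsilon:
            action = rng.randrange(n_actions)  # explore a random action
        else:
            # Exploit: pick the action with the best estimate so far.
            action = max(range(n_actions), key=lambda a: estimates[a])
        reward = rewards[action]               # feedback from the environment
        counts[action] += 1
        # Incremental average of the rewards observed for this action.
        estimates[action] += (reward - estimates[action]) / counts[action]
        total_reward += reward
    return estimates, counts, total_reward


estimates, counts, total_reward = run_bandit([0.2, 0.8])
print(estimates, counts, total_reward)
```

The agent never sees a "correct" action; it only receives a reward for the action it took and adjusts its estimates accordingly.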
1.5 Discovering hidden structures through "unsupervised learning"
--Unsupervised learning
--Handling unlabeled data or data of unknown structure
1.5.1 Discovery of groups by clustering
--Clustering (unsupervised classification)
--An exploratory data analysis technique that organizes a large amount of information into meaningful groups
--Exploratory data analysis: computing summary statistics and visualizing distributions to gain knowledge about the data
--Example
--Discovery of customer groups in marketing
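To make the idea of discovering groups concrete, here is a minimal one-dimensional k-means sketch. The customer spending values and the starting centroids are hypothetical, and k-means is used only as a familiar example of a clustering algorithm.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Group 1-D values around centroids; no labels are used."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical monthly spend of six customers.
spend = [10.0, 12.0, 11.0, 95.0, 100.0, 105.0]
centroids, clusters = kmeans_1d(spend, centroids=[0.0, 50.0])
print(centroids)
print(clusters)
```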
1.5.2 Dimensionality reduction for data compression
--(Unsupervised) dimensionality reduction
--Compress data into a lower-dimensional subspace while preserving most of the relevant information
--Purpose
--Avoid processing a large number of feature values when storage space and computing performance are limited
--Data visualization
--Method examples
--Unsupervised dimensionality reduction
--Principal component analysis
--Kernel principal component analysis
--Supervised dimensionality reduction
--Discriminant analysis
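As a small illustration of compressing data into a lower-dimensional subspace, the sketch below projects 2-D points onto their first principal axis. It relies on the closed-form angle of the principal axis of a 2x2 covariance matrix, so it only works for two features; the points are hypothetical, and a general implementation would use an eigendecomposition instead.

```python
import math


def first_principal_axis(points):
    """Return the mean and the unit vector of largest variance for 2-D points."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    # Entries of the 2x2 covariance matrix.
    sxx = sum((p[0] - mean_x) ** 2 for p in points) / n
    syy = sum((p[1] - mean_y) ** 2 for p in points) / n
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points) / n
    # Orientation of the major axis of a symmetric 2x2 matrix.
    angle = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return (mean_x, mean_y), (math.cos(angle), math.sin(angle))


def project(points):
    """Compress each 2-D point to a single coordinate along the principal axis."""
    (mean_x, mean_y), (ux, uy) = first_principal_axis(points)
    return [(x - mean_x) * ux + (y - mean_y) * uy for x, y in points]


points = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]  # lie on y = 2x
_, direction = first_principal_axis(points)
compressed = project(points)
print(direction)
print(compressed)
```

Because the example points lie on a straight line, one coordinate per point preserves all of the variance.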
1.6 Basic terms and notation
--Sample
--Feature
--Target
--Linear algebra
--Vector
--Matrix
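A common convention, assumed here, is to store the data as a matrix in which each row is a sample and each column is a feature, with the targets in a separate vector. The values below are hypothetical and use plain nested lists so that no packages are required.

```python
# Feature matrix X: 3 samples (rows) x 2 features (columns).
X = [
    [5.1, 3.5],
    [4.9, 3.0],
    [6.2, 3.4],
]
# Target vector y: one class label per sample.
y = [0, 0, 1]

n_samples = len(X)
n_features = len(X[0])

second_sample = X[1]                     # a row vector: all features of one sample
first_feature = [row[0] for row in X]    # a column vector: one feature of all samples

print(n_samples, n_features)
print(second_sample, first_feature)
```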
1.7 Roadmap for building a machine learning system
--General workflow when using machine learning for predictive modeling
--Preprocessing
--Learning
--Evaluation
--Prediction
1.8 Preprocessing: getting data into shape
--Preprocessing
--Convert the data into the format required for the machine learning algorithm to perform well
--Bring the selected features onto the same scale
--Rescale features to the range [0, 1]
--Standardize features to zero mean and unit variance
--Some of the extracted features may be highly correlated and therefore partly redundant
--Dimensionality reduction
--Check whether the model generalizes to new data sets
--Split the dataset into a training dataset and a test dataset
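The two feature-scaling options mentioned in this section can be sketched as follows. The helper names and the sample values are illustrative assumptions; note that standardization gives a feature zero mean and unit variance but does not by itself make its distribution normal.

```python
import statistics


def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Shift and scale values to zero mean and unit variance."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    return [(v - mean) / std for v in values]


# Hypothetical feature column.
heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]

normalized = min_max_scale(heights_cm)
standardized = standardize(heights_cm)
print(normalized)
print(standardized)
```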
1.8.1 Predictive model training and selection
--Comparing several algorithms is essential for training and selecting a good model
--Metric for measuring performance
--Classification accuracy (the proportion of correctly classified instances)
--Estimation of model generalization performance
--Further split the training dataset into training and validation subsets (cross-validation)
--Hyperparameter optimization
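The accuracy metric and the way cross-validation divides the training data can be sketched as below. This is a simplified illustration: the folds are contiguous and unshuffled, and `k_fold_indices` and `accuracy` are hypothetical helper names rather than library functions.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k folds."""
    # Distribute any remainder over the first folds.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        validation = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, validation
        start = stop


def accuracy(y_true, y_pred):
    """Proportion of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


folds = list(k_fold_indices(6, 3))
for train, validation in folds:
    print(train, validation)

print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))
```

Each sample appears in a validation fold exactly once, so averaging the accuracy over the folds uses all of the training data for both fitting and validation.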
1.8.2 Model evaluation and unknown instance prediction
--Evaluation of generalization error
--Apply the model to the test dataset and check how well it will perform against unknown data
--Parameters for steps such as feature scaling and dimensionality reduction are obtained from the training dataset only and then applied unchanged to the test dataset and to new data
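The point about estimating preprocessing parameters on the training data alone can be sketched like this. The data and the simple slice-based split are hypothetical; in practice the samples would usually be shuffled before splitting.

```python
import statistics


def fit_standardizer(train_values):
    """Estimate the scaling parameters from the training data only."""
    return statistics.fmean(train_values), statistics.pstdev(train_values)


def apply_standardizer(values, mean, std):
    """Apply previously estimated parameters to any dataset."""
    return [(v - mean) / std for v in values]


data = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
train, test = data[:4], data[4:]   # hold out the last two samples as test data

mean, std = fit_standardizer(train)          # parameters come from train only
train_scaled = apply_standardizer(train, mean, std)
test_scaled = apply_standardizer(test, mean, std)  # reuse them, do not refit

print(mean, std)
print(test_scaled)
```

Refitting the parameters on the test data would leak information about it into the evaluation and make the estimated generalization error too optimistic.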
1.9 Using Python for machine learning
1.9.1 Installing Python packages
--NumPy
--Multidimensional arrays
--pandas
--Higher-level data manipulation tools
--matplotlib
--Visualization of numerical data
--scikit-learn
--Machine learning
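One way to check whether the packages listed above are installed is to query their distribution metadata from the standard library (Python 3.8 or later). This is a minimal sketch; the names in `PACKAGES` are the PyPI distribution names, and no particular versions are assumed.

```python
from importlib import metadata

# PyPI distribution names of the packages listed in this section.
PACKAGES = ["numpy", "pandas", "matplotlib", "scikit-learn"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(PACKAGES).items():
    print(f"{name}: {version or 'not installed'}")
```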
Reference book
--Python Machine Learning Programming
Thank you very much.