Machine learning algorithm classification and implementation summary

Introduction

This article summarizes the classification of machine learning algorithms, together with simple library-based implementations of each. Every code sample includes its own sample data, so it can be run as-is. Only the minimum required parameters are set; please refer to the official documentation for detailed settings.

Each algorithm comes with a brief explanation only, not a detailed one.

Target audience

- I want to know the classification of machine learning algorithms
- I want to implement and run machine learning algorithms

Goals

- Understand the classification of machine learning algorithms
- Be able to implement machine learning algorithms

Machine learning classification

Machine learning is categorized as follows.

- Supervised learning
  - Regression
  - Classification
- Unsupervised learning
- Reinforcement learning

This time, we will not cover the implementation of reinforcement learning.

Supervised learning

Supervised learning is a method of learning the answer to a problem from data representing characteristics (features, explanatory variables) paired with answer data (labels, objective variables).

Supervised learning can be divided into the following two categories.

- Regression: predicts continuous values (e.g., height prediction)
- Classification: predicts unordered labels (e.g., gender prediction)

Usage data

Here is a brief description of the data used in the implementations.

We use scikit-learn's sample datasets. The data used for regression and classification are as follows. (Note: load_boston was removed in scikit-learn 1.2, so the regression examples assume an older scikit-learn version.)

- Regression: Boston house prices
  - 13 features
  - The objective variable is the house price
- Classification: Wine quality
  - 13 features
  - 3 classes to classify
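
As a quick check, the shapes of these datasets can be inspected as follows (a minimal sketch; as noted above, load_boston assumes an older scikit-learn version).

from sklearn.datasets import load_boston, load_wine

# Regression data: 13 features, continuous target (house price)
boston = load_boston()
print(boston['data'].shape)   # (506, 13)
print(boston['target'][:5])   # continuous values

# Classification data: 13 features, 3 classes
wine = load_wine()
print(wine['data'].shape)     # (178, 13)
print(set(wine['target']))    # {0, 1, 2}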

Algorithms

We now introduce the supervised learning algorithms. For each algorithm, we note whether it applies to regression, classification, or both.

Linear regression

- Regression

A method that models the relationship in which the objective variable increases (or decreases) as a feature value increases.

Official documentation

from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Load data
boston = load_boston()
X = boston['data']
y = boston['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = LinearRegression()
model.fit(X_train, y_train)

score = model.score(X_test, y_test)

print('score is', score)
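
After fitting, the learned relationship can be examined directly through the model's attributes; a minimal sketch continuing from the code above:

# Slope learned for each of the 13 features, plus the intercept
print('coefficients:', model.coef_)
print('intercept:', model.intercept_)

# Predicted house price for the first test sample
print('prediction:', model.predict(X_test[:1]))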

Logistic regression

- Classification

A method of learning the probability that an event will occur.

Official documentation

from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Load data
wine = load_wine()
X = wine['data']
y = wine['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = LogisticRegression()
model.fit(X_train, y_train)

score = model.score(X_test, y_test)

print('score is', score)
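
Since logistic regression models class probabilities, those probabilities can be inspected directly with predict_proba; a minimal sketch continuing from the code above:

# Probability of each of the 3 classes for the first test sample (sums to 1)
print('class probabilities:', model.predict_proba(X_test[:1]))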

Random forest

- Regression
- Classification

A method that predicts by majority vote (or averaging, in the regression case) over multiple decision trees.

Regression version

Official documentation

from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor

# Load data
boston = load_boston()
X = boston['data']
y = boston['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = RandomForestRegressor(random_state=0)
model.fit(X_train, y_train)

score = model.score(X_test, y_test)

print('score is', score)

Classification version

Official documentation

from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

# Load data
wine = load_wine()
X = wine['data']
y = wine['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = RandomForestClassifier(random_state=0)
model.fit(X_train, y_train)

score = model.score(X_test, y_test)

print('score is', score)
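
Because a random forest aggregates many decision trees, it can also report how much each feature contributed to the trees' splits; a minimal sketch continuing from the code above:

# Relative importance of each of the 13 features (the values sum to 1)
for name, importance in zip(wine['feature_names'], model.feature_importances_):
    print(name, importance)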

Support vector machine

A technique for obtaining better decision boundaries by maximizing the margin.

- Regression
- Classification

Regression version

Official documentation

from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

# Load data
boston = load_boston()
X = boston['data']
y = boston['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = SVR(kernel='linear', gamma='auto')
model.fit(X_train, y_train)

score = model.score(X_test, y_test)

print('score is', score)

Classification version

Official documentation

from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Load data
wine = load_wine()
X = wine['data']
y = wine['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = SVC(gamma='auto')
model.fit(X_train, y_train)

score = model.score(X_test, y_test)

print('score is', score)
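
The distance of each sample from the learned decision boundaries can be inspected with decision_function; a minimal sketch continuing from the code above:

# Signed distances of the first test sample to the class boundaries;
# larger absolute values mean the sample lies further from a boundary
print(model.decision_function(X_test[:1]))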

kNN

A method that memorizes all the training data and predicts by majority vote among the k training points closest to the point to be predicted.

- Regression
- Classification

Regression version

Official documentation

from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Load data
boston = load_boston()
X = boston['data']
y = boston['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = KNeighborsRegressor(n_neighbors=3)
model.fit(X_train, y_train)

score = model.score(X_test, y_test)

print('score is', score)

Classification version

Official documentation

from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load data
wine = load_wine()
X = wine['data']
y = wine['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)

score = model.score(X_test, y_test)

print('score is', score)
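
The k nearest points that drive the vote can be retrieved explicitly with kneighbors; a minimal sketch continuing from the code above:

# Distances and training-set indices of the 3 nearest neighbors
# of the first test sample
distances, indices = model.kneighbors(X_test[:1])
print('distances:', distances)
print('neighbor labels:', y_train[indices[0]])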

Neural network

A method that imitates the neural circuits of the human brain, with a structure consisting of an input layer, hidden layers, and an output layer.

- Regression
- Classification

Official documentation

Regression version

from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from tensorflow.keras import models
from tensorflow.keras import layers

# Load data
boston = load_boston()
X = boston['data']
y = boston['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = models.Sequential()
model.add(layers.Dense(64, activation='relu', input_shape=(X_train.shape[1],)))
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(1))
model.compile(optimizer='rmsprop', loss='mse', metrics=['mae'])

model.fit(X_train, y_train)

mse, mae = model.evaluate(X_test, y_test)

print('MSE is', mse)
print('MAE is', mae)
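
Note that Keras's fit trains for only a single epoch by default. In practice you would usually specify the number of epochs and a batch size; a minimal sketch with arbitrary hyperparameter values:

# Train for 100 epochs in mini-batches of 32 samples,
# holding out 20% of the training data for validation
model.fit(X_train, y_train, epochs=100, batch_size=32, validation_split=0.2)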

Classification version

from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from tensorflow.keras import models
from tensorflow.keras import layers
from tensorflow.keras import utils

# Load data
wine = load_wine()
X = wine['data']
y = utils.to_categorical(wine['target'])
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = models.Sequential()
model.add(layers.Dense(64, activation='relu', input_shape=(X_train.shape[1],)))
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(3, activation='softmax'))
model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])

model.fit(X_train, y_train)

crossentropy, acc = model.evaluate(X_test, y_test)

print('Categorical Crossentropy is', crossentropy)
print('Accuracy is', acc)

Gradient boosting

A type of ensemble learning, which trains multiple models. Decision tree models are trained sequentially, with each new tree learning to correct the errors of the ones before it.

- Regression
- Classification

Official documentation

Regression version

from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import lightgbm as lgb
import numpy as np

# Load data
boston = load_boston()
X = boston['data']
y = boston['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

lgb_train = lgb.Dataset(X_train, y_train)
lgb_eval = lgb.Dataset(X_test, y_test)

params = {
    'objective': 'regression',
    'metric': 'mse',
}
num_round = 100

model = lgb.train(
    params,
    lgb_train,
    valid_sets=lgb_eval,
    num_boost_round=num_round,
)

y_pred = model.predict(X_test)

score = mean_squared_error(y_test, y_pred)

print('score is', score)
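
Since lgb.train already receives a validation set, boosting can also be stopped early when the validation metric stops improving. A minimal sketch, assuming LightGBM 3.3 or later, where the lgb.early_stopping callback is available:

model = lgb.train(
    params,
    lgb_train,
    valid_sets=lgb_eval,
    num_boost_round=num_round,
    # Stop if the validation MSE does not improve for 10 consecutive rounds
    callbacks=[lgb.early_stopping(stopping_rounds=10)],
)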

Classification version

from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import lightgbm as lgb
import numpy as np

# Load data
wine = load_wine()
X = wine['data']
y = wine['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

lgb_train = lgb.Dataset(X_train, y_train)
lgb_eval = lgb.Dataset(X_test, y_test)

params = {
    'objective': 'multiclass',
    'num_class': 3,
}
num_round = 100

model = lgb.train(
    params,
    lgb_train,
    valid_sets=lgb_eval,
    num_boost_round=num_round,
)

pred = model.predict(X_test)
# Convert per-class probabilities to predicted class labels
y_pred = np.argmax(pred, axis=1)

score = accuracy_score(y_test, y_pred)

print('score is', score)

Unsupervised learning

Unsupervised learning is a method of learning from data that represents characteristics (features) alone, without any answer data (labels).

Usage data

For unsupervised learning, we reuse the wine quality data used in the supervised learning examples.

- Wine quality
  - 13 features
  - 3 classes to classify

Algorithms

We now introduce the unsupervised learning algorithms.

K-means

One of the clustering methods. It partitions the data into k clusters.

Official documentation

from sklearn.datasets import load_wine
from sklearn.cluster import KMeans

# Load data
wine = load_wine()
X = wine['data']

model = KMeans(n_clusters=3, random_state=0)
model.fit(X)

print("labels: \n", model.labels_)
print("cluster centers: \n", model.cluster_centers_)
print("predict result: \n", model.predict(X))

Gaussian mixture model

One of the clustering methods. Assuming the data are generated from multiple Gaussian distributions, it classifies each point according to which Gaussian distribution it belongs to.

Official documentation

from sklearn.datasets import load_wine
from sklearn.mixture import GaussianMixture

# Load data
wine = load_wine()
X = wine['data']

model = GaussianMixture(n_components=4)
model.fit(X)

print("means: \n", model.means_)
print("predict result: \n", model.predict(X))

Principal component analysis

One of the dimensionality reduction methods. It expresses data described by many variables using fewer variables (principal components) while preserving the characteristics of the data.

Official documentation

from sklearn.datasets import load_wine
from sklearn.decomposition import PCA

# Load data
wine = load_wine()
X = wine['data']

model = PCA(n_components=4)
model.fit(X)

print('Before Transform:', X.shape[1])
print('After Transform:', model.transform(X).shape[1])
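
How much of the data's variance each principal component preserves can be checked with explained_variance_ratio_; a minimal sketch continuing from the code above:

# Fraction of the total variance carried by each of the 4 components
print('explained variance ratio:', model.explained_variance_ratio_)
print('total:', model.explained_variance_ratio_.sum())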

Summary

- Supervised learning can be divided into regression and classification
- There are several types of unsupervised learning
- With a library, you can do machine learning with just a little code
- Since there are several parameters, refer to the official documentation as needed

I haven't studied unsupervised learning and reinforcement learning enough yet, so I will continue studying them.

References

- [Mechanism of machine learning algorithms you can understand by seeing and trying](https://www.amazon.co.jp/dp/4798155659)
- [Learning by running with Python! A new machine learning textbook, 2nd edition](https://www.amazon.co.jp/dp/4798159913)
- [Deep Learning with Python and Keras](https://www.amazon.co.jp/dp/4839964262)
- [Deep Learning Textbook: Deep Learning G Test (Generalist) Official Text](https://www.amazon.co.jp/dp/4798157554)
- List of typical machine learning methods
