[PYTHON] [Machine learning] Supervised learning using kernel density estimation Part 2

Supervised learning with kernel density estimation

This article was written by a machine learning beginner. Please keep that in mind.

The previous article is here. The next article is here.

Relationship between kernel density estimation and supervised learning

This is a connection I drew on my own. For more information (though it is not very detailed), please refer to the previous article (https://qiita.com/sorax/items/8663906fae41798a00b8). The short summary is: "I tried using kernel density estimation as a classifier for supervised learning!"

Object-orientation

I reworked the script from the previous article into an object-oriented form. I named it the "Gaussian kernel density estimate classifier", or "GKDE Classifier" for short. The name is just something I came up with.

↓ Script ↓

import numpy as np
from scipy.stats import gaussian_kde

class GKDEClassifier(object):

    def __init__(self, bw_method=None, weights=None):
        # Kernel bandwidth (None = gaussian_kde's default, Scott's rule)
        self.bw_method = bw_method
        # Kernel weights (None = uniform weights)
        self.weights = weights

    def fit(self, X, y):
        # Number of labels in y
        self.y_num = len(np.unique(y))
        # List containing the estimated probability density functions
        self.kernel_ = []
        # Estimate and store one probability density function per label
        for i in range(self.y_num):
            kernel = gaussian_kde(X[y == i].T,
                                  bw_method=self.bw_method,
                                  weights=self.weights)
            self.kernel_.append(kernel)
        return self

    def predict(self, X):
        # List to store the predicted labels
        pred = []
        # Per-label probability densities of the test data
        self.p_ = []
        # Store the probabilities for each label
        for i in range(self.y_num):
            self.p_.append(self.kernel_[i].evaluate(X.T).tolist())
        # Convert to ndarray
        self.p_ = np.array(self.p_)
        # Assign the label with the highest density to each sample
        for j in range(self.p_.shape[1]):
            pred.append(np.argmax(self.p_.T[j]))
        return pred

Labels must be consecutive non-negative integers starting from 0 (0, 1, 2, ...). If your labels are not in that form, sklearn's LabelEncoder can convert them, as in the sketch below.
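A minimal sketch of that conversion, using made-up string labels as an example:

from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
y_encoded = le.fit_transform(["cat", "dog", "cat", "bird"])
print(y_encoded)  # [1 2 1 0]: classes are sorted, then mapped to 0, 1, 2, ...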

(Added on 2020/8/5: the revised code has been published in Part 3.)

__init__ method

Initializes the object. Here you specify the parameters required for kernel density estimation, that is, the arguments used to initialize SciPy's gaussian_kde. This time I used the same values as gaussian_kde's defaults.
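For reference, gaussian_kde accepts a string ("scott" or "silverman"), a scalar, or a callable as bw_method; a quick sketch with random data:

import numpy as np
from scipy.stats import gaussian_kde

data = np.random.randn(2, 100)  # shape (n_features, n_samples)
kde_scott = gaussian_kde(data)                             # Scott's rule (default)
kde_silverman = gaussian_kde(data, bw_method="silverman")  # Silverman's rule
kde_fixed = gaussian_kde(data, bw_method=0.3)              # fixed bandwidth factor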

fit method

Learning is performed on the training data. After kernel density estimation with gaussian_kde, the estimated density functions are stored in order: the one for label 0 first, then label 1, and so on.
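Note that gaussian_kde expects data with shape (n_features, n_samples), which is why X is transposed. A toy sketch with random data:

import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))  # 50 samples, 2 features (sklearn convention)
kde = gaussian_kde(X.T)       # gaussian_kde wants (n_features, n_samples)
print(kde.evaluate(X.T)[:3])  # density values at the first three samples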

predict method

Predicts labels for the test data.

for i in range(self.y_num):
    self.p_.append(self.kernel_[i].evaluate(X.T).tolist())

Here, the estimated density functions are extracted one by one from kernel_ and the probability density of the test data is calculated.

The rest of the script is a mess. I wanted to write it more concisely, but it didn't behave as I expected... such is life as a coding beginner. It works, and that's what matters. As long as it works, it's fine.
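For what it's worth, the two loops in predict could be collapsed with NumPy. A sketch of an equivalent drop-in method body for GKDEClassifier (same predictions, just vectorized; this is my suggestion, not the author's Part 3 code):

def predict(self, X):
    # Densities of every label at every test point: shape (n_labels, n_samples)
    self.p_ = np.array([k.evaluate(X.T) for k in self.kernel_])
    # For each sample (column), pick the label with the highest density
    return np.argmax(self.p_, axis=0)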

And with that, the object-oriented Gaussian kernel density estimation classifier is complete.

wine dataset

Combination with PCA

The wine dataset has 13 features; after standardization, we reduce them to 4 dimensions with PCA. Let's train and classify on the dimensionality-reduced data.

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Data set loading
wine = datasets.load_wine()
X = wine.data
y = wine.target

# Data split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1, stratify=y)

# Standardization
sc = StandardScaler()
sc = sc.fit(X_train)
X_train_std = sc.transform(X_train)
X_test_std = sc.transform(X_test)

# Dimensionality reduction
pca = PCA(n_components=4)
X_train_pca = pca.fit_transform(X_train_std)
X_test_pca = pca.transform(X_test_std)

# Learning and prediction
f = GKDEClassifier()
f.fit(X_train_pca, y_train)
y_pred = f.predict(X_test_pca)

And the result is...?

from sklearn.metrics import accuracy_score

print(accuracy_score(y_test, y_pred))
 0.9722222222222222

Hooray. There are 36 test samples, so this accuracy corresponds to 35/36 correct. Pretty good.
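To see where the single miss lands, one could also print a confusion matrix (a quick sketch using sklearn):

from sklearn.metrics import confusion_matrix

# Rows: true class, columns: predicted class
print(confusion_matrix(y_test, y_pred))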

No dimensionality reduction

What happens if we skip PCA and use all 13 standardized features?

# Learning and prediction
f = GKDEClassifier()
f.fit(X_train_std, y_train)
y_pred = f.predict(X_test_std)

print(accuracy_score(y_test, y_pred))
 0.9722222222222222

Result: Same.

Circular data set

I made a circular dataset.

from sklearn.datasets import make_circles
from matplotlib import pyplot as plt

X, y = make_circles(n_samples=1000, random_state=1, noise=0.1, factor=0.2)
plt.scatter(X[y==0, 0], X[y==0, 1], c="red", marker="^", alpha=0.5)
plt.scatter(X[y==1, 0], X[y==1, 1], c="blue", marker="o", alpha=0.5)
plt.show()

(Figure: scatter plot of the circular dataset — red triangles on the outer ring, blue circles in the center)

The center and the outer ring carry different labels. Can the classifier separate them correctly?

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y)

f = GKDEClassifier()
f.fit(X_train, y_train)
y_pred = f.predict(X_test)

print(accuracy_score(y_test, y_pred))
 0.9933333333333333

Conclusion: Great victory.
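As a bonus, the decision regions can be visualized by evaluating the trained classifier on a grid. A sketch reusing f, X, and y from the snippet above:

import numpy as np
from matplotlib import pyplot as plt

# Evaluate the trained classifier on a 2D grid
xx, yy = np.meshgrid(np.linspace(-1.5, 1.5, 200),
                     np.linspace(-1.5, 1.5, 200))
grid = np.c_[xx.ravel(), yy.ravel()]
zz = np.array(f.predict(grid)).reshape(xx.shape)

# Shade the predicted regions and overlay the data
plt.contourf(xx, yy, zz, alpha=0.3)
plt.scatter(X[y==0, 0], X[y==0, 1], c="red", marker="^", alpha=0.5)
plt.scatter(X[y==1, 0], X[y==1, 1], c="blue", marker="o", alpha=0.5)
plt.show()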

Finally

The classification has gone well, but I have been ignoring something important: whether this classification method is academically sound. I will discuss that next time.

Continued in Part 3
