[PYTHON] [Machine learning] Supervised learning using kernel density estimation Part 3

Supervised learning with kernel density estimation

This article was written by a machine learning beginner; please keep that in mind.

The first article is here. The second article is here.

In this article, I will explain the idea behind the method using mathematical formulas.

Probability density and probability

Kernel density estimation, as we saw, estimates a probability density function using kernel functions. So what exactly is a probability density function?

A probability density function represents ***how likely values are to appear***. ***For an event A, "the value x has a high probability density" means that when event A occurs, the value observed at that time is relatively likely to be x.*** Let's replace "event A" with "label 0". "The probability density of label 0 is high at the value x" means "when some data has label 0, that data is relatively likely to take the value x".

Note here that probability density ≠ probability. A probability density expresses the "relative likelihood of appearance for a specific event", and there is no guarantee that it can be compared directly with the probability density of another event. In the first place, for continuous data the probability of observing any one specific value is zero; a probability can only be computed over a range.

 P(X=x)=0 \\
 P(X \leqq x)=p

By integrating the probability density function over an interval (or a region, in two or more dimensions), you can obtain the probability for that interval. So the probability of landing "near" a particular value can be calculated: the higher the probability density at a value x, the higher the probability that a value "near" x appears, and the lower the density, the lower that probability.
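As a quick numerical check of this "integrate the density to get a probability" idea, here is a small sketch using a standard normal distribution as a stand-in density (a hypothetical example; any density works the same way):

from scipy.integrate import quad
from scipy.stats import norm

x = 1.0
h = 0.1  # half-width of the interval "near" x

# Probability of a value in [x - h, x + h]: integrate the density.
prob, _ = quad(norm.pdf, x - h, x + h)

# The same probability from the CDF, for comparison.
print(prob, norm.cdf(x + h) - norm.cdf(x - h))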

Let me give an example. Suppose that at the value x, the probability density of label 1 is higher than the probability density of label 0. At this time, which of the following is higher?

- the probability that a value "near" x appears when the label is 0
- the probability that a value "near" x appears when the label is 1

This is not rigorous, but intuitively the latter seems higher. We will proceed on the assumption that this intuition is correct.

Let's rewrite "label 0" as "y = 0" and "label 1" as "y = 1". Then the discussion so far is summarized as

 \text{(probability density of label 0 at } x) \leqq \text{(probability density of label 1 at } x) \\
 \Rightarrow \text{(probability of a value near } x \text{ when } y=0) \leqq \text{(probability of a value near } x \text{ when } y=1) \\
 \Rightarrow P(x|y=0) \leqq P(x|y=1)

Strictly speaking, however, each comparison concerns values "near" x rather than exactly x.

Conditional probabilities and Bayes' theorem

For a value x, if

 P(y=0|x) \leqq P(y=1|x)

holds, then it makes sense to assign label 1 rather than label 0. In other words, if we can find the conditional probability of each label ("label 0" or "label 1") under the condition that the value x was drawn, we win.

Let's rewrite this inequality using Bayes' theorem.

 P(y=0|x) \leqq P(y=1|x) \\ \Leftrightarrow
 \frac{P(x|y=0)P(y=0)}{P(x)} \leqq \frac{P(x|y=1)P(y=1)}{P(x)}

The denominator P(x) is common to both sides and can be taken to be positive at any observed x, so it cancels. In the end, if we can show

 P(x|y=0)P(y=0) \leqq P(x|y=1)P(y=1)

then

 P(y=0|x) \leqq P(y=1|x)

follows. Now, from the kernel density estimation we already know (for our example point x) that

 P(x|y=0) \leqq P(x|y=1)

Therefore, if we can also find the values of P(y=0) and P(y=1), the matter is settled.
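As a minimal sketch of this decision rule, here is what the comparison P(x|y=0)P(y=0) versus P(x|y=1)P(y=1) looks like with made-up one-dimensional samples and placeholder priors (the next section explains how the priors are estimated from data):

import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
kde0 = gaussian_kde(rng.normal(0.0, 1.0, 100))  # estimate of P(x|y=0)
kde1 = gaussian_kde(rng.normal(2.0, 1.0, 100))  # estimate of P(x|y=1)
p0, p1 = 0.5, 0.5                               # placeholder priors

x = 1.2
score0 = kde0.evaluate([x])[0] * p0
score1 = kde1.evaluate([x])[0] * p1
print("assign label 1" if score0 <= score1 else "assign label 0")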

Label ratio estimation

P(y=0) can be interpreted as "the probability that a randomly selected piece of data has label 0".

Finding the population values of P(y=0) and P(y=1) is not easy, so we use the label proportions in the training data as estimates of P(y=0) and P(y=1). For example, if 40 out of 100 training samples have label 0 and 60 have label 1, we estimate

 P(y=0)=0.4 \\
 P(y=1)=0.6

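In code, this estimate is a single np.unique call; a small sketch matching the 40/60 example above:

import numpy as np

y = np.array([0] * 40 + [1] * 60)
labels, counts = np.unique(y, return_counts=True)
print(counts / counts.sum())  # -> [0.4 0.6]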

In fact, the classifier I implemented in the previous articles ignored the effect of P(y=0) and P(y=1). It made the strong implicit assumption that every label appears in equal proportion; under that assumption the priors cancel, and comparing P(x|y=0)P(y=0) with P(x|y=1)P(y=1) reduces to comparing P(x|y=0) with P(x|y=1).

Reimplementation

Based on the explanation so far, let's reimplement the classifier as a class.

import numpy as np
from scipy.stats import gaussian_kde

class GKDEClassifier(object):

    def __init__(self, bw_method="scotts_factor", weights=None):
        # kernel bandwidth
        self.bw_method = bw_method
        # kernel weights
        self.weights = weights

    def fit(self, X, y):
        # number of distinct labels in y
        self.y_num = len(np.unique(y))
        # label values and their ratios in the training data
        self.label, y_count = np.unique(y, return_counts=True)
        self.y_rate = y_count / y_count.sum()
        # list holding the estimated probability density functions
        self.kernel_ = []
        # estimate and store one density per label
        for i in range(self.y_num):
            kernel = gaussian_kde(X[y == self.label[i]].T)
            self.kernel_.append(kernel)
        return self

    def predict(self, X):
        # list that stores the predicted labels
        pred = []
        # ndarray holding the per-label scores of the test data
        self.p_ = np.empty([self.y_num, len(X)])
        # evaluate each label's estimated density at the test points
        for i in range(self.y_num):
            self.p_[i] = self.kernel_[i].evaluate(X.T)
        # multiply in the label ratios (the estimated priors)
        for j in range(self.y_num):
            self.p_[j] = self.p_[j] * self.y_rate[j]
        # assign each sample the label with the highest score
        for k in range(len(X)):
            pred.append(self.label[np.argmax(self.p_.T[k])])
        return pred
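As a quick sanity check, here is a minimal usage sketch on made-up two-dimensional data (the data and labels are hypothetical; string labels are used deliberately, anticipating the changes explained below):

X = np.vstack([np.random.normal(0.0, 1.0, (40, 2)),
               np.random.normal(2.0, 1.0, (60, 2))])
y = np.array(["cat"] * 40 + ["dog"] * 60)  # string labels also work

clf = GKDEClassifier().fit(X, y)
print(clf.predict(X[:5]))  # predicted labels for the first five samples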

The added and modified parts are explained below.

Label ratio calculation

self.label, y_count = np.unique(y, return_counts=True)
self.y_rate = y_count/y_count.sum()

I added the calculation of the training data's label ratios to the fit method. Dividing the per-label counts y_count by their total makes y_rate sum to 1. In fact, using y_count as-is without dividing would not change the predictions: multiplying every score by the same positive constant does not change which one is largest, as the small check below shows.
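A tiny check of that claim (with made-up scores):

import numpy as np

scores = np.array([0.2, 0.7, 0.1])
print(np.argmax(scores), np.argmax(scores * 100))  # same index both times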

Also, note that ***the distinct label values themselves (0, 1, a string, and so on) are returned in the array label*** (this is important).

Calculation of probability density function

Here is the new code, fixed so that it handles arbitrary label values.

for i in range(self.y_num):
    kernel = gaussian_kde(X[y==self.label[i]].T)
    self.kernel_.append(kernel)

And this was the corresponding code up to the previous article:

kernel = gaussian_kde(X[y==i].T)

Originally, "data with label i" was specified, but it has been changed to specify "data with label i". By specifying from the output label, it corresponds to the label (character string etc.) that is not a non-negative integer.

Reflect label ratio

for j in range(self.y_num):
    self.p_[j] = self.p_[j] * self.y_rate[j]

I added a step to the predict method that multiplies each probability density by the corresponding label ratio.

Predicted label assignment

Here is the new code.

for k in range(len(X)):
    pred.append(self.label[np.argmax(self.p_.T[k])])

By looking the assigned label up in the array label, labels other than non-negative integers are supported here as well.

Other

I also rewrote the messy parts of the predict method using NumPy. Creating the ndarray first and then filling in the results improves both readability and speed. The three loops could be condensed even further, as sketched below.
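For reference, here is a more condensed variant of predict (a sketch under the same attribute names, not part of the article's implementation; it should behave the same as the loop version above):

def predict(self, X):
    # evaluate every label's density at the test points: shape (y_num, len(X))
    p = np.vstack([kernel.evaluate(X.T) for kernel in self.kernel_])
    # weight each row by its label ratio via broadcasting
    p *= self.y_rate[:, np.newaxis]
    # per-sample argmax over labels, mapped back to the label values
    return list(self.label[np.argmax(p, axis=0)])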

Finally

I somehow managed to get everything in my head written down to the end. Thank you for reading this far. I hope this series has made the topic even a little more interesting for you.

(Corrected on August 5, 2020)
