2. Multivariate analysis spelled out in Python 8-2. K-nearest neighbor method [Weighting method] [Regression model]

How does the choice of weighting method (uniform vs. distance) affect the prediction results? The classification model from the previous article is shown below as an example.

[Figure 2_8_2_01: uniform vs. distance weighting in the classification model from the previous article]

Here, I would like to make the same comparison for a regression model.
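For reference, here is what the two values of the `weights` parameter of `KNeighborsRegressor` compute. With 'uniform', the prediction is the plain mean of the k nearest targets; with 'distance', each neighbor is weighted by the inverse of its distance. The sketch below reimplements both for a single feature with NumPy and compares them with scikit-learn. The helper `knn_predict` and the toy data are hypothetical, and the sketch assumes the query never coincides exactly with a training point (1/0 is undefined; scikit-learn treats that case separately).

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor


def knn_predict(X_train, y_train, x_query, k, weights="uniform"):
    """Hypothetical helper: k-NN regression for one query on a single feature."""
    dist = np.abs(X_train[:, 0] - x_query)       # Euclidean distance in 1D
    idx = np.argsort(dist)[:k]                   # indices of the k nearest points
    if weights == "uniform":
        return y_train[idx].mean()               # plain average of the neighbor targets
    w = 1.0 / dist[idx]                          # inverse-distance weights (assumes dist > 0)
    return np.sum(w * y_train[idx]) / np.sum(w)  # weighted average


# Toy data, made up for illustration
X_train = np.array([[1.0], [2.0], [4.0], [7.0]])
y_train = np.array([10.0, 20.0, 40.0, 70.0])
x_query = 2.4

for w in ["uniform", "distance"]:
    ours = knn_predict(X_train, y_train, x_query, k=2, weights=w)
    ref = KNeighborsRegressor(n_neighbors=2, weights=w).fit(X_train, y_train)
    print(w, ours, ref.predict([[x_query]])[0])
```

Here the nearer neighbor (x = 2, distance 0.4) pulls the distance-weighted prediction toward 20, giving 160/9 ≈ 17.78 instead of the uniform average of 15.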

⑴ Import libraries

import numpy as np
import pandas as pd

# scikit-learn library
# Note: load_boston was removed in scikit-learn 1.2, so this import needs an older version
from sklearn.datasets import load_boston             # Boston house price dataset
from sklearn.model_selection import train_test_split # Data splitting utility
from sklearn.neighbors import KNeighborsRegressor    # k-NN regression model

# Visualization libraries
import matplotlib.pyplot as plt
import seaborn as sns

# Japanese font support for matplotlib (the "!pip" line is a Jupyter/Colab shell command)
!pip install japanize-matplotlib
import japanize_matplotlib

1. Prepare the data

⑵ Data acquisition and organization

#Get dataset
boston = load_boston()

#Convert explanatory variables to DataFrame
df = pd.DataFrame(boston.data, columns=boston.feature_names)

# Append the target variable MEDV (median home value) as a new column
df = pd.concat([df, pd.DataFrame(boston.target, columns=['MEDV'])], axis=1)
print(df)

[Figure 2_8_2_02: output of print(df), the 13 explanatory variables plus MEDV]
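`load_boston` was deprecated in scikit-learn 1.0 and removed in 1.2, so step ⑵ fails on current versions. A minimal sketch of a replacement, assuming the workaround given in scikit-learn's deprecation notice: read the original data file from lib.stat.cmu.edu, in which each record spans two lines after a 22-line description. The function name `load_boston_df` is hypothetical; it is meant to return the same `df` as above (13 explanatory variables plus MEDV).

```python
import numpy as np
import pandas as pd

BOSTON_URL = "http://lib.stat.cmu.edu/datasets/boston"
FEATURE_NAMES = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE',
                 'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT']


def load_boston_df(source=BOSTON_URL):
    """Hypothetical helper: build the Boston DataFrame from the raw source file."""
    # Skip the 22-line description; each record then spans two whitespace-separated lines
    raw_df = pd.read_csv(source, sep=r"\s+", skiprows=22, header=None)
    # First line of a record: 11 features; second line: 2 more features, then MEDV
    data = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])
    target = raw_df.values[1::2, 2]
    df = pd.DataFrame(data, columns=FEATURE_NAMES)
    df['MEDV'] = target
    return df


# Usage (requires network access):
# df = load_boston_df()
# print(df)
```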

⑶ Choosing the explanatory variable from the correlation matrix

#Create a correlation matrix
correlation_matrix = np.corrcoef(df.T)

#Row / column labels
names = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 
         'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT', 'MEDV']

#Convert correlation matrix to DataFrame
correlation_df = pd.DataFrame(correlation_matrix, columns = names, index = names)

#Draw heatmap
plt.figure(figsize=(10,8))
sns.heatmap(correlation_df, annot=True, cmap='coolwarm')

[Figure 2_8_2_03: correlation heatmap of the 14 variables]
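The same matrix can be obtained more directly with pandas' `DataFrame.corr()` (Pearson by default), which keeps the column labels, so the separate `names` list is not needed. The sketch below checks this on made-up data (not the Boston set) and sorts the correlations with the target by absolute value, which is one way to read candidate explanatory variables off the heatmap. The column names `x1`, `x2`, `x3` are placeholders.

```python
import numpy as np
import pandas as pd

# Made-up data: MEDV depends strongly on x1, moderately on x2, not at all on x3
rng = np.random.default_rng(0)
n = 200
demo = pd.DataFrame({
    'x1': rng.normal(size=n),
    'x2': rng.normal(size=n),
    'x3': rng.normal(size=n),
})
demo['MEDV'] = 3.0 * demo['x1'] - 2.0 * demo['x2'] + rng.normal(scale=0.5, size=n)

corr_pd = demo.corr()             # labeled correlation matrix
corr_np = np.corrcoef(demo.T)     # the approach used above (rows = variables)
print(np.allclose(corr_pd.values, corr_np))

# Correlation of each explanatory variable with MEDV, strongest (in absolute value) first
ranking = corr_pd['MEDV'].drop('MEDV').abs().sort_values(ascending=False)
print(ranking)
```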

⑷ Data extraction and splitting

# Extract only the two variables used (RM and MEDV)
df_extraction = df[['RM', 'MEDV']]

# Set variables X and y
X = np.array(df_extraction['RM'])
y = np.array(df_extraction['MEDV'])

X = X.reshape(len(X), 1) # Convert to 2D (scikit-learn expects a 2D feature array)
y = y.reshape(len(y), 1)

# Split into training and test data (train_test_split keeps the 2D shapes, so no further reshape is needed)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state = 0)
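By default, `train_test_split` holds out 25% of the rows for testing (`test_size=0.25`) and keeps the 2D shape of its inputs. A quick check on placeholder arrays with the same number of rows as the Boston data (506); the arrays are not the real RM/MEDV values.

```python
import numpy as np
from sklearn.model_selection import train_test_split

n_rows = 506                                            # same row count as the Boston data
X_demo = np.arange(n_rows, dtype=float).reshape(-1, 1)  # placeholder feature, 2D
y_demo = np.arange(n_rows, dtype=float).reshape(-1, 1)  # placeholder target, 2D

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, random_state=0)
print(X_tr.shape, X_te.shape, y_tr.shape, y_te.shape)
```

Because X and y are shuffled with the same permutation, each row of `X_tr` stays paired with the matching row of `y_tr`.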

2. Choosing the k parameter

⑸ Run k-NN regression while varying k

# Lists to store the scores (for a regressor, score() returns R², the coefficient of determination)
train_accuracy = []
test_accuracy = []

for k in range(1, 21):
    kNR = KNeighborsRegressor(n_neighbors = k) # Create the model
    kNR.fit(X_train, y_train) # Train
    train_accuracy.append(kNR.score(X_train, y_train)) # R² on training data
    test_accuracy.append(kNR.score(X_test, y_test)) # R² on test data

# Convert the score lists to arrays
train_accuracy = np.array(train_accuracy)
test_accuracy = np.array(test_accuracy)
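For regressors, scikit-learn's `score()` returns the coefficient of determination R² = 1 - SS_res / SS_tot rather than an accuracy, so it can even be negative for poor test predictions. The sketch below, on made-up data, recomputes R² by hand and confirms it matches `score()`. It also shows why the training curve starts at the top: with k = 1 (and no duplicated x values), each training point is its own nearest neighbor, so the training R² is exactly 1. The helper `r2_manual` is hypothetical.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor


def r2_manual(y_true, y_pred):
    """Hypothetical helper: R^2 = 1 - (residual sum of squares) / (total sum of squares)."""
    y_true = np.ravel(y_true)
    y_pred = np.ravel(y_pred)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Made-up data with distinct x values (not the Boston data)
rng = np.random.default_rng(0)
X_demo = rng.uniform(4, 9, size=100).reshape(-1, 1)
y_demo = 9 * X_demo[:, 0] - 35 + rng.normal(0, 3, size=100)

for k in (1, 5, 15):
    model = KNeighborsRegressor(n_neighbors=k).fit(X_demo, y_demo)
    # score() and the hand-computed R^2 should agree
    print(k, model.score(X_demo, y_demo), r2_manual(y_demo, model.predict(X_demo)))
```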

⑹ Select the optimum k parameter

# Transition of training and test scores
plt.figure(figsize=(6, 4))

plt.plot(range(1,21), train_accuracy, label='Training')
plt.plot(range(1,21), test_accuracy, label='Test')

plt.xticks(np.arange(0, 21, 1)) # x-axis ticks
plt.xlabel('k')
plt.ylabel('Score (R²)')
plt.title('Transition of score (R²)')

plt.grid()
plt.legend()

# Transition of the gap between training and test scores
plt.figure(figsize=(6, 4))

difference = np.abs(np.array(train_accuracy) - np.array(test_accuracy)) # Absolute difference
plt.plot(range(1,21), difference, label='Difference')

plt.xticks(np.arange(0, 21, 1)) # x-axis ticks
plt.xlabel('k')
plt.ylabel('|train - test|')
plt.title('Transition of score difference')

plt.grid()
plt.legend()

plt.show()

[Figure 2_8_2_05: training and test scores versus k, and their difference versus k]
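Step ⑹ reads the plots by eye. As an alternative, the choice can be written as an explicit rule; the sketch below picks the k with the highest test score and reports the train/test gap at that k. This rule and the function name `pick_k` are hypothetical and not necessarily how k = 14 was chosen here; the score values in the example are made up.

```python
import numpy as np


def pick_k(train_scores, test_scores, k_values):
    """Hypothetical rule: choose the k with the best test score (first one on ties)."""
    train_scores = np.asarray(train_scores)
    test_scores = np.asarray(test_scores)
    best = int(np.argmax(test_scores))                  # index of the highest test score
    gap = abs(train_scores[best] - test_scores[best])   # train/test gap at that k
    return k_values[best], gap


# Made-up scores for k = 1..5 (illustration only)
k_values = list(range(1, 6))
train_scores = [1.00, 0.80, 0.72, 0.68, 0.66]
test_scores = [0.30, 0.45, 0.52, 0.55, 0.53]
print(pick_k(train_scores, test_scores, k_values))
```

With the arrays from ⑸, this would be called as `pick_k(train_accuracy, test_accuracy, list(range(1, 21)))`.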

3. Model execution and evaluation

⑺ Create input data for drawing the prediction curve

# Generate an arithmetic progression (evenly spaced values)
t = np.linspace(1, 10, 1000) # Start value, end value, number of elements

# Convert to a 2D array of shape (1000, 1)
T = t.reshape(1000, 1)
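The grid t runs from 1 to 10, which is wider than the range of RM in the data (roughly 3.6 to 8.8 in the Boston set). Below the smallest training x, every query has the same k nearest neighbors (the k smallest x values), and likewise above the largest, so with uniform weighting the k-NN prediction is constant in those regions. The sketch below shows this on made-up training data; the values are illustrative only.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Made-up training data covering x in [4, 8]
X_demo = np.array([[4.0], [5.0], [6.0], [7.0], [8.0]])
y_demo = np.array([12.0, 18.0, 21.0, 30.0, 45.0])

model = KNeighborsRegressor(n_neighbors=2, weights='uniform').fit(X_demo, y_demo)

# Query points below, inside, and above the training range
t_demo = np.array([[1.0], [2.0], [3.0], [5.4], [9.0], [10.0]])
pred = model.predict(t_demo)
print(pred)  # flat below x = 4 and above x = 8
```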

⑻ Run and visualize the regression model

n_neighbors = 14 # k selected in step ⑹

plt.figure(figsize=(12,5))

for i, w in enumerate(['uniform', 'distance']):
    model = KNeighborsRegressor(n_neighbors, weights=w) # Same k, different weighting method
    model = model.fit(X, y)                              # Fit on all the data (X, y)
    y_ = model.predict(T)                                # Predict along the evenly spaced grid T

    plt.subplot(1, 2, i + 1)
    plt.scatter(X, y, color='limegreen', label='Data')
    plt.plot(T, y_, color='navy', lw=1, label='Predicted value')
    plt.legend()
    plt.title("weights = '%s'" % (w))

plt.tight_layout()
plt.show()

[Figure 2_8_2_06: predictions with weights = 'uniform' (left) and weights = 'distance' (right)]
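One consequence of weights='distance' worth checking: in scikit-learn's implementation, when a query coincides with a training point, that zero-distance point receives all the weight. The fitted curve therefore passes through the training targets, and the training R² becomes 1 regardless of k (assuming no duplicated x values), while the uniform model averages k points and stays smoother. A check on made-up data, not the Boston set:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Made-up data with distinct x values
rng = np.random.default_rng(1)
X_demo = np.sort(rng.uniform(4, 9, size=50)).reshape(-1, 1)
y_demo = 9 * X_demo[:, 0] - 35 + rng.normal(0, 3, size=50)

for w in ['uniform', 'distance']:
    model = KNeighborsRegressor(n_neighbors=14, weights=w).fit(X_demo, y_demo)
    # Training R^2: 'distance' reproduces every training target exactly
    print(w, model.score(X_demo, y_demo))
```

A training score of 1 here says nothing about generalization, which is why the test score is the one to watch when comparing the two weighting methods.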
