University of Tsukuba Machine Learning Course: Study sklearn while making the Python script part of the task (8) Make your own stochastic steepest descent method

Last time: University of Tsukuba Machine Learning Course: Study sklearn while making the Python script part of the task (7) Make your own steepest descent method https://github.com/legacyworld/sklearn-basic

Exercise 4.3 The steepest descent method and the stochastic steepest descent method

The explanation is in the 5th lecture video (1), around the 24 minute 30 second mark. Last time I only implemented the steepest descent method, so this time I implement the stochastic steepest descent method. The program itself does not change much. Mathematically it looks like this:

\lambda = \text{regularization parameter},
\beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_m \end{pmatrix},
y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{pmatrix},
X = \begin{pmatrix}
1&x_{11}&x_{12}&\cdots&x_{1m}\\
1&x_{21}&x_{22}&\cdots&x_{2m}\\
\vdots&\vdots&\vdots&&\vdots\\
1&x_{N1}&x_{N2}&\cdots&x_{Nm}
\end{pmatrix}\\ \\
\beta^{t+1} = \beta^{t}(1-2\lambda\eta) - \eta\frac{1}{N}x_i^T(x_i\beta^t-y_i)
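To see what one update looks like in code, here is a minimal NumPy sketch of a single stochastic step (the names Xi, yi, beta, eta, lam are illustrative only, not the actual script). Like the update line in the full script below, it applies the ridge shrinkage factor and then steps against the gradient computed from one sample; the explicit $\frac{1}{N}$ factor is not written out in the code.

import numpy as np

# Toy single SGD update with illustrative values; not the actual script
rng = np.random.default_rng(0)
Xi = rng.normal(size=4)             # one randomly chosen row of X (1-D, shape (4,))
yi = 1.0                            # the corresponding target value
beta = np.ones(4).reshape(-1, 1)    # current coefficients as a column vector, shape (4, 1)
eta, lam = 0.01, 1e-4               # learning rate and regularization parameter

residual = np.dot(Xi, beta) - yi    # prediction error for this single sample
# Shrink beta by the ridge term, then step against the single-sample gradient
beta = beta * (1 - 2 * lam * eta) - eta * Xi.reshape(-1, 1) * residual
print(beta.shape)                   # (4, 1)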

Until now, all of the data was used to compute the gradient; here the gradient is computed from a single randomly chosen sample $x_i, y_i$. One thing that tripped me up was NumPy's transpose behavior: for a one-dimensional array, .T returns the array unchanged, so you need .reshape(-1,1) to get a column vector. See here: https://note.nkmk.me/python-numpy-transpose/

a_1d = np.arange(3)
print(a_1d)
# [0 1 2]

print(a_1d.T)
# [0 1 2]

a_col = a_1d.reshape(-1, 1)
print(a_col)
# [[0]
#  [1]
#  [2]]
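As a concrete check of why this matters for the update step, here is a small example with illustrative values: using the raw one-dimensional sample row lets broadcasting silently produce a matrix, while the reshaped column keeps the update the same shape as beta.

import numpy as np

Xi = np.arange(3, dtype=float)        # one sample, 1-D, shape (3,)
beta = np.ones((3, 1))                # coefficients as a column vector, shape (3, 1)
residual = np.dot(Xi, beta) - 2.0     # shape (1,)

# With the raw 1-D row, broadcasting silently produces a (3, 3) matrix
print((beta - 0.01 * Xi * residual).shape)                  # (3, 3)
# With the reshaped column, the update keeps the same shape as beta
print((beta - 0.01 * Xi.reshape(-1, 1) * residual).shape)   # (3, 1)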

Click here for the source code.

python:Homework_4.3SGD.py


import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn import preprocessing
from sklearn.base import BaseEstimator
from sklearn.model_selection import cross_validate
import statsmodels.api as sm

class MyEstimator(BaseEstimator):
    def __init__(self,ep,eta,l):
        self.ep = ep
        self.eta = eta
        self.l = l
        self.loss = []
    # Implement fit()
    def fit(self, X, y):
        self.coef_ = self.stochastic_grad_desc(X,y)
        #fit returns self
        return self

    # Implement predict()
    def predict(self, X):
        return np.dot(X, self.coef_)

    def shuffle(self,X,y):
        r = np.random.permutation(len(y))
        return X[r],y[r]

    def stochastic_grad_desc(self,X,y):
        m = len(y)
        loss = []
        # Number of features (columns of X)
        dim = X.shape[1]
        #Initial value of beta
        beta = np.ones(dim).reshape(-1,1)
        eta = self.eta
        l = self.l
        X_shuffle, y_shuffle = self.shuffle(X,y)
        # Stop if there is no improvement T times in a row
        T = 100
        # Counter of consecutive non-improving updates
        not_improve = 0
        # Initial value for the minimum of the objective function
        min = 10 ** 9
        while True:
            for Xi,yi in zip(X_shuffle,y_shuffle):
                loss.append((1/(2*m))*np.sum(np.square(np.dot(X,beta)-y)))
                beta = beta*(1-2*l*eta) - eta*Xi.reshape(-1,1)*(np.dot(Xi,beta)-yi)
                if loss[len(loss)-1] < min:
                    min = loss[len(loss)-1]
                    min_beta = beta
                    not_improve = 0
                else:
                    # The minimum of the objective function was not updated
                    not_improve += 1
                    if not_improve >= T:
                        break
            # If all samples have been used but the minimum kept being updated within T tries, go around the shuffled data again
            if not_improve >= T:
                self.loss = loss
                break
        return min_beta

# Import the wine data (winequality-red.csv)
df= pd.read_csv('winequality-red.csv',sep=';')
# The target value quality is included, so create a dataframe with it dropped
df1 = df.drop(columns='quality')
y = df['quality'].values.reshape(-1,1)
X = df1.values
scaler = preprocessing.StandardScaler()
X_fit = scaler.fit_transform(X)
X_fit = sm.add_constant(X_fit) # Add a column of 1s as the first column (intercept)
epsilon = 10 ** (-7)
eta_list = [0.03,0.01,0.003]
loss = []
coef = []
for eta in eta_list:
    l = 10**(-5)
    test_min = 10**(9)
    while l <= 1/(2*eta):
        myest = MyEstimator(epsilon,eta,l)
        myest.fit(X_fit,y)
        scores = cross_validate(myest,X_fit,y,scoring="neg_mean_squared_error",cv=10)
        if abs(scores['test_score'].mean()) < test_min:
            test_min = abs(scores['test_score'].mean())
            loss = myest.loss
            l_min = l
            coef = myest.coef_
        l = l * 10**(0.5)
    plt.plot(loss,label=rf"$\eta$={eta}")
    print(f"eta = {eta} : iter = {len(loss)}, loss = {loss[-1]}, lambda = {l_min}, TestErr = {test_min}")
    # Output the coefficients: the intercept is the first element, so print the features from the second element and print the intercept last
    i = 1
    for column in df1.columns:
        print(column,coef[i][0])
        i+=1
    print('intercept',coef[0][0])
plt.legend()
plt.savefig("sgd.png ")

In the lecture explanation the stopping condition was written as "stop if there is no improvement 100 times in a row", so the implementation follows that. For the smaller values of $\eta$, this condition is not met even after using all 1599 samples, so the loop may go around a second time. The result looks like this (sgd.png): at $\eta = 0.03$ you can see the error increase slightly toward the end. The coefficients and other values obtained at the end are shown below.

eta = 0.03 : iter = 298, loss = 0.29072324272824085, lambda = 0.0031622776601683803, TestErr = 0.47051639691326796
fixed acidity 0.1904239451124434
volatile acidity -0.11242984344193296
citric acid -0.00703125780915424
residual sugar 0.2092352618792849
chlorides -0.044795495356479025
free sulfur dioxide -0.018863685196341816
total sulfur dioxide 0.07447982325062003
density -0.17305138620126106
pH 0.05808006453308803
sulphates 0.13876262568557934
alcohol 0.2947134691111974
intercept 5.6501294014064145
eta = 0.01 : iter = 728, loss = 0.24203354045966255, lambda = 0.00010000000000000002, TestErr = 0.45525344581852156
fixed acidity 0.25152952212309976
volatile acidity -0.03876889927769888
citric acid 0.14059421863669852
residual sugar 0.06793602828251821
chlorides -0.0607861479963043
free sulfur dioxide 0.08441853171277111
total sulfur dioxide -0.09642176480191654
density -0.2345690991118163
pH 0.1396740265674562
sulphates 0.1449843342292861
alcohol 0.19737851967044345
intercept 5.657998427200384
eta = 0.003 : iter = 1758, loss = 0.22475918775097103, lambda = 0.00010000000000000002, TestErr = 0.44693442950748147
fixed acidity 0.2953542653448508
volatile acidity -0.12934364893075953
citric acid 0.04629080083382285
residual sugar 0.013753852832452122
chlorides -0.03688613363045954
free sulfur dioxide 0.045541235818534614
total sulfur dioxide -0.049594638329345575
density -0.17427360277645224
pH 0.13897225246491407
sulphates 0.15425590075925466
alcohol 0.26518804857692096
intercept 5.597149258230254

Because the data is shuffled randomly, the results change from run to run.
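If you want runs to be comparable, one simple option (not used in the script above) is to fix NumPy's global random seed once before training, since the shuffle relies on np.random.permutation. A minimal sketch:

import numpy as np

# Optional tweak, not part of the original script: fixing the global seed
# makes np.random.permutation, and therefore the shuffle, reproducible
# between executions.
np.random.seed(42)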

Past posts

University of Tsukuba Machine Learning Course: Study sklearn while creating the Python script part of the assignment (1)
University of Tsukuba Machine Learning Course: Study sklearn while creating the Python script part of the assignment (2)
University of Tsukuba Machine Learning Course: Study sklearn while creating the Python script part of the assignment (3)
University of Tsukuba Machine Learning Course: Study sklearn while creating the Python script part of the assignment (4)
University of Tsukuba Machine Learning Course: Study sklearn while creating the Python script part of the assignment (5)
University of Tsukuba Machine Learning Course: Study sklearn while creating the Python script part of the assignment (6)
https://github.com/legacyworld/sklearn-basic
