University of Tsukuba Machine Learning Course: Study sklearn while making the Python script part of the task (7) Make your own steepest descent method

Last time: University of Tsukuba Machine Learning Course: Study sklearn while creating the Python script part of the assignment (6) https://github.com/legacyworld/sklearn-basic

Exercise 4.3 The steepest descent method and the stochastic steepest descent method

The explanation is in lecture video 5 (1), around the 21-minute mark. When I read the assignment I thought it would be an easy win, but once I watched the explanation it turned out that scikit-learn alone (probably) cannot do it as-is: I could not find a way to plot the loss at every step of the descent. So, while I was at it, I implemented the following to study a few things.

- A self-made steepest descent method
- Cross-validation using the self-made steepest descent method

If you google, there are plenty of steepest descent implementations, but not many that include ridge regression. The mathematical background is explained here: https://www.kaggle.com/residentmario/ridge-regression-cost-function

\lambda = \text{regularization parameter},\quad
\beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_m \end{pmatrix},\quad
y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{pmatrix},\quad
X = \begin{pmatrix}
1 & x_{11} & x_{12} & \cdots & x_{1m}\\
1 & x_{21} & x_{22} & \cdots & x_{2m}\\
\vdots & \vdots & \vdots & \ddots & \vdots\\
1 & x_{N1} & x_{N2} & \cdots & x_{Nm}
\end{pmatrix}
\\
\beta^{t+1} = \beta^{t}(1-2\lambda\eta) - \eta\frac{1}{N}X^T(X\beta^t-y) \qquad (※ corrected 2020/5/31)
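
For reference, this update rule is just ordinary gradient descent applied to the ridge cost function. A sketch of the derivation, assuming the data term is scaled by $\frac{1}{2N}$ (the same scaling as the loss computed in the script below) and that the penalty is applied to every component of $\beta$, intercept included, as the code does:

J(\beta) = \frac{1}{2N}\|X\beta - y\|^2 + \lambda\|\beta\|^2\\
\nabla J(\beta) = \frac{1}{N}X^T(X\beta - y) + 2\lambda\beta\\
\beta^{t+1} = \beta^t - \eta\nabla J(\beta^t) = \beta^{t}(1-2\lambda\eta) - \eta\frac{1}{N}X^T(X\beta^t - y)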

For this wine data, $m = 11$ and $N = 1599$. The symbols map to the program variables as follows:

- $\lambda$ : l
- $\beta$ : beta
- $\eta$ : eta
- $X$ : X_fit (X inside MyEstimator)
- $y$ : y

For the program itself, I referred to the following: "Machine learning from the beginning: Multiple regression model by the steepest descent method (from scratch with Python and R)".

python:Homework_4.3GD.py


import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn import preprocessing
from sklearn.base import BaseEstimator
from sklearn.model_selection import cross_validate
import statsmodels.api as sm

class MyEstimator(BaseEstimator):
    def __init__(self,ep,eta,l):
        self.ep = ep
        self.eta = eta
        self.l = l
        self.loss = []
    # Implement fit()
    def fit(self, X, y):
        self.coef_ = self.grad_desc(X,y)
        #fit returns self
        return self

    # Implement predict()
    def predict(self, X):
        return np.dot(X, self.coef_)

    def grad_desc(self,X,y):
        m = len(y)
        loss = []
        diff = 10**(10)
        ep = self.ep
        # Number of features (columns of X)
        dim = X.shape[1]
        #Initial value of beta
        beta = np.ones(dim).reshape(-1,1)
        eta = self.eta
        l = self.l
        while abs(diff) > ep:
            loss.append((1/(2*m))*np.sum(np.square(np.dot(X,beta)-y)))
            beta = beta*(1-2*l*eta) - eta*(1/m)*np.dot(X.T,(np.dot(X,beta)-y))
            if len(loss) > 1:
                diff = loss[len(loss)-1] - loss[len(loss)-2]
        self.loss = loss
        return beta

# Load the wine data (winequality-red.csv)
df= pd.read_csv('winequality-red.csv',sep=';')
# The target value 'quality' is included, so create a DataFrame with it dropped
df1 = df.drop(columns='quality')
y = df['quality'].values.reshape(-1,1)
X = df1.values
scaler = preprocessing.StandardScaler()
X_fit = scaler.fit_transform(X)
X_fit = sm.add_constant(X_fit) # Add a column of 1s at the front (intercept term)
epsilon = 10 ** (-7)
eta_list = [0.3,0.1,0.03]
loss = []
coef = []
for eta in eta_list:
    l = 10**(-5)
    test_min = 10**(9)
    while l <= 1/(2*eta):
        myest = MyEstimator(epsilon,eta,l)
        myest.fit(X_fit,y)
        scores = cross_validate(myest,X_fit,y,scoring="neg_mean_squared_error",cv=10)
        if abs(scores['test_score'].mean()) < test_min:
            test_min = abs(scores['test_score'].mean())
            loss = myest.loss
            l_min = l
            coef = myest.coef_
        l = l * 10**(0.5)
    plt.plot(loss)
    print(f"eta = {eta} : iter = {len(loss)}, loss = {loss[-1]}, lambda = {l_min}")
    # Output the coefficients: the intercept is the first element, so print the features from the second element and print the intercept last
    i = 1
    for column in df1.columns:
        print(column,coef[i][0])
        i+=1
    print('intercept',coef[0][0])
plt.savefig("gd.png ")

Since I also wanted to include cross-validation, I wrote my own Estimator. The Estimator is basically something like linear_model.Ridge: if you pass it to cross_validate, it takes care of the K-fold splitting and so on for you. You only need to implement at least two methods, fit and predict, in your class (the class name can be anything), so it is very easy. The computation itself is also simple: keep updating $\beta$ and recomputing the cost, and stop when the change in the cost becomes smaller than $\epsilon$ (epsilon). The loss.append in the middle of this loop stores in a list how the cost changes at each step.
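
As a side note, here is a minimal sketch of what cross_validate actually requires from a custom estimator. The class name MeanEstimator and its trivial "predict the mean of y" logic are made up just for illustration; the point is that any class deriving from BaseEstimator whose fit() returns self and which has a predict() method can be passed to cross_validate:

```python
import numpy as np
from sklearn.base import BaseEstimator
from sklearn.model_selection import cross_validate

class MeanEstimator(BaseEstimator):
    """Toy estimator: predicts the mean of the training targets, regardless of X."""
    def fit(self, X, y):
        self.mean_ = np.mean(y)  # "training" is just remembering the mean of y
        return self              # fit() must return self

    def predict(self, X):
        return np.full(len(X), self.mean_)

# Dummy data just to show that cross_validate accepts the custom estimator
X = np.random.rand(20, 3)
y = np.random.rand(20)
scores = cross_validate(MeanEstimator(), X, y, scoring="neg_mean_squared_error", cv=5)
print(scores["test_score"])
```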

This time I couldn't see the source code at all in the explanation video, so I didn't know what the regularization parameter was set to. I assume it is the default value of sklearn's SGDRegressor (= 0.0001), but I also searched for the value that minimizes the cross-validation test error. That is why the loop is doubled: while changing $\eta$ over 0.3, 0.1, 0.03, I change $\lambda$ from $10^{-5}$ up to $\frac{1}{2\eta}$, which guarantees that $1-2\lambda\eta$ never becomes negative. The $\epsilon$ that determines when to stop the iteration was set to $10^{-7}$, which gave roughly the same number of iterations as in the explanation.
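
As a rough sketch (not part of the script above), this is the grid of $\lambda$ values that the inner while loop walks through, generated up front so you can see where the $\frac{1}{2\eta}$ upper bound kicks in:

```python
# Hypothetical helper: enumerate the lambda grid the script loops over.
# Start at 1e-5 and multiply by 10**0.5 until lambda exceeds 1/(2*eta),
# which keeps 1 - 2*lambda*eta non-negative in the update rule.
for eta in [0.3, 0.1, 0.03]:
    lambdas = []
    l = 1e-5
    while l <= 1 / (2 * eta):
        lambdas.append(l)
        l *= 10 ** 0.5  # half-decade steps
    print(f"eta = {eta}: {len(lambdas)} lambda values, largest = {lambdas[-1]:.6g}")
```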

The calculation results are plotted below (gd.png).

The shapes of the curves are almost the same as in the explanation, so this looks fine. The coefficients and intercepts actually obtained are also roughly correct: they are not very different from the ones obtained in University of Tsukuba Machine Learning Course: Study sklearn while creating the Python script part of the assignment (1), linked above.

eta = 0.3 : iter = 211, loss = 0.20839282208794876, lambda = 0.000316227766016838
fixed acidity 0.049506870332573755
volatile acidity -0.19376515874038097
citric acid -0.03590899184362026
residual sugar 0.02477115609195419
chlorides -0.08766609020245213
free sulfur dioxide 0.04504300145052993
total sulfur dioxide -0.1066524471717945
density -0.039236958974544434
pH -0.060484490718680374
sulphates 0.1558562351611723
alcohol 0.29101267115037016
intercept 5.632460233437699
eta = 0.1 : iter = 539, loss = 0.20839849335391505, lambda = 0.000316227766016838
fixed acidity 0.05411085995631372
volatile acidity -0.19374570028895227
citric acid -0.03641567617051897
residual sugar 0.026096674744724647
chlorides -0.08728538562384357
free sulfur dioxide 0.044674324756584935
total sulfur dioxide -0.10616011146688299
density -0.04332301301614413
pH -0.05803157075853309
sulphates 0.15635770126837817
alcohol 0.28874633335328637
intercept 5.632460233437699
eta = 0.03 : iter = 1457, loss = 0.2084181454096448, lambda = 0.000316227766016838
fixed acidity 0.06298223685986547
volatile acidity -0.19369711526783526
citric acid -0.03737402225868385
residual sugar 0.028655773905239445
chlorides -0.08655776298773829
free sulfur dioxide 0.04397075187169952
total sulfur dioxide -0.10522175105302445
density -0.051210328173935296
pH -0.05330134909461606
sulphates 0.15732818468260018
alcohol 0.28436648926510527
intercept 5.632460233437668

Past posts

- University of Tsukuba Machine Learning Course: Study sklearn while creating the Python script part of the assignment (1)
- University of Tsukuba Machine Learning Course: Study sklearn while creating the Python script part of the assignment (2)
- University of Tsukuba Machine Learning Course: Study sklearn while creating the Python script part of the assignment (3)
- University of Tsukuba Machine Learning Course: Study sklearn while creating the Python script part of the assignment (4)
- University of Tsukuba Machine Learning Course: Study sklearn while creating the Python script part of the assignment (5)

https://github.com/legacyworld/sklearn-basic
