Linear Models Let \hat{y} be the predicted value, the vector w = (w_1, w_2, ..., w_p) be the coefficients (coef_), and w_0 be the intercept (intercept_).
\hat{y}(w, x) = w_0 + w_1 x_1 + ... + w_p x_p
Ordinary Least Squares Find the coefficients w that minimize the residual sum of squares below. The L2 norm here is the ordinary Euclidean distance.
\min_{w} || X w - y||_2^2
sklearn
class sklearn.linear_model.LinearRegression(*, fit_intercept=True, normalize=False, copy_X=True, n_jobs=None)
Implementation
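A minimal sketch of fitting OLS with scikit-learn; the toy data below is made up for illustration. The fitted attributes coef_ and intercept_ correspond to w and w_0 in the formula above.

```python
# Minimal OLS example on illustrative toy data (roughly y = 1 + 2x).
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[0.0], [1.0], [2.0], [3.0]])   # single feature
y = np.array([1.0, 3.1, 4.9, 7.2])

reg = LinearRegression(fit_intercept=True)
reg.fit(X, y)

print(reg.coef_)            # w_1, ..., w_p
print(reg.intercept_)       # w_0
print(reg.predict([[4.0]])) # \hat{y} for a new sample
```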
Ridge Regression A regularization term, the squared L2 norm of the coefficients, is added to the loss function. This shrinks the absolute values of the coefficients, which helps prevent overfitting.
\min_{w} || X w - y||_2^2 + \alpha ||w||_2^2
sklearn
class sklearn.linear_model.Ridge(alpha=1.0, *, fit_intercept=True, normalize=False, copy_X=True, max_iter=None, tol=0.001, solver='auto', random_state=None)
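A minimal sketch of Ridge with a hand-picked alpha; the data is illustrative only.

```python
# Larger alpha -> stronger shrinkage of the coefficients.
import numpy as np
from sklearn.linear_model import Ridge

X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.1], [3.0, 2.9]])
y = np.array([0.0, 2.0, 4.1, 6.0])

reg = Ridge(alpha=1.0)
reg.fit(X, y)
print(reg.coef_, reg.intercept_)
```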
Lasso Regression A regularization term, the L1 norm (Manhattan distance) of the coefficients, is added to the loss function. Because some coefficients are driven to exactly 0, it can also reduce the dimensionality of the features.
\min_{w} { \frac{1}{2n_{\text{samples}}} ||X w - y||_2 ^ 2 + \alpha ||w||_1}
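A minimal sketch of Lasso on made-up data where only the first feature is informative; with a suitable alpha, the coefficient of the other feature can be shrunk to exactly 0.

```python
import numpy as np
from sklearn.linear_model import Lasso

X = np.array([[0.0, 1.0], [1.0, 0.5], [2.0, 2.0], [3.0, 1.5]])
y = np.array([0.1, 1.9, 4.2, 5.8])   # mostly driven by the first feature

reg = Lasso(alpha=0.5)
reg.fit(X, y)
print(reg.coef_)   # some entries may be exactly 0
```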
Multi-task Lasso
Elastic-Net A regularization term combining both the L1 norm and the L2 norm is added. It reduces to Ridge regression when ρ = 0 and to Lasso regression when ρ = 1.
\min_{w} { \frac{1}{2n_{\text{samples}}} ||X w - y||_2 ^ 2 + \alpha \rho ||w||_1 + \frac{\alpha(1-\rho)}{2} ||w||_2 ^ 2}
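A minimal sketch of ElasticNet on illustrative data; l1_ratio corresponds to ρ in the formula above.

```python
import numpy as np
from sklearn.linear_model import ElasticNet

X = np.array([[0.0, 1.0], [1.0, 0.5], [2.0, 2.0], [3.0, 1.5]])
y = np.array([0.1, 1.9, 4.2, 5.8])

reg = ElasticNet(alpha=0.5, l1_ratio=0.7)   # l1_ratio=1 is Lasso, l1_ratio=0 is Ridge
reg.fit(X, y)
print(reg.coef_, reg.intercept_)
```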
Multi-task Elastic-Net
Least Angle Regression (LARS)
Orthogonal Matching Pursuit (OMP) The fit is constrained by the number of non-zero coefficients (the L0 pseudo-norm), which serves as the stopping condition.
\underset{w}{\operatorname{arg\,min\,}} ||y - Xw||_2^2 \text{ subject to } ||w||_0 \leq n_{\text{nonzero\_coefs}}
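A minimal sketch of OMP on synthetic data with a 2-sparse true coefficient vector; fitting stops once n_nonzero_coefs coefficients have been selected.

```python
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
w_true = np.array([2.0, 0.0, 0.0, -3.0, 0.0])   # only 2 non-zero coefficients
y = X @ w_true

reg = OrthogonalMatchingPursuit(n_nonzero_coefs=2)
reg.fit(X, y)
print(reg.coef_)   # at most 2 entries are non-zero
```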
Bayesian Regression
p(y|X,w,\alpha) = \mathcal{N}(y|X w,\alpha)
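A minimal sketch using BayesianRidge, which estimates the noise precision α from the data instead of treating it as fixed; the data is illustrative only.

```python
import numpy as np
from sklearn.linear_model import BayesianRidge

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.1, 4.9, 7.2])

reg = BayesianRidge()
reg.fit(X, y)
print(reg.coef_, reg.intercept_)
print(reg.alpha_)   # estimated noise precision
```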
Logistic Regression Despite the name, this is a classification model: a statistical regression model for variables that follow a Bernoulli distribution, using the logit as the link function.
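A minimal sketch of binary classification with LogisticRegression; the data is illustrative only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = LogisticRegression()
clf.fit(X, y)
print(clf.predict([[2.0]]))        # predicted class
print(clf.predict_proba([[2.0]]))  # class probabilities via the logistic (sigmoid) function
```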
Recommendation Predicts user preferences for items such as movies, music, search results, and shopping. Collaborative Filtering makes predictions based on the preferences of similar users, while Content-based Filtering makes predictions based on items the user has liked in the past. A sketch of the collaborative approach follows.
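A minimal sketch of user-based collaborative filtering with cosine similarity; the ratings matrix and the scoring rule are illustrative assumptions, not a library API.

```python
import numpy as np

# rows = users, columns = items, 0 = not rated (made-up ratings)
ratings = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
], dtype=float)

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12)

target = 0                                   # predict for user 0
sims = np.array([cosine(ratings[target], ratings[u]) for u in range(len(ratings))])
sims[target] = 0.0                           # exclude the user themselves

# score each item by a similarity-weighted average of the other users' ratings
scores = sims @ ratings / (sims.sum() + 1e-12)
print(scores)   # higher score -> more likely to match user 0's preferences
```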
Q-learning A learning algorithm for the state-action value Q(s, a), where s is the state, a is the action, and r is the reward. In the equations below, α is the learning rate and γ is the discount rate. Q(s_t, a_t) is updated step by step at rate α, using the maximum Q value over actions in the next state s_{t+1}, discounted by γ.
Q(s_t, a_t) \leftarrow (1-\alpha)Q(s_t, a_t) + \alpha(r_{t+1} + \gamma \max_{a_{t+1}}Q(s_{t+1}, a_{t+1}))\\
Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha(r_{t+1} + \gamma \max_{a_{t+1}}Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t))
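A minimal sketch of the Q-learning update above on a hypothetical tabular environment; the states, actions, and rewards are placeholders.

```python
import numpy as np

n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma = 0.1, 0.9     # learning rate and discount rate

def q_learning_update(s, a, r, s_next):
    # move Q(s, a) toward r + gamma * max_a' Q(s', a')
    td_target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (td_target - Q[s, a])

q_learning_update(s=0, a=1, r=1.0, s_next=2)
print(Q[0, 1])
```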
Sarsa
Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha(r_{t+1} + \gamma Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t))
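A minimal sketch of the Sarsa update; unlike Q-learning, it uses the action a_{t+1} actually taken in the next state rather than the maximum over actions. The values are placeholders.

```python
import numpy as np

Q = np.zeros((5, 2))
alpha, gamma = 0.1, 0.9

def sarsa_update(s, a, r, s_next, a_next):
    # move Q(s, a) toward r + gamma * Q(s', a') for the action a' that was actually taken
    td_target = r + gamma * Q[s_next, a_next]
    Q[s, a] += alpha * (td_target - Q[s, a])

sarsa_update(s=0, a=1, r=1.0, s_next=2, a_next=0)
print(Q[0, 1])
```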
Monte Carlo Method
Returns(s, a) \leftarrow append(Returns(s, a), r)\\
Q(s, a) \leftarrow average(Returns(s, a))
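A minimal sketch of the update above: store the observed return for (s, a) and re-estimate Q(s, a) as the average of all stored returns. The state, action, and return values are placeholders.

```python
from collections import defaultdict

returns = defaultdict(list)   # Returns(s, a)
Q = defaultdict(float)

def mc_update(s, a, r):
    returns[(s, a)].append(r)                               # append the observed return
    Q[(s, a)] = sum(returns[(s, a)]) / len(returns[(s, a)]) # average of all returns so far

mc_update(s=0, a=1, r=1.0)
mc_update(s=0, a=1, r=0.0)
print(Q[(0, 1)])   # 0.5
```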