Python: Application of supervised learning (regression)

Model generalization

What is generalization?

Generalization is the approach used to prevent overfitting. A model built with generalization in mind does not fit the training data too closely, so it can handle general, unseen cases.

The purpose of regression analysis is to learn a model from historical data and use it to predict unknown data. The basic idea is to build a model that predicts values based on that historical data.

However, historical data does not completely explain events such as stock price fluctuations or sales fluctuations. Predictions always have a margin of error, and even identical inputs can correspond to different actual outcomes.

Fitting the model too closely to past data, that is, overfitting, causes predictions to fail and can lower prediction accuracy.
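As a simple illustration, here is a minimal sketch on synthetic data (the dataset and its parameters are arbitrary choices for this example): a plain linear regression fit to noisy data with almost as many features as samples tends to score far higher on the training data than on unseen test data, and that gap is the overfitting.

from sklearn.linear_model import LinearRegression
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

# Noisy data with nearly as many features as samples makes overfitting easy
X, y = make_regression(n_samples=100, n_features=90, noise=100.0, random_state=42)
train_X, test_X, train_y, test_y = train_test_split(X, y, random_state=42)

model = LinearRegression()
model.fit(train_X, train_y)

# The gap between these two scores reveals the overfitting
print("train:", model.score(train_X, train_y))
print("test:", model.score(test_X, test_y))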


Regularization

In linear regression, regularization is used as a generalization method. Regularization is an approach that adds a penalty for the complexity of the relationships the model estimates between the data, thereby pushing the regression model toward more general relationships.

L1 regularization and L2 regularization are the most commonly used forms of regularization.
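Concretely, both add a penalty term to the ordinary least-squares loss. In common notation (scaling conventions vary by library), the objectives are:

$$\min_w \; \lVert y - Xw \rVert_2^2 + \alpha \sum_i \lvert w_i \rvert \quad \text{(L1 / Lasso)}$$

$$\min_w \; \lVert y - Xw \rVert_2^2 + \alpha \sum_i w_i^2 \quad \text{(L2 / Ridge)}$$

where $\alpha$ controls the strength of the penalty.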

L1 regularization

This is a method that shrinks the coefficients. By driving the coefficients of features that contribute little to the prediction toward zero, it yields a sparse model. It is useful when running a regression analysis on data that contains a lot of extraneous information, and it can also be used as a feature-reduction technique.
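For example, here is a minimal sketch on synthetic data (the dataset and parameters are arbitrary) that counts how many coefficients Lasso drives to exactly zero when only a few features are informative:

import numpy as np
from sklearn.linear_model import Lasso
from sklearn.datasets import make_regression

# Only 10 of the 100 features actually influence the target
X, y = make_regression(n_samples=100, n_features=100, n_informative=10, random_state=42)

model = Lasso()
model.fit(X, y)

# Lasso tends to set the coefficients of uninformative features to exactly zero
print("non-zero coefficients:", np.sum(model.coef_ != 0))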

L2 regularization

L2 regularization is, so to speak, an upper limit on the coefficients. It restricts the magnitude of the coefficients so that they do not grow too large, and it is used to suppress overfitting. Because the learned coefficients cannot grow unnaturally large, it tends to produce a smooth model that generalizes easily.
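As a quick check, here is a sketch on synthetic data (arbitrary parameters) comparing the overall size of the coefficient vector with and without the L2 penalty:

import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=100, n_features=50, noise=100.0, random_state=42)

# The L2 penalty keeps the norm of the coefficient vector smaller
for model in [LinearRegression(), Ridge()]:
    model.fit(X, y)
    print(type(model).__name__, np.linalg.norm(model.coef_))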

[Figure: the green regions show the constraints imposed by L1 and L2 regularization; the blue contours show the loss function without regularization]

L1 and L2 regularization can be pictured as in the figure. The green region is the constraint imposed by regularization, and the blue contours show the loss function when no regularization is applied.

Without regularization, the coefficients w1 and w2 converge to the blue point. With regularization, however, the solution must also stay close to the green region, so w1 and w2 instead converge to the orange point.

As the figure shows, with L1 regularization the solution converges to a point where w1 = 0. With L2 regularization, both w1 and w2 converge to values smaller than at the blue point.

Lasso regression

Lasso regression is a regression model that applies L1 regularization while estimating the parameters of a linear regression.

In machine learning, it can be difficult for humans to recognize the relationships among the data used for prediction. As we saw above, L1 regularization is useful when running a regression analysis on data that contains a lot of extraneous information. Therefore, when the number of parameters (columns) is large relative to the number of samples in the dataset (rows), Lasso regression is a good choice.

In scikit-learn's linear_model module, Lasso() is the model for Lasso regression.

from sklearn.linear_model import Lasso
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

# Generate data with 100 features, of which only 60 are informative
X, y = make_regression(n_samples=100, n_features=100, n_informative=60, n_targets=1, random_state=42)
train_X, test_X, train_y, test_y = train_test_split(X, y, random_state=42)

model = Lasso()
model.fit(train_X, train_y)
print(model.score(test_X, test_y))
# Output result
0.967921092594

To switch from plain linear regression, just change model = LinearRegression() to model = Lasso() and you can run the analysis with Lasso regression.
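As a point of comparison, here is a minimal sketch (using the same synthetic data as above) of why Lasso helps on this dataset: with more features than training samples, plain linear regression is prone to overfitting and typically scores lower on the test set.

from sklearn.linear_model import LinearRegression
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=100, n_features=100, n_informative=60, n_targets=1, random_state=42)
train_X, test_X, train_y, test_y = train_test_split(X, y, random_state=42)

# With more features than training samples, plain OLS tends to overfit
model = LinearRegression()
model.fit(train_X, train_y)
print(model.score(test_X, test_y))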

Ridge regression

Ridge regression is a regression model that applies L2 regularization while estimating the parameters of a linear regression.

Ridge regression tends to produce a smoother model than a plain linear regression model, which makes it easier to generalize.

In scikit-learn's linear_model module, Ridge() is the model for ridge regression.

The implementation is exactly the same as for the plain linear regression model and Lasso regression; just replace the model name.

from sklearn.linear_model import Ridge
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

# Generate noisy data in which all 50 features are informative
X, y = make_regression(n_samples=100, n_features=50, n_informative=50, n_targets=1, noise=100.0, random_state=42)
train_X, test_X, train_y, test_y = train_test_split(X, y, random_state=42)

model = Ridge()
model.fit(train_X, train_y)
print(model.score(test_X, test_y))
# Output result
0.90786283239
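The strength of the L2 penalty can be adjusted through the alpha argument of Ridge() (the default is 1.0); larger values shrink the coefficients more strongly. A quick sketch, reusing the data split above (the value 10.0 is an arbitrary example):

# Stronger L2 penalty than the default Ridge(alpha=1.0)
model = Ridge(alpha=10.0)
model.fit(train_X, train_y)
print(model.score(test_X, test_y))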

Elastic Net regression

ElasticNet regression is a model whose regularization term combines those of Lasso regression and Ridge regression.

Its merit is that it combines the strength of Lasso regression, which selects informative features from data containing a lot of extraneous information, with the strength of ridge regression, which makes it easy to obtain a smooth, well-generalizing model. It is the method of choice when you want a model that balances the advantages of both.

from sklearn.linear_model import ElasticNet

model = ElasticNet()
# In addition, scikit-learn's ElasticNet() accepts an argument called l1_ratio.

model = ElasticNet(l1_ratio=0.3)

# This setting specifies the ratio between L1 and L2 regularization.
# In the case above, L1 regularization is 30% effective and L2 regularization is 70% effective.
# (If not specified, the ElasticNet regression model defaults to exactly half and half.)
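To see the effect of the mixing ratio, here is a minimal sketch on synthetic data (the dataset and the l1_ratio values are arbitrary choices for illustration):

from sklearn.linear_model import ElasticNet
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=100, n_features=100, n_informative=60, noise=10.0, random_state=42)
train_X, test_X, train_y, test_y = train_test_split(X, y, random_state=42)

# l1_ratio=1.0 corresponds to pure Lasso; values near 0 approach Ridge
for ratio in [0.1, 0.5, 0.9]:
    model = ElasticNet(l1_ratio=ratio)
    model.fit(train_X, train_y)
    print("l1_ratio={}: {}".format(ratio, model.score(test_X, test_y)))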

Regression summary


from sklearn.linear_model import LinearRegression, Lasso, Ridge, ElasticNet
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

# Generate data
X, y = make_regression(n_samples=100, n_features=50, n_informative=50, n_targets=1, noise=100.0, random_state=42)
train_X, test_X, train_y, test_y = train_test_split(X, y, random_state=42)

# ① Linear regression
model = LinearRegression()
model.fit(train_X, train_y)

# Output the coefficient of determination for test_X, test_y
print("Linear regression:{}".format(model.score(test_X, test_y)))


# ② Lasso regression (L1 regularization)
model = Lasso().fit(train_X, train_y)

# Output the coefficient of determination for test_X, test_y
print("Lasso regression:{}".format(model.score(test_X, test_y)))

# ③ Ridge regression (L2 regularization)
model = Ridge()
model.fit(train_X, train_y)

# Output the coefficient of determination for test_X, test_y
print("Ridge regression:{}".format(model.score(test_X, test_y)))

# ④ ElasticNet regression (both L1 and L2 regularization)
# l1_ratio=0.3 means L1 regularization is 30% effective and L2 regularization is 70% effective
model = ElasticNet(l1_ratio=0.3)
model.fit(train_X, train_y)

# Output the coefficient of determination for test_X, test_y
print("ElasticNet regression:{}".format(model.score(test_X, test_y)))
