This is my usual pattern: trying out Optuna, a hyperparameter search library that is said to beat grid search. All of the code below runs on Google Colaboratory.
Optuna is not preinstalled on Google Colaboratory, so I pip-installed it. Easy.
!pip install optuna
Collecting optuna
  Downloading https://files.pythonhosted.org/packages/d4/6a/4d80b3014797cf318a5252afb27031e9e7502854fb7930f27db0ee10bb75/optuna-0.19.0.tar.gz (126kB)
... (Omitted: downloads of the dependencies alembic, cliff, colorlog, Mako, python-editor, cmd2, pbr, stevedore and pyperclip) ...
Building wheels for collected packages: optuna, alembic, Mako, pyperclip
... (Omitted) ...
Successfully built optuna alembic Mako pyperclip
Installing collected packages: Mako, python-editor, alembic, pyperclip, cmd2, pbr, stevedore, cliff, colorlog, optuna
Successfully installed Mako-1.1.0 alembic-1.3.1 cliff-2.16.0 cmd2-0.8.9 colorlog-4.0.2 optuna-0.19.0 pbr-5.4.4 pyperclip-1.7.0 python-editor-1.0.4 stevedore-1.31.0
The installation appears to have succeeded, so I ran the following import to check that it works.
import optuna
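To double-check which version actually got installed (this article assumes 0.19.0):

print(optuna.__version__)  # '0.19.0' at the time of writing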
First, a warm-up exercise to get a feel for how Optuna behaves.
Let's try to minimize $f(x) = x^4 - 4x^3 - 36x^2$.
def f(x):
    return x**4 - 4 * x**3 - 36 * x**2
In Optuna, the objective function you want to minimize is defined as follows.
def objective(trial):
    x = trial.suggest_uniform('x', -10, 10)
    return f(x)
Running the following will carry out just 10 trials.
study = optuna.create_study()
study.optimize(objective, n_trials=10)
[I 2019-12-13 00:32:08,911] Finished trial#0 resulted in value: 2030.566599827237. Current best value is 2030.566599827237 with parameters: {'x': 9.815207070259166}.
[I 2019-12-13 00:32:09,021] Finished trial#1 resulted in value: 1252.1813135138896. Current best value is 1252.1813135138896 with parameters: {'x': 9.366926766768199}.
[I 2019-12-13 00:32:09,147] Finished trial#2 resulted in value: -283.8813965725701. Current best value is -283.8813965725701 with parameters: {'x': 2.6795376432294855}.
[I 2019-12-13 00:32:09,278] Finished trial#3 resulted in value: 1258.1505983061907. Current best value is -283.8813965725701 with parameters: {'x': 2.6795376432294855}.
[I 2019-12-13 00:32:09,409] Finished trial#4 resulted in value: -59.988164166655146. Current best value is -283.8813965725701 with parameters: {'x': 2.6795376432294855}.
[I 2019-12-13 00:32:09,539] Finished trial#5 resulted in value: 6493.216295606622. Current best value is -283.8813965725701 with parameters: {'x': 2.6795376432294855}.
[I 2019-12-13 00:32:09,670] Finished trial#6 resulted in value: 233.47766027651414. Current best value is -283.8813965725701 with parameters: {'x': 2.6795376432294855}.
[I 2019-12-13 00:32:09,797] Finished trial#7 resulted in value: -32.56782816991587. Current best value is -283.8813965725701 with parameters: {'x': 2.6795376432294855}.
[I 2019-12-13 00:32:09,924] Finished trial#8 resulted in value: 9713.778056852296. Current best value is -283.8813965725701 with parameters: {'x': 2.6795376432294855}.
[I 2019-12-13 00:32:10,046] Finished trial#9 resulted in value: -499.577141711988. Current best value is -499.577141711988 with parameters: {'x': 3.6629193285453887}.
If you check the number of trials
len(study.trials)
10
You can check the parameters that minimize the objective function as follows.
study.best_params
{'x': 3.6629193285453887}
The minimum value of the objective function is obtained as follows.
study.best_value
-499.577141711988
The information of the trial that got the minimum value is obtained in this way.
study.best_trial
FrozenTrial(number=9, state=TrialState.COMPLETE, value=-499.577141711988, datetime_start=datetime.datetime(2019, 12, 13, 0, 32, 9, 926514), datetime_complete=datetime.datetime(2019, 12, 13, 0, 32, 10, 46429), params={'x': 3.6629193285453887}, distributions={'x': UniformDistribution(high=10, low=-10)}, user_attrs={}, system_attrs={'_number': 9}, intermediate_values={}, trial_id=9)
The history of attempts can be seen in this way.
study.trials
[FrozenTrial(number=0, state=TrialState.COMPLETE, value=2030.566599827237, datetime_start=datetime.datetime(2019, 12, 13, 0, 32, 8, 821843), datetime_complete=datetime.datetime(2019, 12, 13, 0, 32, 8, 911548), params={'x': 9.815207070259166}, distributions={'x': UniformDistribution(high=10, low=-10)}, user_attrs={}, system_attrs={'_number': 0}, intermediate_values={}, trial_id=0),
FrozenTrial(number=1, state=TrialState.COMPLETE, value=1252.1813135138896, datetime_start=datetime.datetime(2019, 12, 13, 0, 32, 8, 912983), datetime_complete=datetime.datetime(2019, 12, 13, 0, 32, 9, 20790), params={'x': 9.366926766768199}, distributions={'x': UniformDistribution(high=10, low=-10)}, user_attrs={}, system_attrs={'_number': 1}, intermediate_values={}, trial_id=1),
FrozenTrial(number=2, state=TrialState.COMPLETE, value=-283.8813965725701, datetime_start=datetime.datetime(2019, 12, 13, 0, 32, 9, 22532), datetime_complete=datetime.datetime(2019, 12, 13, 0, 32, 9, 147430), params={'x': 2.6795376432294855}, distributions={'x': UniformDistribution(high=10, low=-10)}, user_attrs={}, system_attrs={'_number': 2}, intermediate_values={}, trial_id=2),
FrozenTrial(number=3, state=TrialState.COMPLETE, value=1258.1505983061907, datetime_start=datetime.datetime(2019, 12, 13, 0, 32, 9, 149953), datetime_complete=datetime.datetime(2019, 12, 13, 0, 32, 9, 277900), params={'x': 9.37074944280344}, distributions={'x': UniformDistribution(high=10, low=-10)}, user_attrs={}, system_attrs={'_number': 3}, intermediate_values={}, trial_id=3),
FrozenTrial(number=4, state=TrialState.COMPLETE, value=-59.988164166655146, datetime_start=datetime.datetime(2019, 12, 13, 0, 32, 9, 281543), datetime_complete=datetime.datetime(2019, 12, 13, 0, 32, 9, 409038), params={'x': -1.4636181925092284}, distributions={'x': UniformDistribution(high=10, low=-10)}, user_attrs={}, system_attrs={'_number': 4}, intermediate_values={}, trial_id=4),
FrozenTrial(number=5, state=TrialState.COMPLETE, value=6493.216295606622, datetime_start=datetime.datetime(2019, 12, 13, 0, 32, 9, 410813), datetime_complete=datetime.datetime(2019, 12, 13, 0, 32, 9, 539381), params={'x': -8.979003291609324}, distributions={'x': UniformDistribution(high=10, low=-10)}, user_attrs={}, system_attrs={'_number': 5}, intermediate_values={}, trial_id=5),
FrozenTrial(number=6, state=TrialState.COMPLETE, value=233.47766027651414, datetime_start=datetime.datetime(2019, 12, 13, 0, 32, 9, 542239), datetime_complete=datetime.datetime(2019, 12, 13, 0, 32, 9, 669699), params={'x': -5.01912242330347}, distributions={'x': UniformDistribution(high=10, low=-10)}, user_attrs={}, system_attrs={'_number': 6}, intermediate_values={}, trial_id=6),
FrozenTrial(number=7, state=TrialState.COMPLETE, value=-32.56782816991587, datetime_start=datetime.datetime(2019, 12, 13, 0, 32, 9, 671679), datetime_complete=datetime.datetime(2019, 12, 13, 0, 32, 9, 797441), params={'x': -1.027752432268013}, distributions={'x': UniformDistribution(high=10, low=-10)}, user_attrs={}, system_attrs={'_number': 7}, intermediate_values={}, trial_id=7),
FrozenTrial(number=8, state=TrialState.COMPLETE, value=9713.778056852296, datetime_start=datetime.datetime(2019, 12, 13, 0, 32, 9, 799124), datetime_complete=datetime.datetime(2019, 12, 13, 0, 32, 9, 924648), params={'x': -9.843104909274034}, distributions={'x': UniformDistribution(high=10, low=-10)}, user_attrs={}, system_attrs={'_number': 8}, intermediate_values={}, trial_id=8),
FrozenTrial(number=9, state=TrialState.COMPLETE, value=-499.577141711988, datetime_start=datetime.datetime(2019, 12, 13, 0, 32, 9, 926514), datetime_complete=datetime.datetime(2019, 12, 13, 0, 32, 10, 46429), params={'x': 3.6629193285453887}, distributions={'x': UniformDistribution(high=10, low=-10)}, user_attrs={}, system_attrs={'_number': 9}, intermediate_values={}, trial_id=9)]
Let's try an additional 100 times.
study.optimize(objective, n_trials=100)
[I 2019-12-13 00:32:10,303] Finished trial#10 resulted in value: -679.8303609251094. Current best value is -679.8303609251094 with parameters: {'x': 4.482235669344949}.
[I 2019-12-13 00:32:10,447] Finished trial#11 resulted in value: -664.7000624843927. Current best value is -679.8303609251094 with parameters: {'x': 4.482235669344949}.
[I 2019-12-13 00:32:10,579] Finished trial#12 resulted in value: -778.9261500173968. Current best value is -778.9261500173968 with parameters: {'x': 5.024746639127292}.
... (Omitted) ...
[I 2019-12-13 00:32:22,591] Finished trial#107 resulted in value: -760.7787838740135. Current best value is -863.9855798856751 with parameters: {'x': 6.011542730094907}.
[I 2019-12-13 00:32:22,724] Finished trial#108 resulted in value: -773.0113811629133. Current best value is -863.9855798856751 with parameters: {'x': 6.011542730094907}.
[I 2019-12-13 00:32:22,862] Finished trial#109 resulted in value: -577.7178004902428. Current best value is -863.9855798856751 with parameters: {'x': 6.011542730094907}.
Now the number of trials is
len(study.trials)
110
The parameters that minimize the objective function and the value of the objective function at that time are
study.best_params, study.best_value
({'x': 6.011542730094907}, -863.9855798856751)
The history of the objective function values can be visualized as follows:
%matplotlib inline
import matplotlib.pyplot as plt
plt.plot([trial.value for trial in study.trials])
plt.grid()
plt.show()
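As an aside, recent Optuna versions also bundle plotly-based visualization helpers, so the same history can be drawn without matplotlib boilerplate; a minimal sketch, assuming your version ships them:

# Built-in optimization-history plot (plotly-based); renders inline in notebooks.
optuna.visualization.plot_optimization_history(study)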
The parameter history can be visualized as follows:
%matplotlib inline
import matplotlib.pyplot as plt
plt.plot([trial.params['x'] for trial in study.trials])
plt.grid()
plt.show()
Let's illustrate how the parameter search was done.
%matplotlib inline
import matplotlib.pyplot as plt
plt.grid()
plt.plot([trial.params['x'] for trial in study.trials],
[trial.value for trial in study.trials],
marker='x', alpha=0.3)
plt.scatter(study.trials[0].params['x'], study.trials[0].value,
marker='>', label='start', s=100)
plt.scatter(study.trials[-1].params['x'], study.trials[-1].value,
marker='s', label='end', s=100)
plt.scatter(study.best_params['x'], study.best_value,
marker='o', label='best', s=100)
plt.xlabel('x')
plt.ylabel('y (value)')
plt.legend()
plt.show()
You can see that regions where the minimum is unlikely to lie are sampled only moderately, while regions where the minimum is likely to lie are searched intensively.
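One way to see this concentration directly is to histogram the sampled x values; bins near the minimum should be visited far more often than the rest. A minimal sketch:

# Histogram of sampled x values: dense bins mark the regions Optuna favored.
plt.hist([trial.params['x'] for trial in study.trials], bins=20)
plt.xlabel('x')
plt.ylabel('number of trials')
plt.grid()
plt.show()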
Next, let's minimize $f(x, y) = (x - 2.5)^2 + 2(y + 2.5)^2$.
def f(x, y):
    return (x - 2.5)**2 + 2 * (y + 2.5)**2
Define the objective function to minimize:
def objective(trial):
    x = trial.suggest_uniform('x', -10, 10)
    y = trial.suggest_uniform('y', -10, 10)
    return f(x, y)
Run 100 trials like this:
study = optuna.create_study()
study.optimize(objective, n_trials=100)
[I 2019-12-13 00:32:24,001] Finished trial#0 resulted in value: 31.229461850588567. Current best value is 31.229461850588567 with parameters: {'x': 7.975371679174145, 'y': -3.290495675347522}.
[I 2019-12-13 00:32:24,131] Finished trial#1 resulted in value: 158.84900024337801. Current best value is 31.229461850588567 with parameters: {'x': 7.975371679174145, 'y': -3.290495675347522}.
[I 2019-12-13 00:32:24,252] Finished trial#2 resulted in value: 118.67648241872055. Current best value is 31.229461850588567 with parameters: {'x': 7.975371679174145, 'y': -3.290495675347522}.
... (Omitted) ...
[I 2019-12-13 00:32:37,321] Finished trial#97 resulted in value: 24.46020780084274. Current best value is 0.2114497716311141 with parameters: {'x': 2.286816304129357, 'y': -2.788099360851467}.
[I 2019-12-13 00:32:37,471] Finished trial#98 resulted in value: 15.832787347997524. Current best value is 0.2114497716311141 with parameters: {'x': 2.286816304129357, 'y': -2.788099360851467}.
[I 2019-12-13 00:32:37,625] Finished trial#99 resulted in value: 0.6493005675217599. Current best value is 0.2114497716311141 with parameters: {'x': 2.286816304129357, 'y': -2.788099360851467}.
Parameters that minimize the objective function and the minimum value at that time
study.best_params, study.best_value
({'x': 2.286816304129357, 'y': -2.788099360851467}, 0.2114497716311141)
Objective function value and parameter history
%matplotlib inline
import matplotlib.pyplot as plt
plt.plot([trial.value for trial in study.trials], label='value')
plt.grid()
plt.legend()
plt.show()
plt.plot([trial.params['x'] for trial in study.trials], label='x')
plt.plot([trial.params['y'] for trial in study.trials], label='y')
plt.grid()
plt.legend()
plt.show()
Let's illustrate the history of parameters on a two-dimensional plane.
%matplotlib inline
import matplotlib.pyplot as plt
plt.plot([trial.params['x'] for trial in study.trials],
[trial.params['y'] for trial in study.trials],
alpha=0.4, marker='x')
plt.scatter(study.trials[0].params['x'], study.trials[0].params['y'],
marker='>', label='start', s=100)
plt.scatter(study.trials[-1].params['x'], study.trials[-1].params['y'],
marker='s', label='end', s=100)
plt.scatter(study.best_params['x'], study.best_params['y'],
marker='o', label='best', s=100)
plt.grid()
plt.legend()
plt.show()
Again, regions unlikely to contain the minimum are sampled only moderately, while regions likely to contain it are searched intensively.
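To make this even clearer, the sampled points can be overlaid on contour lines of f(x, y); a quick sketch using the definitions above:

import numpy as np
xx, yy = np.meshgrid(np.linspace(-10, 10, 200), np.linspace(-10, 10, 200))
plt.contour(xx, yy, f(xx, yy), levels=20, alpha=0.5)  # contours of the objective
plt.scatter([trial.params['x'] for trial in study.trials],
            [trial.params['y'] for trial in study.trials],
            marker='x', alpha=0.5)                    # sampled points
plt.xlabel('x')
plt.ylabel('y')
plt.show()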
That concludes the warm-up exercises for getting a feel for how Optuna operates.
From here, let's apply it to hyperparameter tuning in supervised machine learning.
Using the breast cancer dataset from the machine learning library scikit-learn as an example, we obtain the explanatory variables $X$ and the target variable $y$ as follows.
# https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_breast_cancer.html
from sklearn.datasets import load_breast_cancer
breast_cancer = load_breast_cancer()
X = breast_cancer.data
y = breast_cancer.target.ravel()
Divide into training data and test data.
from sklearn.model_selection import train_test_split
# Randomly split into training and test data at a ratio of 6:4
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.4)
As a baseline for comparison with Optuna, let's first review grid search. Grid search tries every combination of the given candidate parameter values and selects the best-performing one. With LightGBM as the supervised learner, it looks like this:
%%time
from sklearn.model_selection import GridSearchCV
# LightGBM
import lightgbm as lgb
#Parameters for grid search
parameters = [{
'learning_rate':[0.1,0.2],
'n_estimators':[20,100,200],
'max_depth':[3,5,7,9],
'min_child_weight':[0.5,1,2],
'min_child_samples':[5,10,20],
'subsample':[0.8],
'colsample_bytree':[0.8],
'verbose':[-1],
'num_leaves':[80]
}]
#Grid search execution
classifier = GridSearchCV(lgb.LGBMClassifier(), parameters, cv=3, n_jobs=-1)
classifier.fit(X_train, y_train)
print("Accuracy score (train): ", classifier.score(X_train, y_train))
print("Accuracy score (test): ", classifier.score(X_test, y_test))
print(classifier.best_estimator_) #Best parameters
Accuracy score (train): 1.0
Accuracy score (test): 0.9517543859649122
LGBMClassifier(boosting_type='gbdt', class_weight=None, colsample_bytree=0.8,
importance_type='split', learning_rate=0.1, max_depth=3,
min_child_samples=20, min_child_weight=0.5, min_split_gain=0.0,
n_estimators=100, n_jobs=-1, num_leaves=80, objective=None,
random_state=None, reg_alpha=0.0, reg_lambda=0.0, silent=True,
subsample=0.8, subsample_for_bin=200000, subsample_freq=0,
verbose=-1)
CPU times: user 1.15 s, sys: 109 ms, total: 1.26 s
Wall time: 15.3 s
The defining feature of grid search is that it spends exactly the same effort on every candidate combination: it cannot focus its search on the regions that look promising. (A quick cost count follows.)
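For reference, here is what the grid above costs, counting combinations of the parameters list defined earlier:

from itertools import product
# 2 * 3 * 4 * 3 * 3 * 1 * 1 * 1 * 1 = 216 combinations, each fit cv=3 times.
n_combos = len(list(product(*parameters[0].values())))
print(n_combos, n_combos * 3)  # 216 648

Optuna, by contrast, spends a fixed budget of trials preferentially on promising regions.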
LightGBM + Optuna
Now let's tune this LightGBM with Optuna instead of grid search.
import numpy as np
# Objective function
def objective(trial):
    learning_rate = trial.suggest_loguniform('learning_rate', 0.1, 0.2)
    n_estimators = trial.suggest_int('n_estimators', 20, 200)
    max_depth = trial.suggest_int('max_depth', 3, 9)
    min_child_weight = trial.suggest_loguniform('min_child_weight', 0.5, 2)
    min_child_samples = trial.suggest_int('min_child_samples', 5, 20)
    classifier = lgb.LGBMClassifier(learning_rate=learning_rate,
                                    n_estimators=n_estimators,
                                    max_depth=max_depth,
                                    min_child_weight=min_child_weight,
                                    min_child_samples=min_child_samples,
                                    subsample=0.8, colsample_bytree=0.8,
                                    verbose=-1, num_leaves=80)
    classifier.fit(X_train, y_train)
    # return classifier.score(X_train, y_train)  # optimize training accuracy instead
    return np.linalg.norm(y_train - classifier.predict_proba(X_train)[:, 1], ord=1)  # likelihood-style optimization
There is a choice here of what to optimize. The LightGBM model used is a classifier, not a regressor.
Returning classifier.score(X_train, y_train) optimizes the training accuracy. But classification accuracy simply counts how many samples were classified correctly: it looks like a continuous value, yet in practice it behaves almost like a discrete one. If 8 out of 10 samples are classified correctly, the training accuracy is the same whether each classification was correct by a comfortable margin or only just correct. In other words, accuracy exerts no pressure to push a "barely correct" answer toward a "correct with margin" one.
This can be avoided by using np.linalg.norm(y_train - classifier.predict_proba(X_train)[:, 1], ord=1). Here y_train is the set of teacher labels (0 or 1), and classifier.predict_proba(X_train)[:, 1] is the model's confidence (roughly, a probability) that the prediction is 1. Minimizing the L1 norm of the difference between the two (the L2 norm should work just as well) makes it easier for the optimization to pull incorrect answers toward correct ones, and barely correct answers toward confidently correct ones.
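A tiny numeric illustration of the difference (toy numbers, not taken from the run above):

import numpy as np
y_true = np.array([1, 1, 0, 0])
p_barely = np.array([0.55, 0.55, 0.45, 0.45])   # every sample only just on the right side
p_margin = np.array([0.95, 0.95, 0.05, 0.05])   # every sample correct with a wide margin
# Accuracy at threshold 0.5 is 1.0 in both cases and cannot tell them apart...
print(((p_barely >= 0.5) == y_true).mean(), ((p_margin >= 0.5) == y_true).mean())
# ...but the L1 norm of the residuals clearly prefers the confident model (1.8 vs 0.2).
print(np.linalg.norm(y_true - p_barely, ord=1), np.linalg.norm(y_true - p_margin, ord=1))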
Now let's run the optimization. If you return classifier.score(X_train, y_train), choose maximization; if you return np.linalg.norm(y_train - classifier.predict_proba(X_train)[:, 1], ord=1), choose minimization.
#study = optuna.create_study(direction='maximize') #Maximize
study = optuna.create_study(direction='minimize') #Minimize
Run 100 trials like this:
study.optimize(objective, n_trials=100)
[I 2019-12-13 00:32:54,913] Finished trial#0 resulted in value: 1.5655193925176527. Current best value is 1.5655193925176527 with parameters: {'learning_rate': 0.11563458547060446, 'n_estimators': 155, 'max_depth': 7, 'min_child_weight': 0.7324812463494225, 'min_child_samples': 12}.
[I 2019-12-13 00:32:55,103] Finished trial#1 resulted in value: 1.3810988452320123. Current best value is 1.3810988452320123 with parameters: {'learning_rate': 0.15351688726053717, 'n_estimators': 83, 'max_depth': 6, 'min_child_weight': 0.5802652538400225, 'min_child_samples': 8}.
[I 2019-12-13 00:32:55,287] Finished trial#2 resulted in value: 3.519787362063691. Current best value is 1.3810988452320123 with parameters: {'learning_rate': 0.15351688726053717, 'n_estimators': 83, 'max_depth': 6, 'min_child_weight': 0.5802652538400225, 'min_child_samples': 8}.
... (Omitted) ...
[I 2019-12-13 00:33:17,608] Finished trial#97 resulted in value: 1.0443245090791662. Current best value is 1.0230542364962214 with parameters: {'learning_rate': 0.11851649444429455, 'n_estimators': 176, 'max_depth': 9, 'min_child_weight': 0.50006741615294, 'min_child_samples': 8}.
[I 2019-12-13 00:33:17,871] Finished trial#98 resulted in value: 1.3997762969822483. Current best value is 1.0230542364962214 with parameters: {'learning_rate': 0.11851649444429455, 'n_estimators': 176, 'max_depth': 9, 'min_child_weight': 0.50006741615294, 'min_child_samples': 8}.
[I 2019-12-13 00:33:18,187] Finished trial#99 resulted in value: 1.1059309199723422. Current best value is 1.0230542364962214 with parameters: {'learning_rate': 0.11851649444429455, 'n_estimators': 176, 'max_depth': 9, 'min_child_weight': 0.50006741615294, 'min_child_samples': 8}.
The parameters that optimize the objective function are
study.best_params
{'learning_rate': 0.11851649444429455,
'max_depth': 9,
'min_child_samples': 8,
'min_child_weight': 0.50006741615294,
'n_estimators': 176}
Fine-grained values like these are something grid search could never arrive at.
And the optimum value at that time is
study.best_value
1.0230542364962214
The optimal parameters found can be unpacked with **study.best_params to build a tuned classifier.
classifier = lgb.LGBMClassifier(**study.best_params,
subsample=0.8, colsample_bytree=0.8,
verbose=-1, num_leaves=80)
classifier
LGBMClassifier(boosting_type='gbdt', class_weight=None, colsample_bytree=0.8,
importance_type='split', learning_rate=0.11851649444429455,
max_depth=9, min_child_samples=8,
min_child_weight=0.50006741615294, min_split_gain=0.0,
n_estimators=176, n_jobs=-1, num_leaves=80, objective=None,
random_state=None, reg_alpha=0.0, reg_lambda=0.0, silent=True,
subsample=0.8, subsample_for_bin=200000, subsample_freq=0,
verbose=-1)
Train with the tuned classifier:
classifier.fit(X_train, y_train)
LGBMClassifier(boosting_type='gbdt', class_weight=None, colsample_bytree=0.8,
importance_type='split', learning_rate=0.11851649444429455,
max_depth=9, min_child_samples=8,
min_child_weight=0.50006741615294, min_split_gain=0.0,
n_estimators=176, n_jobs=-1, num_leaves=80, objective=None,
random_state=None, reg_alpha=0.0, reg_lambda=0.0, silent=True,
subsample=0.8, subsample_for_bin=200000, subsample_freq=0,
verbose=-1)
And evaluate it:
classifier.score(X_train, y_train)
1.0
classifier.score(X_test, y_test)
0.9473684210526315
History of objective function values
plt.plot([trial.value for trial in study.trials], label='value')
plt.grid()
plt.legend()
plt.show()
Parameter search history
for key in study.trials[0].params.keys():
    plt.plot([trial.params[key] for trial in study.trials], label=key)
plt.grid()
plt.legend()
plt.show()
And that's how it looks.
scikit-learn/MLP + Optuna
Let's tune scikit-learn's multilayer perceptron in the same way. First, grid search.
%%time
from sklearn.model_selection import GridSearchCV
#Multilayer perceptron
from sklearn.neural_network import MLPClassifier
#Parameters for grid search
parameters = [{'hidden_layer_sizes': [8, 16, 32, (8, 8), (8, 8, 8)],
'solver': ['adam'], 'activation': ['relu'],
'learning_rate_init': [0.1, 0.01, 0.001]}]
#Grid search execution
classifier = GridSearchCV(MLPClassifier(max_iter=10000, early_stopping=True),
parameters, cv=3, n_jobs=-1)
classifier.fit(X_train, y_train)
print("Accuracy score (train): ", classifier.score(X_train, y_train))
print("Accuracy score (test): ", classifier.score(X_test, y_test))
print(classifier.best_estimator_) #Classifier with the best parameters
Accuracy score (train): 0.9090909090909091
Accuracy score (test): 0.8728070175438597
MLPClassifier(activation='relu', alpha=0.0001, batch_size='auto', beta_1=0.9,
beta_2=0.999, early_stopping=True, epsilon=1e-08,
hidden_layer_sizes=32, learning_rate='constant',
learning_rate_init=0.1, max_iter=10000, momentum=0.9,
n_iter_no_change=10, nesterovs_momentum=True, power_t=0.5,
random_state=None, shuffle=True, solver='adam', tol=0.0001,
validation_fraction=0.1, verbose=False, warm_start=False)
CPU times: user 166 ms, sys: 30.3 ms, total: 196 ms
Wall time: 3.07 s
The multilayer perceptron is a little more awkward to tune because of the hidden_layer_sizes parameter, whose length (the number of hidden layers) can itself vary. Let's start with a single hidden layer.
# Objective function
def objective(trial):
    hidden_layer_sizes = trial.suggest_int('hidden_layer_sizes', 8, 100)
    learning_rate_init = trial.suggest_loguniform('learning_rate_init', 0.001, 0.1)
    classifier = MLPClassifier(max_iter=10000, early_stopping=True,
                               hidden_layer_sizes=hidden_layer_sizes,
                               learning_rate_init=learning_rate_init,
                               solver='adam', activation='relu')
    classifier.fit(X_train, y_train)
    # return classifier.score(X_train, y_train)
    # return classifier.score(X_test, y_test)
    return np.linalg.norm(y_train - classifier.predict_proba(X_train)[:, 1], ord=1)
#study = optuna.create_study(direction='maximize')
study = optuna.create_study(direction='minimize')
study.optimize(objective, n_trials=100)
[I 2019-12-13 00:33:23,314] Finished trial#0 resulted in value: 33.67375867333378. Current best value is 33.67375867333378 with parameters: {'hidden_layer_sizes': 73, 'learning_rate_init': 0.004548472515805296}.
[I 2019-12-13 00:33:23,538] Finished trial#1 resulted in value: 35.17385235930611. Current best value is 33.67375867333378 with parameters: {'hidden_layer_sizes': 73, 'learning_rate_init': 0.004548472515805296}.
[I 2019-12-13 00:33:23,716] Finished trial#2 resulted in value: 52.815452458627675. Current best value is 33.67375867333378 with parameters: {'hidden_layer_sizes': 73, 'learning_rate_init': 0.004548472515805296}.
... (Omitted) ...
[I 2019-12-13 00:33:47,631] Finished trial#97 resulted in value: 150.15953891394736. Current best value is 23.844866313445344 with parameters: {'hidden_layer_sizes': 79, 'learning_rate_init': 0.010242027297662661}.
[I 2019-12-13 00:33:47,894] Finished trial#98 resulted in value: 32.56506872305802. Current best value is 23.844866313445344 with parameters: {'hidden_layer_sizes': 79, 'learning_rate_init': 0.010242027297662661}.
[I 2019-12-13 00:33:48,172] Finished trial#99 resulted in value: 38.57363524502563. Current best value is 23.844866313445344 with parameters: {'hidden_layer_sizes': 79, 'learning_rate_init': 0.010242027297662661}.
study.best_params
{'hidden_layer_sizes': 79, 'learning_rate_init': 0.010242027297662661}
study.best_value
23.844866313445344
classifier = MLPClassifier(**study.best_params)
classifier.fit(X_train, y_train)
classifier.score(X_train, y_train), classifier.score(X_test, y_test)
(0.9472140762463344, 0.9122807017543859)
plt.plot([trial.value for trial in study.trials], label='value')
plt.grid()
plt.legend()
plt.show()
for key in study.trials[0].params.keys():
    plt.plot([trial.params[key] for trial in study.trials], label=key)
plt.grid()
plt.legend()
plt.show()
# Objective function
def objective(trial):
    h1 = trial.suggest_int('h1', 8, 100)
    h2 = trial.suggest_int('h2', 8, 100)
    learning_rate_init = trial.suggest_loguniform('learning_rate_init', 0.001, 0.1)
    classifier = MLPClassifier(max_iter=10000, early_stopping=True,
                               hidden_layer_sizes=(h1, h2),
                               learning_rate_init=learning_rate_init,
                               solver='adam', activation='relu')
    classifier.fit(X_train, y_train)
    # return classifier.score(X_train, y_train)
    # return classifier.score(X_test, y_test)
    return np.linalg.norm(y_train - classifier.predict_proba(X_train)[:, 1], ord=1)
#study = optuna.create_study(direction='maximize')
study = optuna.create_study(direction='minimize')
study.optimize(objective, n_trials=100)
[I 2019-12-13 00:33:49,555] Finished trial#0 resulted in value: 44.26353774856942. Current best value is 44.26353774856942 with parameters: {'h1': 15, 'h2': 99, 'learning_rate_init': 0.003018305556292618}.
[I 2019-12-13 00:33:49,851] Finished trial#1 resulted in value: 29.450960862380153. Current best value is 29.450960862380153 with parameters: {'h1': 81, 'h2': 79, 'learning_rate_init': 0.01344672244443261}.
[I 2019-12-13 00:33:50,073] Finished trial#2 resulted in value: 38.96850500173973. Current best value is 29.450960862380153 with parameters: {'h1': 81, 'h2': 79, 'learning_rate_init': 0.01344672244443261}.
... (Omitted) ...
[I 2019-12-13 00:34:19,151] Finished trial#97 resulted in value: 34.73946747640069. Current best value is 22.729638264213385 with parameters: {'h1': 73, 'h2': 91, 'learning_rate_init': 0.005367313373989512}.
[I 2019-12-13 00:34:19,472] Finished trial#98 resulted in value: 38.708695477563566. Current best value is 22.729638264213385 with parameters: {'h1': 73, 'h2': 91, 'learning_rate_init': 0.005367313373989512}.
[I 2019-12-13 00:34:19,801] Finished trial#99 resulted in value: 42.20352641425415. Current best value is 22.729638264213385 with parameters: {'h1': 73, 'h2': 91, 'learning_rate_init': 0.005367313373989512}.
study.best_params
{'h1': 73, 'h2': 91, 'learning_rate_init': 0.005367313373989512}
study.best_value
22.729638264213385
When I try to unpack **study.best_params directly, I get the following error. I don't know of a clean built-in solution, so the only thing I can think of is to substitute the obtained parameters by hand (a sketch follows the traceback).
classifier = MLPClassifier(**study.best_params)
classifier.fit(X_train, y_train)
classifier.score(X_train, y_train), classifier.score(X_test, y_test)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-49-ee91971a40bc> in <module>()
----> 1 classifier = MLPClassifier(**study.best_params)
2 classifier.fit(X_train, y_train)
3 classifier.score(X_train, y_train), classifier.score(X_test, y_test)
TypeError: __init__() got an unexpected keyword argument 'h1'
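One straightforward workaround, as a sketch (the dict handling here is my own, following the objective defined above): copy the dict, pop the layer sizes out, and hand them over as a tuple.

params = dict(study.best_params)
h1 = params.pop('h1')   # pull the layer sizes out of the dict...
h2 = params.pop('h2')
classifier = MLPClassifier(max_iter=10000, early_stopping=True,
                           hidden_layer_sizes=(h1, h2),   # ...and pass them as a tuple
                           solver='adam', activation='relu', **params)
classifier.fit(X_train, y_train)
classifier.score(X_train, y_train), classifier.score(X_test, y_test)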
The various histories:
plt.plot([trial.value for trial in study.trials], label='value')
plt.grid()
plt.legend()
plt.show()
for key in study.trials[0].params.keys():
    plt.plot([trial.params[key] for trial in study.trials], label=key)
plt.grid()
plt.legend()
plt.show()
# Objective function
def objective(trial):
    h1 = trial.suggest_int('h1', 8, 100)
    h2 = trial.suggest_int('h2', 8, 100)
    h3 = trial.suggest_int('h3', 8, 100)
    h4 = trial.suggest_int('h4', 8, 100)
    h5 = trial.suggest_int('h5', 8, 100)
    # Use only the first n of the five suggested layer sizes
    n = trial.suggest_int('n', 1, 5)
    hidden_layer_sizes = []
    for h in [h1, h2, h3, h4, h5]:
        hidden_layer_sizes.append(h)
        if len(hidden_layer_sizes) == n:
            break
    learning_rate_init = trial.suggest_loguniform('learning_rate_init', 0.001, 0.1)
    classifier = MLPClassifier(max_iter=10000, early_stopping=True,
                               hidden_layer_sizes=hidden_layer_sizes,
                               learning_rate_init=learning_rate_init,
                               solver='adam', activation='relu')
    classifier.fit(X_train, y_train)
    # return classifier.score(X_train, y_train)
    # return classifier.score(X_test, y_test)
    return np.linalg.norm(y_train - classifier.predict_proba(X_train)[:, 1], ord=1)
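Since Optuna defines the search space at run time ("define-by-run"), an alternative sketch is to draw the number of layers first and only then suggest exactly that many layer sizes. The run below still uses the objective defined above; this variant is my own suggestion:

# Alternative sketch: suggest n first, then only the layer sizes actually used.
def objective_dynamic(trial):
    n = trial.suggest_int('n', 1, 5)
    hidden_layer_sizes = [trial.suggest_int('h%d' % i, 8, 100) for i in range(n)]
    learning_rate_init = trial.suggest_loguniform('learning_rate_init', 0.001, 0.1)
    classifier = MLPClassifier(max_iter=10000, early_stopping=True,
                               hidden_layer_sizes=hidden_layer_sizes,
                               learning_rate_init=learning_rate_init,
                               solver='adam', activation='relu')
    classifier.fit(X_train, y_train)
    return np.linalg.norm(y_train - classifier.predict_proba(X_train)[:, 1], ord=1)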
#study = optuna.create_study(direction='maximize')
study = optuna.create_study(direction='minimize')
study.optimize(objective, n_trials=100)
[I 2019-12-13 00:34:37,028] Finished trial#0 resulted in value: 117.6950339936551. Current best value is 117.6950339936551 with parameters: {'h1': 44, 'h2': 90, 'h3': 75, 'h4': 51, 'h5': 87, 'n': 3, 'learning_rate_init': 0.043829528929494495}.
[I 2019-12-13 00:34:37,247] Finished trial#1 resulted in value: 107.63845860162616. Current best value is 107.63845860162616 with parameters: {'h1': 16, 'h2': 51, 'h3': 13, 'h4': 36, 'h5': 27, 'n': 3, 'learning_rate_init': 0.04986625228277607}.
[I 2019-12-13 00:34:37,513] Finished trial#2 resulted in value: 198.86827020586986. Current best value is 107.63845860162616 with parameters: {'h1': 16, 'h2': 51, 'h3': 13, 'h4': 36, 'h5': 27, 'n': 3, 'learning_rate_init': 0.04986625228277607}.
... (Omitted) ...
[I 2019-12-13 00:35:10,424] Finished trial#97 resulted in value: 31.485260318520005. Current best value is 23.024826770529504 with parameters: {'h1': 62, 'h2': 60, 'h3': 58, 'h4': 77, 'h5': 27, 'n': 1, 'learning_rate_init': 0.011342241271350882}.
[I 2019-12-13 00:35:10,801] Finished trial#98 resulted in value: 27.752591077771235. Current best value is 23.024826770529504 with parameters: {'h1': 62, 'h2': 60, 'h3': 58, 'h4': 77, 'h5': 27, 'n': 1, 'learning_rate_init': 0.011342241271350882}.
[I 2019-12-13 00:35:11,199] Finished trial#99 resulted in value: 81.29419572506973. Current best value is 23.024826770529504 with parameters: {'h1': 62, 'h2': 60, 'h3': 58, 'h4': 77, 'h5': 27, 'n': 1, 'learning_rate_init': 0.011342241271350882}.
study.best_params
{'h1': 62,
'h2': 60,
'h3': 58,
'h4': 77,
'h5': 27,
'learning_rate_init': 0.011342241271350882,
'n': 1}
study.best_value
23.024826770529504
plt.plot([trial.value for trial in study.trials], label='value')
plt.grid()
plt.legend()
plt.show()
for key in study.trials[0].params.keys():
    plt.plot([trial.params[key] for trial in study.trials], label=key)
plt.grid()
plt.legend()
plt.show()
PyTorch + Optuna
Similar to the above, I tried multi-layer perceptron optimization with PyTorch + Optuna.
import torch
from torch.utils.data import TensorDataset
from torch.utils.data import DataLoader
X_train = torch.from_numpy(X_train).float()
X_test = torch.from_numpy(X_test).float()
y_train = torch.from_numpy(y_train).float()
y_test = torch.from_numpy(y_test).float()
train = TensorDataset(X_train, y_train)
train_loader = DataLoader(train, batch_size=10, shuffle=True)
import torch

class MLPC(torch.nn.Module):
    def __init__(self, n_input, n_hidden1, n_output):
        super(MLPC, self).__init__()
        self.l1 = torch.nn.Linear(n_input, n_hidden1)
        self.l2 = torch.nn.Linear(n_hidden1, n_output)

    def forward(self, x):
        h1 = self.l1(x)
        h2 = torch.sigmoid(h1)
        h3 = self.l2(h2)
        h4 = torch.sigmoid(h3)
        return h4

    def score(self, x, y, threshold=0.5):
        accum = 0
        for y_pred, y1 in zip(self.forward(x), y):
            if y1 == 1:
                if y_pred >= threshold:
                    accum += 1
            else:
                if y_pred < threshold:
                    accum += 1
        return accum / len(y)
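As an aside, the score method above loops over samples in Python; a vectorized equivalent (my own sketch, not part of the original code) would be:

def score_vectorized(model, x, y, threshold=0.5):
    # Same accuracy as MLPC.score, computed without the Python loop.
    with torch.no_grad():
        y_pred = (model(x).flatten() >= threshold).float()
    return (y_pred == y.flatten()).float().mean().item()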
# Objective function
from torch.autograd import Variable

def objective(trial):
    n_h1 = trial.suggest_int('n_hidden1', 1, 100)
    lr = trial.suggest_loguniform('lr', 0.001, 0.1)
    model = MLPC(len(train[0][0]), n_h1, 1)
    criterion = torch.nn.MSELoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    # loss_history = []
    n_epoch = 2000
    for epoch in range(n_epoch):
        total_loss = 0
        for x, y in train_loader:
            x = Variable(x)
            y = Variable(y)
            optimizer.zero_grad()
            y_pred = model(x)
            loss = criterion(y_pred, y)
            loss.backward()
            optimizer.step()
            total_loss += loss.item()
        # loss_history.append(total_loss)
        # if (epoch + 1) % (n_epoch / 10) == 0:
        #     print(epoch + 1, total_loss)
    score_train_history.append(model.score(X_train, y_train))
    score_test_history.append(model.score(X_test, y_test))
    return total_loss  # with model.score(X_test, y_test) instead, learning didn't seem to progress
n_trials=100
score_train_history = []
score_test_history = []
study = optuna.create_study(direction='minimize')
study.optimize(objective, n_trials=n_trials)
/usr/local/lib/python3.6/dist-packages/torch/nn/modules/loss.py:431: UserWarning:
Using a target size (torch.Size([10])) that is different to the input size (torch.Size([10, 1])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.
/usr/local/lib/python3.6/dist-packages/torch/nn/modules/loss.py:431: UserWarning:
Using a target size (torch.Size([1])) that is different to the input size (torch.Size([1, 1])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.
[I 2019-12-13 00:36:09,957] Finished trial#0 resulted in value: 8.354197099804878. Current best value is 8.354197099804878 with parameters: {'n_hidden1': 50, 'lr': 0.008643921209550006}.
[I 2019-12-13 00:37:02,256] Finished trial#1 resulted in value: 8.542565807700157. Current best value is 8.354197099804878 with parameters: {'n_hidden1': 50, 'lr': 0.008643921209550006}.
[I 2019-12-13 00:37:54,087] Finished trial#2 resulted in value: 8.721126735210419. Current best value is 8.354197099804878 with parameters: {'n_hidden1': 50, 'lr': 0.008643921209550006}.
... (Omitted) ...
[I 2019-12-13 01:59:43,405] Finished trial#97 resulted in value: 8.414046227931976. Current best value is 8.206612035632133 with parameters: {'n_hidden1': 82, 'lr': 0.0010109929013465883}.
[I 2019-12-13 02:00:36,203] Finished trial#98 resulted in value: 8.469094559550285. Current best value is 8.206612035632133 with parameters: {'n_hidden1': 82, 'lr': 0.0010109929013465883}.
[I 2019-12-13 02:01:28,698] Finished trial#99 resulted in value: 8.296677514910698. Current best value is 8.206612035632133 with parameters: {'n_hidden1': 82, 'lr': 0.0010109929013465883}.
This first run is the failure case. A warning appeared, and I pressed on without worrying about it, but exactly as the warning said, the accuracy never improved.
study.best_params
{'lr': 0.0010109929013465883, 'n_hidden1': 82}
study.best_value
8.206612035632133
plt.plot([trial.value for trial in study.trials], label='loss')
plt.grid()
plt.legend()
plt.show()
plt.plot(score_train_history, label='score (train)')
plt.plot(score_test_history, label='score (test)')
plt.grid()
plt.legend()
plt.show()
for key in study.trials[0].params.keys():
    plt.plot([trial.params[key] for trial in study.trials], label=key)
plt.grid()
plt.legend()
plt.show()
So what went wrong in the failed run above? The warning message
/usr/local/lib/python3.6/dist-packages/torch/nn/modules/loss.py:431: UserWarning:
Using a target size (torch.Size([10])) that is different to the input size (torch.Size([10, 1])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.
/usr/local/lib/python3.6/dist-packages/torch/nn/modules/loss.py:431: UserWarning:
Using a target size (torch.Size([1])) that is different to the input size (torch.Size([1, 1])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.
means that the tensor of target values does not have the shape the model output has. This time, the explanatory and target variables had been created like this:
# https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_breast_cancer.html
from sklearn.datasets import load_breast_cancer
breast_cancer = load_breast_cancer()
X = breast_cancer.data
y = breast_cancer.target
The target y created this way is one-dimensional, so it needs to be reshaped into a column vector:
y = y.reshape((len(y), 1))
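Alternatively (a sketch), y could stay one-dimensional and the shapes could be aligned on the PyTorch side at the loss call:

import torch
criterion = torch.nn.MSELoss()
y_pred = torch.zeros(10, 1)              # model output: shape (batch, 1)
y = torch.ones(10)                       # 1-D target: this pairing triggers the warning
loss = criterion(y_pred, y.view(-1, 1))  # view(-1, 1) aligns the shapes and silences it
print(loss.item())                       # 1.0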
This was the only cause of the failure. With that single change, and otherwise exactly the same code as before, the accuracy improves. Let's compare with the failed run.
# Import the method to split into training and test data
from sklearn.model_selection import train_test_split
# Randomly split into training and test data at a ratio of 6:4
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.4)
import torch
from torch.utils.data import TensorDataset
from torch.utils.data import DataLoader
X_train = torch.from_numpy(X_train).float()
X_test = torch.from_numpy(X_test).float()
y_train = torch.from_numpy(y_train).float()
y_test = torch.from_numpy(y_test).float()
train = TensorDataset(X_train, y_train)
train_loader = DataLoader(train, batch_size=10, shuffle=True)
import torch

class MLPC(torch.nn.Module):
    def __init__(self, n_input, n_hidden1, n_output):
        super(MLPC, self).__init__()
        self.l1 = torch.nn.Linear(n_input, n_hidden1)
        self.l2 = torch.nn.Linear(n_hidden1, n_output)

    def forward(self, x):
        h1 = self.l1(x)
        h2 = torch.sigmoid(h1)
        h3 = self.l2(h2)
        h4 = torch.sigmoid(h3)
        return h4

    def score(self, x, y, threshold=0.5):
        accum = 0
        for y_pred, y1 in zip(self.forward(x), y):
            if y1 == 1:
                if y_pred >= threshold:
                    accum += 1
            else:
                if y_pred < threshold:
                    accum += 1
        return accum / len(y)
# Objective function
from torch.autograd import Variable

def objective(trial):
    n_h1 = trial.suggest_int('n_hidden1', 1, 100)
    lr = trial.suggest_loguniform('lr', 0.001, 0.1)
    model = MLPC(len(train[0][0]), n_h1, 1)
    criterion = torch.nn.MSELoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    # loss_history = []
    n_epoch = 2000
    for epoch in range(n_epoch):
        total_loss = 0
        for x, y in train_loader:
            # if x.shape[0] == 1:
            #     continue
            # print(x.shape, y.shape)
            x = Variable(x)
            y = Variable(y)
            optimizer.zero_grad()
            y_pred = model(x)
            loss = criterion(y_pred, y)
            loss.backward()
            optimizer.step()
            total_loss += loss.item()
        # loss_history.append(total_loss)
        # if (epoch + 1) % (n_epoch / 10) == 0:
        #     print(epoch + 1, total_loss)
    score_train_history.append(model.score(X_train, y_train))
    score_test_history.append(model.score(X_test, y_test))
    return total_loss  # with model.score(X_test, y_test) instead, learning didn't seem to progress
n_trials=100
score_train_history = []
score_test_history = []
study = optuna.create_study(direction='minimize')
study.optimize(objective, n_trials=n_trials)
[I 2019-12-13 07:58:42,273] Finished trial#0 resulted in value: 7.991558387875557. Current best value is 7.991558387875557 with parameters: {'n_hidden1': 100, 'lr': 0.001719688534454947}.
[I 2019-12-13 07:59:29,221] Finished trial#1 resulted in value: 8.133784644305706. Current best value is 7.991558387875557 with parameters: {'n_hidden1': 100, 'lr': 0.001719688534454947}.
[I 2019-12-13 08:00:16,849] Finished trial#2 resulted in value: 8.075047567486763. Current best value is 7.991558387875557 with parameters: {'n_hidden1': 100, 'lr': 0.001719688534454947}.
... (Omitted) ...
[I 2019-12-13 09:14:47,236] Finished trial#97 resulted in value: 8.02999284863472. Current best value is 2.8610200360417366 with parameters: {'n_hidden1': 38, 'lr': 0.0010151912634053866}.
[I 2019-12-13 09:15:34,106] Finished trial#98 resulted in value: 5.849344417452812. Current best value is 2.8610200360417366 with parameters: {'n_hidden1': 38, 'lr': 0.0010151912634053866}.
[I 2019-12-13 09:16:20,332] Finished trial#99 resulted in value: 8.052950218319893. Current best value is 2.8610200360417366 with parameters: {'n_hidden1': 38, 'lr': 0.0010151912634053866}.
study.best_params
{'lr': 0.0010151912634053866, 'n_hidden1': 38}
study.best_value
2.8610200360417366
%matplotlib inline
import matplotlib.pyplot as plt
plt.plot([trial.value for trial in study.trials], label='loss')
plt.grid()
plt.legend()
plt.show()
plt.plot(score_train_history, label='score (train)')
plt.plot(score_test_history, label='score (test)')
plt.grid()
plt.legend()
plt.show()
for key in study.trials[0].params.keys():
    plt.plot([trial.params[key] for trial in study.trials], label=key)
plt.grid()
plt.legend()
plt.show()