[PYTHON] Rastersuche von Hyperparametern mit Scikit-learn

Ich werde vergessen, wie man es benutzt, also mach dir eine Notiz. Eine leicht modifizierte Version des Beispiels des Scikit-learn-Dokuments und seiner Ausführungsergebnisse.

Quellcode:

`grid_search.py`


# -*- coding: utf-8 -*-

from sklearn import datasets
from sklearn.cross_validation import train_test_split
from sklearn.grid_search import GridSearchCV
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.svm import SVC

##Daten lesen
digits = datasets.load_digits()
X = digits.data
y = digits.target

##Unterteilt in Trainingsdaten und Testdaten.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0)

##Parameter einstellen
tuned_parameters = [{'kernel': ['rbf'], 'gamma': [1e-3, 1e-4],
                     'C': [1, 10, 100, 1000]},
                    {'kernel': ['linear'], 'C': [1, 10, 100, 1000]}]

scores = ['accuracy', 'precision', 'recall']

for score in scores:
    print '\n' + '='*50
    print score
    print '='*50

    clf = GridSearchCV(SVC(C=1), tuned_parameters, cv=5, scoring=score, n_jobs=-1)
    clf.fit(X_train, y_train)

    print "\n+Beste Parameter:\n"
    print clf.best_estimator_

    print"\n+Durchschnittliche Punktzahl beim Lebenslauf mit Trainingsdaten:\n"
    for params, mean_score, all_scores in clf.grid_scores_:
        print "{:.3f} (+/- {:.3f}) for {}".format(mean_score, all_scores.std() / 2, params)

    print "\n+Identifikationsergebnis in Testdaten:\n"
    y_true, y_pred = y_test, clf.predict(X_test)
    print classification_report(y_true, y_pred)

Ergebnis:

==================================================
accuracy
==================================================

+Beste Parameter:

SVC(C=10, cache_size=200, class_weight=None, coef0=0.0, degree=3, gamma=0.001,
  kernel=rbf, max_iter=-1, probability=False, random_state=None,
  shrinking=True, tol=0.001, verbose=False)

+Durchschnittliche Punktzahl beim Lebenslauf mit Trainingsdaten:

0.982 (+/- 0.002) for {'kernel': 'rbf', 'C': 1, 'gamma': 0.001}
0.954 (+/- 0.006) for {'kernel': 'rbf', 'C': 1, 'gamma': 0.0001}
0.986 (+/- 0.002) for {'kernel': 'rbf', 'C': 10, 'gamma': 0.001}
0.981 (+/- 0.003) for {'kernel': 'rbf', 'C': 10, 'gamma': 0.0001}
0.986 (+/- 0.002) for {'kernel': 'rbf', 'C': 100, 'gamma': 0.001}
0.983 (+/- 0.005) for {'kernel': 'rbf', 'C': 100, 'gamma': 0.0001}
0.986 (+/- 0.002) for {'kernel': 'rbf', 'C': 1000, 'gamma': 0.001}
0.983 (+/- 0.005) for {'kernel': 'rbf', 'C': 1000, 'gamma': 0.0001}
0.971 (+/- 0.006) for {'kernel': 'linear', 'C': 1}
0.971 (+/- 0.006) for {'kernel': 'linear', 'C': 10}
0.971 (+/- 0.006) for {'kernel': 'linear', 'C': 100}
0.971 (+/- 0.006) for {'kernel': 'linear', 'C': 1000}

+Identifikationsergebnis in Testdaten:

             precision    recall  f1-score   support

          0       1.00      1.00      1.00        89
          1       0.97      1.00      0.98        90
          2       0.99      0.98      0.98        92
          3       1.00      0.99      0.99        93
          4       1.00      1.00      1.00        76
          5       0.99      0.98      0.99       108
          6       0.99      1.00      0.99        89
          7       0.99      1.00      0.99        78
          8       1.00      0.98      0.99        92
          9       0.99      0.99      0.99        92

avg / total       0.99      0.99      0.99       899


==================================================
precision
==================================================

+Beste Parameter:

SVC(C=10, cache_size=200, class_weight=None, coef0=0.0, degree=3, gamma=0.001,
  kernel=rbf, max_iter=-1, probability=False, random_state=None,
  shrinking=True, tol=0.001, verbose=False)

+Durchschnittliche Punktzahl beim Lebenslauf mit Trainingsdaten:

0.983 (+/- 0.002) for {'kernel': 'rbf', 'C': 1, 'gamma': 0.001}
0.959 (+/- 0.006) for {'kernel': 'rbf', 'C': 1, 'gamma': 0.0001}
0.986 (+/- 0.002) for {'kernel': 'rbf', 'C': 10, 'gamma': 0.001}
0.982 (+/- 0.003) for {'kernel': 'rbf', 'C': 10, 'gamma': 0.0001}
0.986 (+/- 0.002) for {'kernel': 'rbf', 'C': 100, 'gamma': 0.001}
0.985 (+/- 0.004) for {'kernel': 'rbf', 'C': 100, 'gamma': 0.0001}
0.986 (+/- 0.002) for {'kernel': 'rbf', 'C': 1000, 'gamma': 0.001}
0.985 (+/- 0.004) for {'kernel': 'rbf', 'C': 1000, 'gamma': 0.0001}
0.973 (+/- 0.005) for {'kernel': 'linear', 'C': 1}
0.973 (+/- 0.005) for {'kernel': 'linear', 'C': 10}
0.973 (+/- 0.005) for {'kernel': 'linear', 'C': 100}
0.973 (+/- 0.005) for {'kernel': 'linear', 'C': 1000}

+Identifikationsergebnis in Testdaten:

             precision    recall  f1-score   support

          0       1.00      1.00      1.00        89
          1       0.97      1.00      0.98        90
          2       0.99      0.98      0.98        92
          3       1.00      0.99      0.99        93
          4       1.00      1.00      1.00        76
          5       0.99      0.98      0.99       108
          6       0.99      1.00      0.99        89
          7       0.99      1.00      0.99        78
          8       1.00      0.98      0.99        92
          9       0.99      0.99      0.99        92

avg / total       0.99      0.99      0.99       899


==================================================
recall
==================================================

+Beste Parameter:

SVC(C=10, cache_size=200, class_weight=None, coef0=0.0, degree=3, gamma=0.001,
  kernel=rbf, max_iter=-1, probability=False, random_state=None,
  shrinking=True, tol=0.001, verbose=False)

+Durchschnittliche Punktzahl beim Lebenslauf mit Trainingsdaten:

0.982 (+/- 0.002) for {'kernel': 'rbf', 'C': 1, 'gamma': 0.001}
0.954 (+/- 0.006) for {'kernel': 'rbf', 'C': 1, 'gamma': 0.0001}
0.986 (+/- 0.002) for {'kernel': 'rbf', 'C': 10, 'gamma': 0.001}
0.981 (+/- 0.003) for {'kernel': 'rbf', 'C': 10, 'gamma': 0.0001}
0.986 (+/- 0.002) for {'kernel': 'rbf', 'C': 100, 'gamma': 0.001}
0.983 (+/- 0.005) for {'kernel': 'rbf', 'C': 100, 'gamma': 0.0001}
0.986 (+/- 0.002) for {'kernel': 'rbf', 'C': 1000, 'gamma': 0.001}
0.983 (+/- 0.005) for {'kernel': 'rbf', 'C': 1000, 'gamma': 0.0001}
0.971 (+/- 0.006) for {'kernel': 'linear', 'C': 1}
0.971 (+/- 0.006) for {'kernel': 'linear', 'C': 10}
0.971 (+/- 0.006) for {'kernel': 'linear', 'C': 100}
0.971 (+/- 0.006) for {'kernel': 'linear', 'C': 1000}

+Identifikationsergebnis in Testdaten:

             precision    recall  f1-score   support

          0       1.00      1.00      1.00        89
          1       0.97      1.00      0.98        90
          2       0.99      0.98      0.98        92
          3       1.00      0.99      0.99        93
          4       1.00      1.00      1.00        76
          5       0.99      0.98      0.99       108
          6       0.99      1.00      0.99        89
          7       0.99      1.00      0.99        78
          8       1.00      0.98      0.99        92
          9       0.99      0.99      0.99        92

avg / total       0.99      0.99      0.99       899

** Beschreibung der GridSearchCV -Parameter ** cv Anzahl der Falten

scoring Der zu optimierende Wert kann durch die Gierensuche ermittelt werden. Standardmäßig In der Klassifizierung wird "Genauigkeit", "Lernen lernen", "Genauigkeit", "Genauigkeit", " 'R2'sklearn.metrics.r2_score wird in der Regression angegeben. Darüber hinaus können beispielsweise bei der Klassifizierung „Präzision“ und „Rückruf“ angegeben werden.

Weitere Informationen hier Für Präzision, Rückruf usw. Toki no Mori Wiki

n_jobs Geben Sie hier einfach einen ganzzahligen Wert ein und dieser wird parallel berechnet. Die Anzahl der Kerne wird automatisch mit -1 eingegeben.