[Translation] scikit-learn 0.18 User Guide 3.2. Tuning the hyper-parameters of an estimator

Google translation of http://scikit-learn.org/0.18/modules/grid_search.html. From [scikit-learn 0.18 User Guide 3. Model Selection and Evaluation](http://qiita.com/nazoking@github/items/267f2371757516f8c168#3-%E3%83%A2%E3%83%87%E3%83%AB%E3%81%AE%E9%81%B8%E6%8A%9E%E3%81%A8%E8%A9%95%E4%BE%A1).


3.2. Tuning the hyper-parameters of an estimator

Hyper-parameters are parameters that are not directly learnt within estimators. In scikit-learn they are passed as arguments to the constructor of the estimator classes. Typical examples include `C`, `kernel` and `gamma` for a support vector classifier, and `alpha` for Lasso. It is possible and recommended to search the hyper-parameter space for the best [cross-validation](http://qiita.com/nazoking@github/items/13b167283590f512d99a) score. Any parameter provided when constructing an estimator may be optimized in this manner. Specifically, to find the names and current values of all parameters for a given estimator, use:

estimator.get_params()
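For instance, a minimal sketch on a support vector classifier (the exact keys returned depend on the estimator's constructor):

```python
from sklearn.svm import SVC

estimator = SVC()
# get_params() returns a dict mapping each constructor argument to its
# current value, e.g. {'C': 1.0, 'kernel': 'rbf', 'gamma': 'auto', ...}
print(estimator.get_params())
```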

A search consists of:

- an estimator (regressor or classifier, such as `sklearn.svm.SVC()`);
- a parameter space;
- a method for searching or sampling candidates;
- a cross-validation scheme; and
- a score function.

Some models allow for specialized, efficient parameter search strategies, outlined below. Two generic approaches to sampling search candidates are provided in scikit-learn: for given values, GridSearchCV exhaustively considers all parameter combinations, while RandomizedSearchCV can sample a given number of candidates from a parameter space with a specified distribution. After describing these tools, we detail best practices applicable to both approaches. Note that it is common that a small subset of those parameters can have a large impact on the predictive or computational performance of the model, while others can be left to their default values. It is recommended to read the docstring of the estimator class to get a finer understanding of the expected behavior, possibly by consulting the enclosed references to the literature.

3.2.1. Exhaustive grid search

The grid search provided by GridSearchCV exhaustively generates candidates from a grid of parameter values specified with the `param_grid` parameter. For instance, the following `param_grid`:

param_grid = [
  {'C': [1, 10, 100, 1000], 'kernel': ['linear']},
  {'C': [1, 10, 100, 1000], 'gamma': [0.001, 0.0001], 'kernel': ['rbf']},
 ]

specifies that two grids should be explored: one with a linear kernel and `C` values in [1, 10, 100, 1000], and the second one with an RBF kernel, and the cross-product of `C` values ranging in [1, 10, 100, 1000] and `gamma` values in [0.001, 0.0001]. The GridSearchCV instance implements the usual estimator API: when "fitting" it on a dataset, all the possible combinations of parameter values are evaluated and the best combination is retained.
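As a minimal sketch of this workflow (the digits dataset is used here purely as a stand-in for any classification data):

```python
from sklearn import datasets, svm
from sklearn.model_selection import GridSearchCV

digits = datasets.load_digits()

param_grid = [
    {'C': [1, 10, 100, 1000], 'kernel': ['linear']},
    {'C': [1, 10, 100, 1000], 'gamma': [0.001, 0.0001], 'kernel': ['rbf']},
]

# Exhaustively evaluates every combination in both grids with
# cross-validation and keeps the best one.
search = GridSearchCV(svm.SVC(), param_grid, cv=5)
search.fit(digits.data, digits.target)
print(search.best_params_, search.best_score_)
```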

- Examples:
  - See [Parameter estimation using grid search with cross-validation](http://scikit-learn.org/0.18/auto_examples/model_selection/grid_search_digits.html#sphx-glr-auto-examples-model-selection-grid-search-digits-py) for an example of grid search computation on the digits dataset.
  - See [Sample pipeline for text feature extraction and evaluation](http://scikit-learn.org/0.18/auto_examples/model_selection/grid_search_text_feature_extraction.html#sphx-glr-auto-examples-model-selection-grid-search-text-feature-extraction-py) for an example of grid search coupling parameters from a text documents feature extractor (n-gram count vectorizer and TF-IDF transformer) with a classifier (here a linear SVM trained with SGD with either elastic net or L2 penalty) using a pipeline.Pipeline instance.
  - See [Nested versus non-nested cross-validation](http://scikit-learn.org/0.18/auto_examples/model_selection/plot_nested_cross_validation_iris.html#sphx-glr-auto-examples-model-selection-plot-nested-cross-validation-iris-py) for an example of grid search within a cross-validation loop on the iris dataset. This is the best practice for evaluating the performance of a model with grid search.

3.2.2. Randomized parameter optimization

While using a grid of parameter settings is currently the most widely used method for parameter optimization, other search methods have more favorable properties. RandomizedSearchCV implements a randomized search over parameters, where each setting is sampled from a distribution over possible parameter values. This has two main benefits over an exhaustive search:

- A budget can be chosen independent of the number of parameters and possible values.
- Adding parameters that do not influence the performance does not decrease efficiency.

Specifying how parameters should be sampled is done using a dictionary, very similar to specifying parameters for GridSearchCV. Additionally, a computation budget, being the number of sampled candidates or sampling iterations, is specified using the `n_iter` parameter. For each parameter, either a distribution over possible values or a list of discrete choices (which will be sampled uniformly) can be specified:

{'C': scipy.stats.expon(scale=100), 'gamma': scipy.stats.expon(scale=.1),
  'kernel': ['rbf'], 'class_weight':['balanced', None]}

This example uses the `scipy.stats` module, which contains many useful distributions for sampling parameters, such as `expon`, `gamma`, `uniform` or `randint`. In principle, any function can be passed that provides a `rvs` (random variate sample) method to sample a value. A call to the `rvs` function should provide independent random samples from possible parameter values on consecutive calls.

**Warning**: The distributions in `scipy.stats` prior to scipy 0.16 do not allow specifying a random state. Instead, they use the global numpy random state, which can be seeded via `np.random.seed` or set using `np.random.set_state`. However, beginning with scikit-learn 0.18, the `sklearn.model_selection` module sets the random state provided by the user if scipy >= 0.16 is also available.

For continuous parameters, such as `C` above, it is important to specify a continuous distribution to take full advantage of the randomization. This way, increasing `n_iter` will always lead to a finer search.
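Putting this together, a minimal sketch (again using the digits data only as a placeholder):

```python
import scipy.stats
from sklearn import datasets, svm
from sklearn.model_selection import RandomizedSearchCV

digits = datasets.load_digits()

param_distributions = {
    'C': scipy.stats.expon(scale=100),     # continuous distribution
    'gamma': scipy.stats.expon(scale=.1),  # continuous distribution
    'kernel': ['rbf'],                     # lists are sampled uniformly
    'class_weight': ['balanced', None],
}

# n_iter is the computation budget: 20 candidate settings are sampled.
search = RandomizedSearchCV(svm.SVC(), param_distributions, n_iter=20,
                            random_state=0)
search.fit(digits.data, digits.target)
print(search.best_params_)
```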

- Examples:
  - [Comparing randomized search and grid search for hyperparameter estimation](http://scikit-learn.org/0.18/auto_examples/model_selection/randomized_search.html#sphx-glr-auto-examples-model-selection-randomized-search-py) compares the usage and efficiency of randomized search and grid search.
- References:
  - Bergstra, J. and Bengio, Y., Random search for hyper-parameter optimization, The Journal of Machine Learning Research (2012)

3.2.3. Parameter search tips

3.2.3.1. Specifying objective metrics

By default, parameter search uses the `score` function of the estimator to evaluate a parameter setting. These are [sklearn.metrics.accuracy_score](http://scikit-learn.org/0.18/modules/generated/sklearn.metrics.accuracy_score.html#sklearn.metrics.accuracy_score) for classification and [sklearn.metrics.r2_score](http://scikit-learn.org/0.18/modules/generated/sklearn.metrics.r2_score.html#sklearn.metrics.r2_score) for regression. For some applications, other scoring functions are better suited (for example, in unbalanced classification, the accuracy score is often uninformative). An alternative scoring function can be specified via the `scoring` parameter of GridSearchCV, RandomizedSearchCV, and the specialized cross-validation tools described below. For more information, see [The scoring parameter: defining model evaluation rules](http://qiita.com/nazoking@github/items/958426da6448d74279c7#331-%E5%BE%97%E7%82%B9%E3%83%91%E3%83%A9%E3%83%A1%E3%83%BC%E3%82%BF%E3%83%A2%E3%83%87%E3%83%AB%E8%A9%95%E4%BE%A1%E3%83%AB%E3%83%BC%E3%83%AB%E3%81%AE%E5%AE%9A%E7%BE%A9).
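For example, a sketch of ranking parameter settings by macro-averaged F1 instead of accuracy (`'f1_macro'` is one of the predefined scorer names):

```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# The scoring parameter replaces the estimator's default score function.
search = GridSearchCV(SVC(), {'C': [1, 10, 100]}, scoring='f1_macro')
```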

3.2.3.2. Composite estimators and parameter spaces

[Pipeline: chaining estimators](http://qiita.com/nazoking@github/items/fdfd207b3127d6d026e0#411-%E3%83%91%E3%82%A4%E3%83%97%E3%83%A9%E3%82%A4%E3%83%B3%E9%80%A3%E9%8E%96%E6%8E%A8%E5%AE%9A) describes how to build composite estimators whose parameter space can be searched with these tools.
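A minimal sketch of the convention: parameters of the steps of a Pipeline are addressed as `<step name>__<parameter name>`:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

pipe = Pipeline([('tfidf', TfidfVectorizer()), ('clf', LinearSVC())])

# '<step name>__<parameter name>' reaches into the individual steps.
param_grid = {'tfidf__ngram_range': [(1, 1), (1, 2)],
              'clf__C': [0.1, 1, 10]}
search = GridSearchCV(pipe, param_grid)
```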

3.2.3.3. Model selection: development and evaluation

Model selection by evaluating various parameter settings can be seen as a way of using the labeled data to "train" the parameters of the grid. When evaluating the resulting model, it is important to do so on held-out samples that were not seen during the grid search process. It is recommended to split the data into a **development set** (to be fed to the GridSearchCV instance) and an **evaluation set** to compute performance metrics. This can be done with the `train_test_split` utility function.
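A minimal sketch of that split (the 25% evaluation fraction is an arbitrary choice):

```python
from sklearn import datasets
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

digits = datasets.load_digits()

# Hold out an evaluation set that the grid search never sees.
X_dev, X_eval, y_dev, y_eval = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0)

search = GridSearchCV(SVC(), {'C': [1, 10, 100]})
search.fit(X_dev, y_dev)             # search only on the development set
print(search.score(X_eval, y_eval))  # estimate on the held-out evaluation set
```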

3.2.3.4. Parallelism

GridSearchCV and RandomizedSearchCV evaluate each parameter setting independently. If your OS supports it, computations can be run in parallel by using the keyword `n_jobs=-1`. See the function signatures for more details.
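For instance:

```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# n_jobs=-1 uses all available cores for the parallel evaluations.
search = GridSearchCV(SVC(), {'C': [1, 10, 100]}, n_jobs=-1)
```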

3.2.3.5. Robustness to failure

Some parameter settings may cause `fit` to fail. By default, this causes the entire search to fail, even if all other parameter settings could be fully evaluated. Setting `error_score=0` (or `=np.NaN`) makes the procedure robust to such failures: a warning is issued and the score for the failing setting is set to 0 (or NaN), but the search completes.
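A sketch of this (here `C=-1` is deliberately invalid, purely to illustrate a failing `fit`):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# fit() raises for C=-1; with error_score set, that setting scores NaN
# (with a warning) and the remaining settings are still evaluated.
search = GridSearchCV(SVC(), {'C': [-1, 1, 10]}, error_score=np.NaN)
```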

3.2.4. Alternatives to brute force parameter search

3.2.4.1. Model specific cross-validation

Some models can fit data for a range of values of some parameter almost as efficiently as fitting the estimator for a single value of that parameter. This feature can be leveraged to perform a more efficient cross-validation used for model selection of this parameter. The most common parameter amenable to this strategy is the parameter encoding the strength of the regularizer. In this case, we say that we compute the **regularization path** of the estimator. Such models include, for example, the cross-validated linear estimators `linear_model.LassoCV`, `linear_model.ElasticNetCV` and `linear_model.RidgeCV`.
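A minimal sketch with `linear_model.LassoCV` (the toy data here is purely illustrative):

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.RandomState(0)
X = rng.randn(50, 10)
y = X[:, 0] + 0.1 * rng.randn(50)

# LassoCV fits a whole path of alpha values per fold, far cheaper than
# one full fit per candidate alpha in a generic grid search.
model = LassoCV(cv=5).fit(X, y)
print(model.alpha_)  # the regularization strength selected by CV
```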

3.2.4.2. Information criteria

Some models can offer an information-theoretic closed-form formula for the optimal estimate of the regularization parameter by computing a single regularization path (instead of several when using cross-validation). Models benefiting from the Akaike Information Criterion (AIC) or the Bayesian Information Criterion (BIC) for automated model selection include, for example, `linear_model.LassoLarsIC`.
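A minimal sketch with `linear_model.LassoLarsIC` (toy data again, for illustration only):

```python
import numpy as np
from sklearn.linear_model import LassoLarsIC

rng = np.random.RandomState(0)
X = rng.randn(50, 10)
y = X[:, 0] + 0.1 * rng.randn(50)

# A single path fit; the criterion ('aic' or 'bic') selects alpha
# directly, with no cross-validation loop.
model = LassoLarsIC(criterion='bic').fit(X, y)
print(model.alpha_)
```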

3.2.4.3. Out of bag estimates

When using ensemble methods based on bagging, i.e. generating new training sets using sampling with replacement, part of the training set remains unused. For each classifier in the ensemble, a different part of the training set is left out. This left-out portion can be used to estimate the generalization error without having to rely on a separate validation set. This estimate comes "for free", as no additional data is needed, and can be used for model selection. This is currently implemented in bagging-based ensemble classes such as `ensemble.RandomForestClassifier` and `ensemble.RandomForestRegressor`.
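A minimal sketch with `ensemble.RandomForestClassifier`, where `oob_score=True` requests the out-of-bag estimate (the toy data is illustrative only):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.RandomState(0)
X = rng.randn(200, 4)
y = (X[:, 0] > 0).astype(int)

# Each tree sees a bootstrap sample; the rows it never saw provide a
# "free" generalization estimate, exposed as oob_score_ after fitting.
forest = RandomForestClassifier(n_estimators=50, oob_score=True,
                                random_state=0).fit(X, y)
print(forest.oob_score_)
```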


From [scikit-learn 0.18 User Guide 3. Model Selection and Evaluation](http://qiita.com/nazoking@github/items/267f2371757516f8c168#3-%E3%83%A2%E3%83%87%E3%83%AB%E3%81%AE%E9%81%B8%E6%8A%9E%E3%81%A8%E8%A9%95%E4%BE%A1)

© 2010 - 2016, scikit-learn developers (BSD License).
