[PYTHON] [Translation] scikit-learn 0.18 User Guide 3.4. Model persistence

This is a Google translation of http://scikit-learn.org/0.18/modules/model_persistence.html, from [scikit-learn 0.18 User Guide 3. Model Selection and Evaluation](http://qiita.com/nazoking@github/items/267f2371757516f8c168#3-%E3%83%A2%E3%83%87%E3%83%AB%E3%81%AE%E9%81%B8%E6%8A%9E%E3%81%A8%E8%A9%95%E4%BE%A1).


3.4. Model persistence

After training a scikit-learn model, it is desirable to have a way to persist the model for future use without having to retrain it. The following section gives an example of how to persist a model with pickle. We also review some security and maintainability issues that arise when working with pickle serialization.

3.4.1. Persistence example

It is possible to save a scikit-learn model using Python's built-in persistence module, pickle:

>>> from sklearn import svm
>>> from sklearn import datasets
>>> clf = svm.SVC()
>>> iris = datasets.load_iris()
>>> X, y = iris.data, iris.target
>>> clf.fit(X, y)  
SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape=None, degree=3, gamma='auto', kernel='rbf',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False)

>>> import pickle
>>> s = pickle.dumps(clf)
>>> clf2 = pickle.loads(s)
>>> clf2.predict(X[0:1])
array([0])
>>> y[0]
0
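The round trip above stays in memory. As a minimal sketch, assuming the model should instead be written to a file with the standard `pickle.dump` / `pickle.load` pair, the snippet below saves the classifier, reloads it, and checks that both objects give the same predictions. The file name `svc.pkl` and the temporary directory are arbitrary choices for illustration.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn import datasets, svm

# Train the same kind of classifier as in the example above.
X, y = datasets.load_iris(return_X_y=True)
clf = svm.SVC().fit(X, y)

# Write the fitted model to a file (the path is an arbitrary choice).
path = os.path.join(tempfile.mkdtemp(), "svc.pkl")
with open(path, "wb") as f:
    pickle.dump(clf, f, protocol=pickle.HIGHEST_PROTOCOL)

# Read it back, possibly in another process running the same library versions.
with open(path, "rb") as f:
    restored = pickle.load(f)

# Sanity check: the restored model should predict exactly like the original.
same_predictions = np.array_equal(clf.predict(X), restored.predict(X))
print(same_predictions)
```

Comparing predictions right after saving is a cheap way to catch a broken file before it is needed later.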

For scikit-learn specifically, it can be preferable to use joblib's replacement for pickle (joblib.dump and joblib.load). It is more efficient for objects that hold large numpy arrays internally, which is often the case for fitted scikit-learn estimators. However, it has no dumps equivalent: it can only pickle to disk, not to a string:

>>> from sklearn.externals import joblib
>>> joblib.dump(clf, 'filename.pkl') 

You can later load the pickled model (perhaps in another Python process):

>>> clf = joblib.load('filename.pkl') 

**Note:** joblib.dump and joblib.load also accept file-like objects instead of filenames. For more information on data persistence with Joblib, see https://pythonhosted.org/joblib/persistence.html.
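The following is a minimal sketch of passing file-like objects to joblib, not a definitive recipe. It assumes the standalone `joblib` package is installed (the 0.18 text imports it from `sklearn.externals` instead), and it uses a plain dictionary as a stand-in for a fitted estimator so that the snippet has no other dependencies.

```python
import io
import os
import tempfile

import joblib  # standalone package; scikit-learn 0.18 bundles it as sklearn.externals.joblib

# Stand-in for a fitted estimator: any picklable object works here.
model = {"coef": [0.1, 0.2, 0.3], "intercept": 1.5}

# 1. Dump to and load from an open binary file object instead of a filename.
path = os.path.join(tempfile.mkdtemp(), "model.pkl")
with open(path, "wb") as f:
    joblib.dump(model, f)
with open(path, "rb") as f:
    restored_from_file = joblib.load(f)

# 2. An in-memory buffer is also a file-like object.
buffer = io.BytesIO()
joblib.dump(model, buffer)
buffer.seek(0)  # rewind before reading the bytes back
restored_from_buffer = joblib.load(buffer)

print(restored_from_file == model, restored_from_buffer == model)
```

The in-memory buffer is one way to obtain the serialized bytes without writing a file, for example before sending them over a network.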

3.4.2. Security and maintainability limits

pickle (and joblib by extension) has some maintainability and security issues. Because of this:

- Never unpickle untrusted data, as it can execute malicious code when loaded.
- Models saved using one version of scikit-learn may load in another version, but this is entirely unsupported and not recommended. Keep in mind as well that operations performed on such data can give different and unexpected results.
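To illustrate why untrusted pickles are dangerous, here is a small and deliberately harmless sketch. The `Payload` class is hypothetical and exists only for this demonstration. Its `__reduce__` method tells pickle which callable to invoke at load time, so the party that produced the bytes decides what code runs on the machine that loads them.

```python
import pickle


class Payload:
    """Hypothetical attacker-controlled class, for illustration only."""

    def __reduce__(self):
        # Instead of object state, pickle records "call eval('6 * 7') on load".
        # A real attack would name something destructive here, not arithmetic.
        return (eval, ("6 * 7",))


untrusted_bytes = pickle.dumps(Payload())

# Loading does not rebuild a Payload: it runs the recorded callable.
result = pickle.loads(untrusted_bytes)
print(result)  # 42
```

The loader never needed the `Payload` class itself, which is why inspecting your own code base does not protect against a malicious file.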

To rebuild a similar model in a future version of scikit-learn, you will need to save additional metadata along with the pickled model.

- A reference to an immutable snapshot of the training data
- The Python source code used to generate the model
- The versions of scikit-learn and its dependencies
- The cross-validation score obtained on the training data

This makes it possible to check that the cross-validation score is in the same range as before. If you want to know more about these issues or explore other possible serialization methods, see the talk by Alex Gaynor on this topic.
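One way to record this metadata is a small JSON-serializable dictionary stored next to the pickled model. The sketch below uses only the standard library. The helper names (`package_version`, `build_model_metadata`, `scores_in_same_range`), the field names, and the 0.02 tolerance are all assumptions made for illustration, not part of scikit-learn.

```python
import hashlib
import json
import platform
from importlib import metadata  # Python 3.8+


def package_version(name):
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def build_model_metadata(training_bytes, source_file, cv_scores,
                         packages=("scikit-learn", "numpy", "scipy")):
    """Collect the four metadata items into a JSON-serializable dict."""
    return {
        # A hash pins down the exact training snapshot that was used.
        "training_data_sha256": hashlib.sha256(training_bytes).hexdigest(),
        "source_file": source_file,
        "python_version": platform.python_version(),
        "package_versions": {name: package_version(name) for name in packages},
        "cv_scores": list(cv_scores),
    }


def scores_in_same_range(old_scores, new_scores, tolerance=0.02):
    """Compare mean cross-validation scores; the tolerance is an arbitrary choice."""
    old_mean = sum(old_scores) / len(old_scores)
    new_mean = sum(new_scores) / len(new_scores)
    return abs(old_mean - new_mean) <= tolerance


meta = build_model_metadata(b"5.1,3.5,1.4,0.2,0\n", "train_model.py", [0.96, 0.98])
print(json.dumps(meta, indent=2))
```

After retraining with a newer scikit-learn, the stored `cv_scores` can be compared against the new ones with `scores_in_same_range` before the rebuilt model replaces the old one.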



© 2010-2016, scikit-learn developers (BSD license).
