[PYTHON] I tried cross-validation based on the grid search results with scikit-learn

Introduction

Grid search and cross-validation are each covered in many places, but I could not find an article that shows how to run cross-validation using the result of a grid search, so I will introduce it here.

environment

Contents

This article shows how to perform cross-validation based on the results of a grid search.

Implementation

Get dataset

First, get the data for machine learning. scikit-learn ships with several datasets prepared in advance, which makes it a convenient place to start machine learning for the time being. Here we use the iris dataset. The details of the datasets are summarized in "Dataset included with scikit-learn" (pythondatascience.plavox.info), listed in the references.

data set


from sklearn import datasets

# Get the dataset
iris = datasets.load_iris()
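
Before going further, it can help to check what was actually loaded. The following is only a minimal sketch of how one might inspect the iris data; the attributes used (`data`, `target`, `target_names`, `feature_names`) are standard fields of the object returned by `load_iris()`, and the printing is just for illustration.

```python
from sklearn import datasets

iris = datasets.load_iris()

# Feature matrix: one row per flower, one column per measurement
print(iris.data.shape)

# Class labels encoded as the integers 0, 1 and 2
print(iris.target.shape)

# Human-readable names for the classes and the feature columns
print(list(iris.target_names))
print(list(iris.feature_names))
```

The iris data has 150 samples with 4 features each, split evenly across 3 classes, which is why it runs quickly even with a grid search on top.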

Grid search

Next, run the grid search: set the candidate parameters and execute it. One of the attractions of scikit-learn is how easy it makes grid search. It's cool to call parameters hyperparameters ♪

Grid search


from sklearn import model_selection, svm

# Set the candidate parameters for the grid search
parameters = {
    'C': [1, 3, 5],
    'loss': ('hinge', 'squared_hinge')
}

# Run the grid search
clf = model_selection.GridSearchCV(svm.LinearSVC(), parameters)
clf.fit(iris.data, iris.target)

# Get the grid search result (optimal parameters) by key
GS_loss = clf.best_params_['loss']
GS_C = clf.best_params_['C']
print("Optimal parameters:{}".format(clf.best_params_))

The optimum parameters are assigned to 'GS_loss' and 'GS_C', respectively. best_params_ is a dictionary, and its order is not necessarily the order of the parameters on the Official site (sklearn.svm.LinearSVC), so each value is looked up by name rather than by unpacking values(). It is still worth displaying the dictionary once to check what was selected.
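
Besides the single best combination, the fitted search object also keeps the score of every candidate. The sketch below shows one way to list them through `cv_results_`; the variable name `search` and the `max_iter` and `random_state` settings are assumptions added here for reproducibility and are not part of the original code.

```python
from sklearn import datasets, model_selection, svm

iris = datasets.load_iris()
parameters = {'C': [1, 3, 5], 'loss': ['hinge', 'squared_hinge']}

# max_iter and random_state are assumptions added for reproducibility
search = model_selection.GridSearchCV(
    svm.LinearSVC(max_iter=10000, random_state=0), parameters)
search.fit(iris.data, iris.target)

# One entry per parameter combination: 3 values of C x 2 losses = 6
results = search.cv_results_
for params, mean_score in zip(results['params'], results['mean_test_score']):
    print("{} -> {:.3f}".format(params, mean_score))

# Mean cross-validated score of the best combination
print("Best score: {:.3f}".format(search.best_score_))
```

Looking at the whole table makes it easier to see whether the winner is clearly better or only marginally ahead of the other candidates.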

Cross-validation

Finally, perform cross-validation using the parameters found by the grid search.

Cross-validation


from sklearn import model_selection, svm

# Run cross-validation with the parameters chosen by the grid search
clf = svm.LinearSVC(loss=GS_loss, C=GS_C)
score = model_selection.cross_val_score(clf, iris.data, iris.target, cv=5)

# Display cross-validation results
print("Correct answer rate(average):{}".format(score.mean()))
print("Correct answer rate(minimum):{}".format(score.min()))
print("Correct answer rate(maximum):{}".format(score.max()))
print("Correct answer rate(standard deviation):{}".format(score.std()))
print("Correct answer rate(all):{}".format(score))
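
To make it clearer what `cv=5` does here, the following sketch spells the folds out by hand and compares the result with `cross_val_score`. It assumes a recent scikit-learn where these functions live in `sklearn.model_selection`; the fixed parameters, `max_iter`, `random_state`, and the variable names are assumptions for illustration, standing in for the tuned classifier.

```python
import numpy as np
from sklearn import datasets, model_selection, svm
from sklearn.base import clone

iris = datasets.load_iris()
X, y = iris.data, iris.target

# Hypothetical stand-in for the tuned classifier; max_iter and random_state
# are assumptions added so that both runs are reproducible
clf = svm.LinearSVC(loss='squared_hinge', C=1, max_iter=10000, random_state=0)

# For a classifier, an integer cv means stratified folds without shuffling
folds = model_selection.StratifiedKFold(n_splits=5)
manual_scores = []
for train_idx, test_idx in folds.split(X, y):
    # Train a fresh copy on 4 folds and measure accuracy on the held-out fold
    model = clone(clf).fit(X[train_idx], y[train_idx])
    manual_scores.append(model.score(X[test_idx], y[test_idx]))

library_scores = model_selection.cross_val_score(clf, X, y, cv=5)

# Check whether the hand-written loop agrees with cross_val_score
print(np.allclose(manual_scores, library_scores))
```

Each of the five scores is the accuracy on 30 held-out samples, so the possible values move in steps of 1/30.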

Whole code

Whole code


# -*- coding: utf-8 -*-
from sklearn import datasets
from sklearn import svm
from sklearn import model_selection

# main
if __name__ == "__main__":
    # Get the dataset
    iris = datasets.load_iris()

    # Set the candidate parameters for the grid search
    parameters = {
        'C': [1, 3, 5],
        'loss': ('hinge', 'squared_hinge')
    }

    # Run the grid search
    clf = model_selection.GridSearchCV(svm.LinearSVC(), parameters)
    clf.fit(iris.data, iris.target)

    # Get the grid search result (optimal parameters) by key,
    # because the order of the dictionary values is not guaranteed
    GS_loss = clf.best_params_['loss']
    GS_C = clf.best_params_['C']
    print("Optimal parameters:{}".format(clf.best_params_))

    # Run cross-validation with the parameters chosen by the grid search
    clf = svm.LinearSVC(loss=GS_loss, C=GS_C)
    score = model_selection.cross_val_score(clf, iris.data, iris.target, cv=5)

    # Display cross-validation results
    print("Correct answer rate(average):{}".format(score.mean()))
    print("Correct answer rate(minimum):{}".format(score.min()))
    print("Correct answer rate(maximum):{}".format(score.max()))
    print("Correct answer rate(standard deviation):{}".format(score.std()))
    print("Correct answer rate(all):{}".format(score))

Execution result

Execution result


Optimal parameters:{'loss': 'squared_hinge', 'C': 1}
Correct answer rate(average):0.966666666667
Correct answer rate(minimum):0.9
Correct answer rate(maximum):1.0
Correct answer rate(standard deviation):0.0421637021356
Correct answer rate(all):[ 1.          1.          0.93333333  0.9         1.        ]
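
The summary figures can be checked by hand from the five per-fold scores. The sketch below recomputes them with NumPy; writing 0.93333333 as 28/30 is an assumption based on each fold of the 150-sample iris data holding 30 samples.

```python
import numpy as np

# Per-fold accuracies from the execution result; 0.93333333 is taken as 28/30
scores = np.array([1.0, 1.0, 28.0 / 30, 0.9, 1.0])

print("mean: {:.6f}".format(scores.mean()))
print("min:  {:.6f}".format(scores.min()))
print("max:  {:.6f}".format(scores.max()))

# ndarray.std() defaults to the population standard deviation (ddof=0)
print("std (ddof=0): {:.6f}".format(scores.std()))

# The sample standard deviation (ddof=1) is slightly larger
print("std (ddof=1): {:.6f}".format(scores.std(ddof=1)))
```

The standard deviation reported above is therefore the population one. With only five folds, the choice between ddof=0 and ddof=1 makes a visible difference.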

Summary

It was a bit disappointing that the grid search picked the same values as the LinearSVC() defaults (loss='squared_hinge', C=1), but for the time being I was able to cross-validate using the grid search results. I'm allergic to English, so I had a hard time learning while reading the official website.
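
One way to confirm that the selected values coincide with the defaults is to ask an untouched LinearSVC for its parameters. This is a small sketch; `best_params` is simply copied from the execution result above, and the name `matches` is an assumption for illustration.

```python
from sklearn import svm

# Parameters of a LinearSVC created without any arguments
defaults = svm.LinearSVC().get_params()
print("loss: {}".format(defaults['loss']))
print("C: {}".format(defaults['C']))

# Optimal parameters copied from the execution result
best_params = {'loss': 'squared_hinge', 'C': 1}

# True when every selected value equals the corresponding default
matches = all(defaults[name] == value for name, value in best_params.items())
print(matches)
```

When the best candidate sits at the edge of the grid or equals the default, widening the grid (for example, trying values of C below 1) is a natural next step.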

References

[Dataset included with scikit-learn](http://pythondatascience.plavox.info/scikit-learn/Dataset included with scikit-learn /)

Official site (sklearn.svm.LinearSVC)

Introduction of python machine learning library scikit-learn
