[PYTHON] Solving the iris problem with scikit-learn ver1.0 (logistic regression)

1. Introduction

As a machine learning tutorial, and as a reminder to myself, I am recording the method I used to predict the variety of an iris, a problem that almost everyone works through when starting out.

The version used is here.

2. What is the classification of irises?

2-1 Outline of iris problem

There are three varieties of iris: "setosa", "versicolor" and "virginica". The data describing each flower consists of the length and width of the sepals (Sepal) and of the petals (Petal). The problem this time is to derive which of the three varieties a flower belongs to from these four features.

2-2 About the program

Importing libraries


import numpy as np
import pandas as pd
from pandas import Series,DataFrame

import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('whitegrid')

%matplotlib inline

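from sklearn.datasets import load_iris
# Load the dataset: x holds the four measurements, y the numeric species labels
iris = load_iris()
x = iris.data
y = iris.target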

This time, we import numpy, pandas, matplotlib, seaborn, and sklearn. The iris dataset ships with scikit-learn and is loaded from sklearn.datasets.

Take a look at the data


iris_data = DataFrame(x, columns=['Sepal Length', 'Sepal Width', 'Petal Length', 'Petal Width'])
iris_data

002.png

There are 150 samples. Each row lists the length and width of the sepals and petals, measured in cm.
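
The number of samples and the unit can be checked directly on the object returned by load_iris. The following is a minimal sketch that only assumes scikit-learn is installed; the variable name `iris` is chosen here for illustration.

```python
from sklearn.datasets import load_iris

iris = load_iris()

# (number of samples, number of features)
print(iris.data.shape)

# Feature names as stored in the dataset; each name includes its unit
print(iris.feature_names)
```

The feature names stored in the dataset carry the unit, so there is no need to guess it from the values.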

Next, let's look at the types of flowers.


iris_target = DataFrame(y, columns =['Species'])
iris_target

003.png

You can see that the species is stored as a number, not as the name of the flower. You could process the data as it is, but then you would have to remember which number corresponds to which name, so let's map the numbers to names.
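
Before writing the mapping by hand, it can help to confirm which number stands for which variety. The sketch below reads the correspondence from the `target_names` attribute of the loaded dataset and counts the samples per label; the variable names are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()

# target_names[i] is the variety name for the numeric label i
for label, name in enumerate(iris.target_names):
    print(label, name)

# Number of samples for each numeric label (index = label)
counts = np.bincount(iris.target)
print(counts)
```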


# Map the numeric label to the species name
def flower(num):
    if num == 0:
        return 'Setosa'
    elif num == 1:
        return 'Versicolor'
    else:
        return 'Virginica'
iris_target['Species'] = iris_target['Species'].apply(flower)
iris_target

004.png

Now that the names are shown, the data is easier to read.

Check the correlation for each variable


iris = pd.concat([iris_data, iris_target], axis=1)
# Recent seaborn versions use `height` for the panel size (formerly `size`)
sns.pairplot(iris, hue='Species', hue_order=['Virginica', 'Versicolor', 'Setosa'], height=2, palette="husl")

005.png

The plot shows the relationship between every pair of variables. With seaborn's pairplot function, it can be written in one line. Looking at it this way, you can see that Setosa is clearly separated from the other two. Virginica and Versicolor, on the other hand, overlap in Sepal Length, and it seems difficult to separate them by this feature alone.
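
The pair plot shows the relationships visually. To put numbers on them, one option is the Pearson correlation matrix from pandas. This is a minimal sketch rather than part of the original analysis; the names `iris_bunch` and `features` are assumptions introduced here.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris_bunch = load_iris()
features = pd.DataFrame(
    iris_bunch.data,
    columns=["Sepal Length", "Sepal Width", "Petal Length", "Petal Width"],
)

# Pearson correlation coefficient for every pair of measurements
corr = features.corr()
print(corr.round(2))
```

Values close to 1 or -1 indicate a strong linear relationship between two measurements, and values near 0 indicate a weak one.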

If you look at the actual flowers, you can see that the flowers that are small overall are Setosa.

007.png

2-3 Prediction using logistic regression


# Import logistic regression and the train/test split helper
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
logreg = LogisticRegression()

# Hold out 30% of the data as test data
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=3)
logreg.fit(x_train, y_train)

# Accuracy (fraction of correct predictions), computed with accuracy_score
from sklearn import metrics
y_pred = logreg.predict(x_test)
metrics.accuracy_score(y_test, y_pred)

Accuracy: 0.9777777777777777
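
A single accuracy figure does not show which varieties get confused with each other. A confusion matrix is one way to look at that. The sketch below repeats the same split as above; `max_iter=1000` is an assumption added here to give the solver room to converge and is not part of the original code.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

x, y = load_iris(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.3, random_state=3
)

# max_iter=1000 is an assumption, not a setting from the original article
logreg = LogisticRegression(max_iter=1000)
logreg.fit(x_train, y_train)
y_pred = logreg.predict(x_test)

# Rows are the true labels, columns are the predicted labels (0, 1, 2)
cm = confusion_matrix(y_test, y_pred, labels=[0, 1, 2])
print(cm)
```

Off-diagonal entries count the misclassified test samples, so any confusion between two varieties shows up as a non-zero cell away from the diagonal.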

This time, we analyzed the data using logistic regression. Logistic regression is a regression model whose target variable is binary, 0 or 1. In other words, it is a way to decide between two outcomes, such as "genuine" or "fake", or "benign" or "malignant".
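
To make the binary case concrete, here is a minimal sketch of the logistic (sigmoid) function that turns a weighted sum of features into a probability between 0 and 1. The weights, bias, and sample values are hypothetical numbers chosen only for illustration. They are not fitted to the iris data.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical weights and bias for a two-feature model (illustrative only)
w = np.array([1.5, -0.5])
b = 0.2
x_sample = np.array([2.0, 1.0])

p = sigmoid(w @ x_sample + b)  # estimated probability of class 1
label = int(p >= 0.5)          # threshold at 0.5 to get a 0/1 prediction
print(p, label)
```

Fitting a logistic regression model amounts to choosing the weights and bias so that these probabilities match the training labels as well as possible.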

In this case, we applied it to a problem with three classes. Logistic regression can be applied to problems with three or more classes. As the image below illustrates, even when there are several classes, the problem is split into two-class problems that are each computed separately.

010.png
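
The idea of splitting a multi-class problem into two-class problems can be made explicit with scikit-learn's OneVsRestClassifier, which fits one binary classifier per class. This is a sketch of that idea and not necessarily what LogisticRegression() does internally, since its multi-class strategy depends on the scikit-learn version and settings. `max_iter=1000` is an assumption added here.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier

x, y = load_iris(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.3, random_state=3
)

# One binary logistic regression per class ("this class" vs "the rest");
# max_iter=1000 is an assumption, not a setting from the original article
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000))
ovr.fit(x_train, y_train)

n_binary_models = len(ovr.estimators_)  # one fitted binary model per class
score = ovr.score(x_test, y_test)       # accuracy on the held-out test data
print(n_binary_models, score)
```

At prediction time, each binary model scores the sample and the class whose model gives the highest score is chosen.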

In this case, the accuracy was 97.8%, so this method appears to work well on this data.
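
An accuracy measured on one particular split depends on which samples happened to land in the test set. One way to check how stable the result is, sketched here under the assumption of 5-fold cross-validation and `max_iter=1000` (neither is in the original code), is cross_val_score.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

x, y = load_iris(return_X_y=True)

# 5-fold cross-validation: train on 4/5 of the data and test on the
# remaining 1/5, repeated so that every fold is used once as test data
scores = cross_val_score(LogisticRegression(max_iter=1000), x, y, cv=5)
print(scores)
print(scores.mean())
```

If the fold scores are all close to each other, the accuracy from the single split is less likely to be a lucky draw.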

Reference URL

https://dev.classmethod.jp/machine-learning/logistic-regression-impl/
http://www.msi.co.jp/nuopt/docs/v20/examples/html/02-18-00.html

3. Full program


import numpy as np
import pandas as pd
from pandas import Series,DataFrame

import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('whitegrid')

%matplotlib inline

from sklearn.datasets import load_iris
iris = load_iris()
x = iris.data
y = iris.target

iris_data = DataFrame(x, columns=['Sepal Length', 'Sepal Width', 'Petal Length', 'Petal Width'])
iris_target = DataFrame(y, columns=['Species'])

# Map the numeric label to the species name
def flower(num):
    if num == 0:
        return 'Setosa'
    elif num == 1:
        return 'Versicolor'
    else:
        return 'Virginica'

iris_target['Species'] = iris_target['Species'].apply(flower)

iris = pd.concat([iris_data, iris_target], axis=1)

# Import logistic regression and the train/test split helper
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

logreg = LogisticRegression()
# Hold out 30% of the data as test data
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=3)

logreg.fit(x_train, y_train)

# Accuracy on the test data
from sklearn import metrics
y_pred = logreg.predict(x_test)

metrics.accuracy_score(y_test, y_pred)
