As a reminder for myself, and as a tutorial for learning machine learning, I will record how I predicted the species of iris flowers, a classic exercise that everyone goes through.
The version used is here.
There are three varieties of iris: "setosa", "versicolor" and "virginica". For each flower, the data gives the length and width of the sepals (Sepal) and petals (Petal). The problem this time is to predict which of the three species a flower belongs to from these four features.
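import numpy as np
import pandas as pd
from pandas import Series, DataFrame
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('whitegrid')
%matplotlib inline
from sklearn.datasets import load_iris
# Load the dataset: x holds the 4 measurements, y the species codes
iris = load_iris()
x = iris.data
y = iris.target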
Here we import numpy, pandas, matplotlib, seaborn, and scikit-learn, and load the iris dataset bundled with sklearn.datasets.
iris_data = DataFrame(x, columns=['Sepal Length', 'Sepal Width', 'Petal Length', 'Petal Width'])
iris_data
The dataset has 150 samples. For each one, the length and width of the sepals and petals are listed in centimeters.
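To check the sample count and the units without reading them off the table, note that the object returned by `load_iris` also carries the feature names (which include the unit) and the species names. A small check, assuming a reasonably recent scikit-learn:

```python
from sklearn.datasets import load_iris

iris = load_iris()

# 150 samples, 4 measurements each
print(iris.data.shape)  # (150, 4)
# Feature names include the unit, e.g. 'sepal length (cm)'
print(iris.feature_names)
# Species names, in the order of the integer codes 0, 1, 2
print(iris.target_names)
```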
Next, let's look at the types of flowers.
iris_target = DataFrame(y, columns=['Species'])
iris_target
The species are already encoded as numbers (0, 1, 2) rather than as flower names. We could process them as they are, but then we would have to keep track of which number corresponds to which name ourselves, so let's map each number to its name.
# Define a function that converts a species code into its name
def flower(num):
    if num == 0:
        return 'Setosa'
    elif num == 1:
        return 'Versicolor'
    else:
        return 'Virginica'
iris_target['Species'] = iris_target['Species'].apply(flower)
iris_target
Now that the species have names, the data is easier to read.
iris = pd.concat([iris_data, iris_target], axis=1)
sns.pairplot(iris, hue='Species', hue_order=['Virginica', 'Versicolor', 'Setosa'], height=2, palette="husl")
This plots every pair of variables against each other, and seaborn's pairplot does it in a single line. The plots show that Setosa is clearly separated from the other two species. Virginica and Versicolor, on the other hand, overlap in Sepal Length, so they seem hard to separate using that feature alone.
Looking at the actual flowers, Setosa is also the one that is smaller overall, most noticeably in its petals.
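The impression that Setosa is smaller and stands apart from the other two species can be checked numerically with per-species means and ranges. This sketch builds the labels from `iris.target_names` instead of the hand-written `flower` function, so the species names are scikit-learn's lowercase ones; the petal-length range comparison is only an illustration of the separation seen in the plot:

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
# Map the integer codes 0/1/2 to names by indexing into target_names
df["species"] = iris.target_names[iris.target]

# Mean of each measurement per species
means = df.groupby("species").mean()
print(means)

# Petal length range per species: Setosa's range does not overlap the others
petal_range = df.groupby("species")["petal length (cm)"].agg(["min", "max"])
print(petal_range)
```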
# Import logistic regression
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
logreg = LogisticRegression()
# Hold out 30% of the data as the test set
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=3)
logreg.fit(x_train, y_train)
# Accuracy (fraction of correct predictions), computed with accuracy_score
from sklearn import metrics
y_pred = logreg.predict(x_test)
metrics.accuracy_score(y_test, y_pred)
Accuracy: 0.9777777777777777
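Accuracy alone does not show which species get confused with each other. A confusion matrix on the same split makes that visible. This sketch repeats the split above; `max_iter=1000` is an assumption added to avoid convergence warnings from the default solver in recent scikit-learn versions, so the exact numbers may differ slightly from the run shown here:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

iris = load_iris()
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=3
)

logreg = LogisticRegression(max_iter=1000)
logreg.fit(x_train, y_train)
y_pred = logreg.predict(x_test)

# Rows are the true species, columns the predicted species (codes 0, 1, 2)
cm = confusion_matrix(y_test, y_pred)
print(iris.target_names)
print(cm)
# Off-diagonal entries count the misclassified test samples
print("accuracy:", accuracy_score(y_test, y_pred))
```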
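This time, the analysis used logistic regression. Logistic regression models the probability that a binary target variable (0 or 1) takes the value 1, and classifies a sample by whether that probability is above or below a threshold, typically 0.5. In other words, it is a way to decide between two outcomes such as "genuine" or "fake", or "benign" or "malignant".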
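Concretely, binary logistic regression passes a linear score z = w·x + b through the sigmoid σ(z) = 1 / (1 + exp(-z)) to get the probability of class 1. The sketch below checks this against scikit-learn on a binary sub-problem (Setosa vs. the rest), which is chosen only for illustration: applying the sigmoid to the linear score reproduces the second column of `predict_proba`.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

iris = load_iris()
x = iris.data
# Binary target for illustration: 1 if Setosa, 0 otherwise
y_binary = (iris.target == 0).astype(int)

clf = LogisticRegression(max_iter=1000).fit(x, y_binary)

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Linear score w·x + b for each sample
z = x @ clf.coef_.ravel() + clf.intercept_[0]
p_manual = sigmoid(z)

# predict_proba returns [P(y=0), P(y=1)] for each row
p_sklearn = clf.predict_proba(x)[:, 1]
print(np.allclose(p_manual, p_sklearn))
```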
Here, however, the task was to divide the data into three classes. Logistic regression can be extended to three or more classes: as the image below illustrates, even when there are several classes, the problem is broken down into separate two-class problems and each one is computed on its own.
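The one-vs-rest idea (one binary classifier per species, with the most confident one deciding the prediction) can be made explicit with scikit-learn's `OneVsRestClassifier`. As a hedge: recent scikit-learn versions of `LogisticRegression` handle three classes with a single multinomial (softmax) model by default rather than one-vs-rest, so this sketch illustrates the idea rather than reproducing exactly what `logreg` does internally. `max_iter=1000` is an added assumption to avoid convergence warnings:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier

iris = load_iris()
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=3
)

# One binary logistic regression per class: "this species" vs "all others"
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000))
ovr.fit(x_train, y_train)

print(len(ovr.estimators_))  # one binary model per species
# Each test sample is assigned the class whose binary model scores highest
print(ovr.score(x_test, y_test))
```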
In this case the accuracy was 97.8% (44 of the 45 test samples), so this method appears to work well.
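A single 70/30 split gives one number that depends on which 45 samples happened to land in the test set. As a more robust check (an addition, not part of the original analysis), 5-fold cross-validation averages the accuracy over five different splits; `max_iter=1000` is again an assumption to avoid convergence warnings:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

iris = load_iris()
logreg = LogisticRegression(max_iter=1000)

# With an integer cv and a classifier, scikit-learn uses stratified folds
scores = cross_val_score(logreg, iris.data, iris.target, cv=5)
print(scores)
print("mean accuracy: %.3f (std %.3f)" % (scores.mean(), scores.std()))
```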
Reference URLs
https://dev.classmethod.jp/machine-learning/logistic-regression-impl/
http://www.msi.co.jp/nuopt/docs/v20/examples/html/02-18-00.html
The complete code:
import numpy as np
import pandas as pd
from pandas import Series, DataFrame
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('whitegrid')
%matplotlib inline
from sklearn.datasets import load_iris
iris = load_iris()
x = iris.data
y = iris.target
iris_data = DataFrame(x, columns=['Sepal Length', 'Sepal Width', 'Petal Length', 'Petal Width'])
iris_target = DataFrame(y, columns=['Species'])
def flower(num):
    if num == 0:
        return 'Setosa'
    elif num == 1:
        return 'Versicolor'
    else:
        return 'Virginica'
iris_target['Species'] = iris_target['Species'].apply(flower)
iris = pd.concat([iris_data, iris_target], axis=1)
# Import logistic regression
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
logreg = LogisticRegression()
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=3)
logreg.fit(x_train, y_train)
from sklearn import metrics
y_pred = logreg.predict(x_test)
metrics.accuracy_score(y_test, y_pred)