Introduction

This is a continuation of the previous posts in this series:
① https://qiita.com/yohiro/items/04984927d0b455700cd1
② https://qiita.com/yohiro/items/5aab5d28aef57ccbb19c
③ https://qiita.com/yohiro/items/cc9bc2631c0306f813b5
④ https://qiita.com/yohiro/items/d376f44fe66831599d0b

Reference material: Udemy, "Everyone's AI course: Artificial intelligence and machine learning learned from scratch with Python"

Machine learning library used this time: scikit-learn

Problem setting

Given the length and width of the petals and sepals, identify the iris variety. The label 0 represents "Setosa", 1 represents "Versicolor", and 2 represents "Virginica".

Source code

Import

from sklearn import datasets
from sklearn import svm

Loading sample data

# Load the iris measurement data
iris = datasets.load_iris()

`iris` contains the following data.

`iris.data`


[[5.1 3.5 1.4 0.2]
 [4.9 3.  1.4 0.2]
 [4.7 3.2 1.3 0.2]
...

`iris.target`


[0 0 ... 1 1 ... 2 2]
...

Both have 150 entries: 50 labeled samples for each of "0: Setosa", "1: Versicolor", and "2: Virginica".
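To confirm the shape of the data and the number of samples per label rather than guessing, something like the following sketch can be used. It relies only on `load_iris()` and NumPy's `bincount`; the variable name `counts` is just for illustration.

```python
import numpy as np
from sklearn import datasets

iris = datasets.load_iris()

# 150 samples x 4 features, and one label per sample
print(iris.data.shape)
print(iris.target.shape)

# Number of samples carrying each label (index 0, 1, 2)
counts = np.bincount(iris.target)
print(counts)  # → [50 50 50]

# Feature names and class names bundled with the dataset
print(iris.feature_names)
print(iris.target_names)
```

Printing `iris.feature_names` also shows the column order of `iris.data`, which matters when picking columns by index later.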

Classification by support vector machine

# Linear support vector machine
clf = svm.LinearSVC()
# Train the support vector machine
clf.fit(iris.data, iris.target)

Train a support vector machine using the `svm` module. The linear support vector machine used this time is a model that draws a line (a plane in 3D, or a hyperplane in general) to separate groups of points plotted in a space of any number of dimensions. In this case each sample has four features, "sepal length", "sepal width", "petal length", and "petal width", so I think the labeled data are placed in a four-dimensional space and boundaries that separate the varieties are drawn there.
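To make the "boundary in four-dimensional space" idea more concrete, here is a minimal sketch that inspects the fitted model. It assumes the default one-vs-rest behavior of `LinearSVC`, and `max_iter=10000` is a value chosen for this sketch, not taken from the original code.

```python
import numpy as np
from sklearn import datasets, svm

iris = datasets.load_iris()

# max_iter is raised from the default here (an assumption of this sketch)
clf = svm.LinearSVC(max_iter=10000)
clf.fit(iris.data, iris.target)

# One hyperplane per class: 3 weight vectors of 4 coefficients, plus 3 intercepts
print(clf.coef_.shape)
print(clf.intercept_.shape)

# Signed score of each sample against each class's hyperplane
scores = clf.decision_function(iris.data[:5])

# The predicted class is the one with the highest score
manual = clf.classes_[np.argmax(scores, axis=1)]
print(manual)
print(clf.predict(iris.data[:5]))
```

Each row of `clf.coef_` holds the four coefficients of one class's hyperplane, so the model draws three boundaries rather than one.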

Classification

Pass three samples to the `clf` trained above and classify each of them as "0: Setosa", "1: Versicolor", or "2: Virginica".

# Predict the variety
print(clf.predict([[5.1, 3.5, 1.4, 0.1], [6.5, 2.5, 4.4, 1.4], [5.9, 3.0, 5.2, 1.5]]))

Result

A warning is shown, but the samples do seem to be classified.

C:\Anaconda3\python.exe C:/scikit_learn/practice.py
C:\Anaconda3\lib\site-packages\sklearn\svm\_base.py:947: ConvergenceWarning: Liblinear failed to converge, increase the number of iterations.
  "the number of iterations.", ConvergenceWarning)
[0 1 2]
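
The warning itself suggests increasing the number of iterations. One possible way to follow that advice is sketched below: it raises `max_iter` and also standardizes the features first, which often helps liblinear converge. The pipeline, the `StandardScaler` step, and the value `10000` are assumptions of this sketch, not part of the original code, and whether the warning disappears may depend on the scikit-learn version.

```python
from sklearn import datasets
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

iris = datasets.load_iris()

# Standardize each feature, then fit a linear SVM with a larger iteration budget
model = make_pipeline(StandardScaler(), LinearSVC(max_iter=10000))
model.fit(iris.data, iris.target)

# The same three samples as in the classification step
samples = [[5.1, 3.5, 1.4, 0.1], [6.5, 2.5, 4.4, 1.4], [5.9, 3.0, 5.2, 1.5]]
pred = model.predict(samples)
print(pred)
```

Because the scaler is part of the pipeline, the samples passed to `predict` are scaled with the same statistics as the training data.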

Bonus

I tried visualizing what kind of data `iris.data` contains.

from sklearn import datasets
import matplotlib.pyplot as plt

# Load the iris measurement data
iris = datasets.load_iris()

# Setosa, Versicolour, Virginica
sepal_length = [[], [], []]
petal_length = [[], [], []]
sepal_width = [[], [], []]
petal_width = [[], [], []]

# Column order of iris.data: sepal length, sepal width, petal length, petal width
for num, data in enumerate(iris.data):
    cls = iris.target[num]
    sepal_length[cls].append(data[0])
    sepal_width[cls].append(data[1])
    petal_length[cls].append(data[2])
    petal_width[cls].append(data[3])

plt.subplot(1,2,1)
plt.scatter(sepal_length[0], petal_length[0], c="red", label="Setosa", marker="+")
plt.scatter(sepal_length[1], petal_length[1], c="blue", label="Versicolour", marker="+")
plt.scatter(sepal_length[2], petal_length[2], c="green", label="Virginica", marker="+")
plt.xlabel('sepal_length')
plt.ylabel('petal_length')
plt.legend()

plt.subplot(1,2,2)
plt.scatter(sepal_width[0], petal_width[0], c="red", label="Setosa", marker="+")
plt.scatter(sepal_width[1], petal_width[1], c="blue", label="Versicolour", marker="+")
plt.scatter(sepal_width[2], petal_width[2], c="green", label="Virginica", marker="+")
plt.xlabel('sepal_width')
plt.ylabel('petal_width')
plt.legend()

plt.show()

If you try to draw lines between the Setosa, Versicolour, and Virginica groups, you can see that data near the boundary between Versicolour and Virginica may be difficult to classify.
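
One way to check this impression numerically is a confusion matrix from cross-validated predictions. The sketch below is only an illustration: the 5-fold setting and `max_iter=10000` are assumptions, and the exact counts can differ between scikit-learn versions.

```python
from sklearn import datasets
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_predict
from sklearn.svm import LinearSVC

iris = datasets.load_iris()

# Predict every sample with a model that did not see it during training
clf = LinearSVC(max_iter=10000)
pred = cross_val_predict(clf, iris.data, iris.target, cv=5)

# Rows are the true labels, columns are the predicted labels
cm = confusion_matrix(iris.target, pred)
print(cm)
```

Off-diagonal entries in the rows for labels 1 and 2 would correspond to Versicolour and Virginica samples that were mistaken for each other.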

Python & Machine Learning Study Memo ⑤: Classification of irises