1. Overview

In this article, we use a support vector machine (SVM), a machine learning method, to classify apples and pears. For 10 images of each fruit, we take the mean pixel value of each color channel and check whether the two fruits can be separated when two of the three channels are fed to the SVM. An SVM determines the decision boundary from only the data points closest to the boundary rather than from all the data points. Those points near the boundary are called support vectors.
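
As a minimal sketch of what support vectors are, the following fits a linear SVM to a few made-up 2D points and prints the points the model kept as support vectors. Note that it uses scikit-learn's `SVC(kernel='linear')` rather than the `LinearSVC` used later in this article, because `SVC` exposes the `support_vectors_` attribute. The data points are invented for illustration and are not the fruit data.

```python
import numpy as np
from sklearn.svm import SVC

# Made-up 2D points for illustration only (not the fruit data).
X = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 2.0], [2.0, 2.0],
              [5.0, 5.0], [6.0, 5.0], [5.0, 6.0], [6.0, 6.0]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# Linear-kernel SVM; SVC keeps the support vectors after fitting.
clf = SVC(kernel='linear', C=1.0).fit(X, y)

support_vectors = clf.support_vectors_
print('number of data points    :', len(X))
print('number of support vectors:', len(support_vectors))
print(support_vectors)
```

Only the points nearest the gap between the two groups should be reported, which is the sense in which the boundary depends on a subset of the data.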

2. Data to prepare

Compute the mean pixel value of each color channel for every apple and pear image in advance. For reference, apple.csv below is the CSV file of mean pixel values I used for the apples. Because OpenCV is used, the channels are in BGR order. As you would expect for apples, the red values are high. Prepare a pear CSV file in the same format as well. The script in section 4 reads the two files as input/apple.csv and input/pear.csv.

`apple.csv`


,blue,green,red
0,39.88469583593901,28.743374377331637,137.23369201906283
1,83.72563703792319,79.59471228615863,164.77884914463453
2,66.8231805177587,74.52501570023027,141.8854929872305
3,55.2837418388098,45.28968211495237,148.4160869099861
4,37.59397951454073,49.82323881039423,137.30237460066527
5,53.68868757437335,50.963264366051206,142.6121454070861
6,51.277953772145956,64.07145371348116,152.98116860260473
7,50.47702848900108,48.37151099891814,124.46714749368914
8,40.35442093843233,52.0682126390019,137.8299091402224
9,48.18758094199441,55.87655919841865,145.6361529548088
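
How the mean values were computed is not shown in this article, so the following is only a sketch of one way to do it. It assumes each image is an H x W x 3 array in BGR order, which is what `cv2.imread` returns. To keep the example self-contained, a tiny hand-written array stands in for a real image, and the helper name `mean_bgr` is hypothetical.

```python
import numpy as np

def mean_bgr(image):
    """Return the mean (blue, green, red) of an H x W x 3 BGR image array."""
    return image.reshape(-1, 3).mean(axis=0)

# Stand-in for an image loaded with cv2.imread(...): a 2 x 2 image in BGR order.
# The pixel values are hypothetical.
image = np.array([[[40, 30, 140], [60, 50, 160]],
                  [[20, 10, 120], [40, 30, 140]]], dtype=np.uint8)

rows = [mean_bgr(image)]  # one row per image

# Print in the same layout as apple.csv: index, blue, green, red.
print(',blue,green,red')
for i, (b, g, r) in enumerate(rows):
    print('{},{},{},{}'.format(i, b, g, r))
```

With real photos you would probably also want to exclude the background before averaging, which this sketch does not attempt.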

3. Environment

This article uses the following libraries:

・ Matplotlib
・ NumPy
・ scikit-learn
・ mglearn

If you have not installed them yet, install them with the commands below.

pip install matplotlib
pip install numpy
pip install mglearn
pip install scikit-learn

4. Code

Because the SVM is applied in two dimensions, we look at three pairs of the BGR channels: BG, GR, and BR. The code is commented, so please refer to the comments. Incidentally, the Japanese pear is apparently called "Japanese pear" in English, while "pear" on its own seems to mean the Western pear. (In the code, "Japanese pear" would be long and hard to read, so I used "pear".)
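
The script below builds each two-channel subset by dropping one column from the three-column BGR array with `np.delete(..., axis=1)`. A small sketch of that step on its own, using one row rounded from apple.csv:

```python
import numpy as np

channels = ['blue', 'green', 'red']
# One row from apple.csv, rounded: (blue, green, red).
data = np.array([[39.9, 28.7, 137.2]])

# Drop one column at a time, in the same order as the script: BG, GR, BR.
for drop in (2, 0, 1):
    kept = [name for i, name in enumerate(channels) if i != drop]
    pair = np.delete(data, drop, axis=1)
    print(kept, pair)
```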

`SVM_bgr_2D.py`


import os
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import numpy as np
from sklearn.svm import LinearSVC
import mglearn

def main():
    path = 'output'
    os.makedirs(path, exist_ok=True)

    apple = np.loadtxt('input/apple.csv', delimiter=',', skiprows=1, usecols=[1,2,3])   #Load the apple mean pixel values (BGR) from the csv file
    pear = np.loadtxt('input/pear.csv', delimiter=',', skiprows=1, usecols=[1,2,3])    #Load the pear mean pixel values (BGR) from the csv file

    SVM2D(np.delete(apple,2,1), np.delete(pear,2,1), 'blue', 'green', path)    #Drop the red column and fit the SVM on blue-green
    SVM2D(np.delete(apple,0,1), np.delete(pear,0,1), 'green', 'red', path)    #Drop the blue column and fit the SVM on green-red
    SVM2D(np.delete(apple,1,1), np.delete(pear,1,1), 'blue', 'red', path)    #Drop the green column and fit the SVM on blue-red

def SVM2D(ap_pv, pe_pv, xlabel, ylabel, path):

    yap=[0]*ap_pv.shape[0]  #List of 0s (apple label), one per apple sample
    ype=[1]*pe_pv.shape[0]  #List of 1s (pear label), one per pear sample
    y = np.array(yap+ype)   #Class labels for the apple and pear training data
    X = np.concatenate([ap_pv,pe_pv],0)    #Apple and pear features (two of the BGR mean pixel values)

    #Create SVM boundary diagram
    linear_svm = LinearSVC().fit(X, y)
    fig=plt.figure(figsize = (10, 6))
    ax = fig.add_subplot(1,1,1)
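    ax.axis('auto')    #'normal' is no longer accepted by recent Matplotlib; 'auto' is the replacement
    mglearn.plots.plot_2d_separator(linear_svm, X)    #Use all data (X), not only the apples, to set the plotted range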
    mglearn.discrete_scatter(X[:, 0], X[:, 1], y)

    ax.legend(['apple', 'pear'])
    ax.xaxis.set_major_locator(ticker.MultipleLocator(20))
    ax.yaxis.set_major_locator(ticker.MultipleLocator(20))
    ax.tick_params('x', labelsize =15)
    ax.tick_params('y', labelsize =15)
    ax.set_xlabel(xlabel, fontsize= 20)
    ax.set_ylabel(ylabel, fontsize= 20)
    plt.savefig(path+'/SVM_'+xlabel+'_'+ylabel+'.png')

    #Print the accuracy on the training data
    print('score on training set: {:.2f}'.format(linear_svm.score(X,y)))

if __name__=='__main__':
    main()

5. Results and considerations

We ran the SVM on the three channel pairs, Blue-Green, Green-Red, and Blue-Red. The resulting decision boundaries are shown in the figures below.

With Blue-Red, the data points of the two classes are mixed together, so the fruits cannot be separated. In contrast, the Blue-Green and Green-Red plots separate apples and pears cleanly. This shows that green, rather than red or blue, is the channel that matters for telling pears from apples. In an RGB image, apples are mostly red, whereas pears are yellowish green and so have high values in both green and red. As a result, the green values differ clearly between apples and pears.
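
One way to check numerically which channel drives the boundary is to look at the weights of the fitted linear model. The sketch below does this for the Green-Red pair. It uses `SVC(kernel='linear')` instead of the article's `LinearSVC` so that the fit is deterministic. The apple rows are rounded from apple.csv, but the pear rows are invented placeholders, so the printed numbers only illustrate the procedure and say nothing about real pears.

```python
import numpy as np
from sklearn.svm import SVC

# (green, red) means: apple rows are rounded from apple.csv above.
apple = np.array([[29, 137], [80, 165], [75, 142], [45, 148], [50, 137],
                  [51, 143], [64, 153], [48, 124], [52, 138], [56, 146]], dtype=float)
# Pear rows are INVENTED placeholders, not measured values.
pear = np.array([[140, 150], [150, 160], [145, 140],
                 [155, 165], [135, 145], [150, 150]], dtype=float)

X = np.concatenate([apple, pear], axis=0)
y = np.array([0] * len(apple) + [1] * len(pear))  # 0 = apple, 1 = pear

clf = SVC(kernel='linear').fit(X, y)
w_green, w_red = clf.coef_[0]
b = clf.intercept_[0]

# The boundary is the line w_green * green + w_red * red + b = 0.
print('w_green = {:.4f}, w_red = {:.4f}, b = {:.4f}'.format(w_green, w_red, b))
print('score on training set: {:.2f}'.format(clf.score(X, y)))
```

Because both features are pixel values on the same 0 to 255 scale, a larger absolute weight on green than on red means the boundary depends mainly on green.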

6. References

★ Sites

・ [[For beginners of python machine learning] Easy implementation of SVM with scikit-learn][0]
・ [English for "pear" | Correct pronunciation of pear and "Japanese pear" and related English][1]

★ Books

・ [Machine learning starting with Python (published by O'Reilly Japan)][2]

[0]: https://qiita.com/kazuki_hayakawa/items/18b7017da9a6f73eba77
[1]: https://mysuki.jp/english-pear-7661
[2]: https://www.amazon.co.jp/-/en/Andreas-C-Muller/dp/4873117984/ref=sr_1_2?adgrpid=60120324664&dchild=1&gclid=CjwKCAiAnIT9BRAmEiwANaoE1YIps5s80JJSRehiS7IYnFoTKYgr9WYubUIg1BNKCBYEdREPVB_weRoCFSsQAvD_BwE&hvadid=338518119513&hvdev=c&hvlocphy=1009247&hvnetw=g&hvqmt=e&hvrand=14766900827825353786&hvtargid=kwd-314655987025&hydadcr=27268_11561171&jp-ad-ap=0&keywords=python%E3%81%A7%E3%81%AF%E3%81%98%E3%82%81%E3%82%8B%E6%A9%9F%E6%A2%B0%E5%AD%A6%E7%BF%92&qid=1604415106&sr=8-2&tag=googhydr-22

"Machine learning starting with Python" is a very useful book for anyone studying machine learning. I find it especially helpful because it includes sample code for real machine learning tasks. It is fairly easy to read, but I think it is aimed at readers who already know Python and the basics of machine learning and deep learning. Do pick up a copy and read it!

[Python] Sort apples and pears from pixel values using a support vector machine (SVM)