Python & Machine Learning Study Memo ⑥: Number Recognition

Introduction

This is a continuation of ① https://qiita.com/yohiro/items/04984927d0b455700cd1 ② https://qiita.com/yohiro/items/5aab5d28aef57ccbb19c ③ https://qiita.com/yohiro/items/cc9bc2631c0306f813b5 ④ https://qiita.com/yohiro/items/d376f44fe66831599d0b ⑤ https://qiita.com/yohiro/items/3abaf7b610fbcaa01b9c.

- Reference material: Udemy course "Everyone's AI course: Artificial intelligence and machine learning learned from scratch with Python"
- Library used: scikit-learn

Problem setting

Recognize which digit (0 to 9) is written in each handwritten digit image (8 x 8 px).

Source code

Imports

from sklearn import datasets
from sklearn import svm
from sklearn import metrics
import matplotlib.pyplot as plt

Loading sample data

# Load the handwritten digits dataset
digits = datasets.load_digits()
# Number of samples, used below to split the data 60% / 40%
n = len(digits.data)

digits contains the following data.

digits.data


[[ 0.  0.  5. ...  0.  0.  0.]
 [ 0.  0.  0. ... 10.  0.  0.]
 [ 0.  0.  0. ... 16.  9.  0.]
 ...
 [ 0.  0.  1. ...  6.  0.  0.]
 [ 0.  0.  2. ... 12.  0.  0.]
 [ 0.  0. 10. ... 12.  1.  0.]]

digits.target


[0 1 2 ... 8 9 8]

digits.data is a 1797 x 64 array: each of the 1797 rows is one image flattened into 64 pixel values, and each value is a grayscale intensity (0 to 16). digits.images holds the same pixel values arranged as 8 x 8 arrays, which is the format used for displaying the images. digits.target holds the correct answer for each image, i.e. which digit it represents.
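The link between the two layouts can be sketched in plain Python: pixel k of a 64-element row of digits.data sits at row k // 8, column k % 8 of the corresponding 8 x 8 image. The helper names `to_8x8` and `to_flat` are hypothetical, made up for this illustration; with NumPy the same conversion is just `digits.images.reshape(len(digits.images), -1)`.

```python
def to_8x8(flat_row):
    """Split a 64-element flat row into 8 rows of 8 pixels (row-major order)."""
    if len(flat_row) != 64:
        raise ValueError("expected 64 pixel values")
    return [list(flat_row[r * 8:(r + 1) * 8]) for r in range(8)]


def to_flat(image):
    """Concatenate the 8 rows of an 8 x 8 image back into one 64-element list."""
    return [value for row in image for value in row]


# Toy row where pixel k has the value k, so positions are easy to check
toy_row = list(range(64))
toy_image = to_8x8(toy_row)
print(toy_image[1][3])              # pixel 11: row 1, column 3
print(to_flat(toy_image) == toy_row)  # the round trip restores the original order
```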

Training with support vector machine

# Support vector machine
# gamma: how far the influence of a single training sample reaches (smaller = farther)
# C: penalty for misclassified training samples (larger = less tolerance for errors)
clf = svm.SVC(gamma=0.001, C=100.0)
# Train on the first 60% of the data; the remaining 40% is kept for evaluation
clf.fit(digits.data[:int(n*6/10)], digits.target[:int(n*6/10)])

Last time I used LinearSVC(), but this time I'm using SVC(), which uses a nonlinear kernel by default. Maybe the digits can't be separated well by linear boundaries?
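The gamma parameter belongs to the RBF (Gaussian) kernel that SVC() uses by default, K(x, x') = exp(-gamma * ||x - x'||^2). A minimal pure-Python sketch of that kernel (the function name `rbf` is my own, not scikit-learn's API) shows that with a smaller gamma a training sample still has a noticeable influence on points far away from it:

```python
import math


def rbf(x, y, gamma):
    """RBF (Gaussian) kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


x = [0.0, 0.0]
near = [1.0, 0.0]   # squared distance 1 from x
far = [10.0, 0.0]   # squared distance 100 from x

# Compare the kernel value (similarity) for a near and a far point
for gamma in (0.001, 0.1):
    print(gamma, round(rbf(x, near, gamma), 4), round(rbf(x, far, gamma), 4))
```

With gamma = 0.1 the far point gets a similarity close to 0, while with gamma = 0.001 it stays close to 1.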

Classification

Pass the remaining 40% of digits.data to the trained clf and have it classify each image.

# Correct answers for the remaining 40%
expected = digits.target[int(-n*4/10):]
# Predictions for the same images
predicted = clf.predict(digits.data[int(-n*4/10):])
# Per-digit precision, recall and F1 score
print(metrics.classification_report(expected, predicted))
# Confusion matrix (rows: correct digit, columns: predicted digit)
print(metrics.confusion_matrix(expected, predicted))
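The slice boundaries are easy to check by hand. With n = 1797 samples, the training slice takes the first int(1797 * 6 / 10) = 1078 samples and the test slice the last 718, which matches the total support of 718 in the report below; the one sample at index 1078 ends up in neither set. A small sketch of that arithmetic:

```python
n = 1797  # number of samples in digits.data

train_end = int(n * 6 / 10)    # 1078.2 -> 1078: first 60% is used for training
test_start = int(-n * 4 / 10)  # int() truncates toward zero: -718.8 -> -718

train_idx = range(0, train_end)
test_idx = range(n + test_start, n)  # same positions as digits.data[test_start:]

print(len(train_idx), len(test_idx))       # sizes of the two sets
print(n - len(train_idx) - len(test_idx))  # samples used by neither set
```

If every sample should be used, `sklearn.model_selection.train_test_split(digits.data, digits.target, test_size=0.4, shuffle=False)` is an alternative way to make the same kind of ordered split.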

Result

Classification report

              precision    recall  f1-score   support

           0       0.99      0.99      0.99        70
           1       0.99      0.96      0.97        73
           2       0.99      0.97      0.98        71
           3       0.97      0.86      0.91        74
           4       0.99      0.96      0.97        74
           5       0.95      0.99      0.97        71
           6       0.99      0.99      0.99        74
           7       0.96      1.00      0.98        72
           8       0.92      1.00      0.96        68
           9       0.96      0.97      0.97        71

    accuracy                           0.97       718
   macro avg       0.97      0.97      0.97       718
weighted avg       0.97      0.97      0.97       718

For example, for 0 a precision of 0.99 means that 99% of the images predicted to be 0 really are 0, and a recall of 0.99 means that 99% of the images that really are 0 were predicted to be 0. References on how to read the table:
- How to read classification_report
- Be careful when putting F1 score in metrics with Keras
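In formula form, with TP (true positives), FP (false positives) and FN (false negatives) counted per digit. For digit 0 the counts can be read off the confusion matrix in the next section: 69 true positives, 1 false positive (a 2 predicted as 0) and 1 false negative (a 0 predicted as 4):

```latex
\begin{aligned}
\text{precision} &= \frac{TP}{TP + FP}, &
\text{recall} &= \frac{TP}{TP + FN}, &
F_1 &= \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}} \\
\text{precision}_0 &= \frac{69}{69 + 1} \approx 0.99, &
\text{recall}_0 &= \frac{69}{69 + 1} \approx 0.99, &
F_{1,0} &\approx 0.99
\end{aligned}
```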

Confusion matrix

[[69  0  0  0  1  0  0  0  0  0]
 [ 0 70  1  0  0  0  0  0  2  0]
 [ 1  0 69  1  0  0  0  0  0  0]
 [ 0  0  0 64  0  3  0  3  4  0]
 [ 0  0  0  0 71  0  0  0  0  3]
 [ 0  0  0  0  0 70  1  0  0  0]
 [ 0  1  0  0  0  0 73  0  0  0]
 [ 0  0  0  0  0  0  0 72  0  0]
 [ 0  0  0  0  0  0  0  0 68  0]
 [ 0  0  0  1  0  1  0  0  0 69]]

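Rows are the correct digits and columns are the predicted digits. For example, of the 70 images of 0, 69 were recognized as 0 and 1 was recognized as 4; of the 74 images of 3, 64 were recognized correctly and the rest were mistaken for 5, 7 or 8.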
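To see how the two outputs relate, here is a pure-Python sketch that recomputes per-digit precision, recall and F1 from the confusion matrix above (rows = correct digit, columns = predicted digit). The matrix is copied from the output above; the helper name `per_class_scores` is hypothetical and not part of scikit-learn. The rounded values match the classification report.

```python
# Confusion matrix from the output above: cm[correct][predicted]
cm = [
    [69, 0, 0, 0, 1, 0, 0, 0, 0, 0],
    [0, 70, 1, 0, 0, 0, 0, 0, 2, 0],
    [1, 0, 69, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 64, 0, 3, 0, 3, 4, 0],
    [0, 0, 0, 0, 71, 0, 0, 0, 0, 3],
    [0, 0, 0, 0, 0, 70, 1, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 73, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 72, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 68, 0],
    [0, 0, 0, 1, 0, 1, 0, 0, 0, 69],
]


def per_class_scores(cm):
    """Return (precision, recall, f1) per class.

    Assumes every class is predicted at least once and appears at least once.
    """
    scores = []
    for k in range(len(cm)):
        tp = cm[k][k]
        predicted_k = sum(row[k] for row in cm)  # column sum: everything predicted as k
        actual_k = sum(cm[k])                     # row sum: everything that really is k
        precision = tp / predicted_k
        recall = tp / actual_k
        f1 = 2 * precision * recall / (precision + recall)
        scores.append((precision, recall, f1))
    return scores


for digit, (p, r, f) in enumerate(per_class_scores(cm)):
    print(digit, round(p, 2), round(r, 2), round(f, 2))

# Accuracy: correctly classified images (diagonal) over all images
accuracy = sum(cm[k][k] for k in range(len(cm))) / sum(map(sum, cm))
print("accuracy", round(accuracy, 2))
```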

Actual image and predicted value

# Show the first 12 test images together with their predicted labels
images = digits.images[int(-n*4/10):]
for i in range(12):
    plt.subplot(3, 4, i + 1)
    plt.axis("off")
    plt.imshow(images[i], cmap=plt.cm.gray_r, interpolation="nearest")
    plt.title("Guess: " + str(predicted[i]))
plt.show()

(Figure pred.png: the first 12 test images, each titled "Guess: " followed by the predicted digit)

You can see that the handwritten digits are being recognized.
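To look specifically at the misrecognitions counted in the confusion matrix, one option is to collect the indices where the prediction differs from the correct answer and plot those images instead. A minimal sketch; the helper name `mismatches` is hypothetical, and the short label lists are made up to stand in for expected and predicted:

```python
def mismatches(expected, predicted):
    """Return (index, correct label, predicted label) for every wrong prediction."""
    return [(i, e, p) for i, (e, p) in enumerate(zip(expected, predicted)) if e != p]


# Made-up labels standing in for expected / predicted from the classification step
toy_expected = [0, 1, 2, 3, 3, 4]
toy_predicted = [0, 1, 2, 5, 3, 9]

for i, correct, guess in mismatches(toy_expected, toy_predicted):
    print(f"index {i}: correct {correct}, guess {guess}")
```

With the real arrays, each returned index i can be passed to images[i] in the plotting loop above.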

Bonus

I tried visualizing digits.data as a black-and-white binary image: every pixel with a nonzero value is drawn as ■.

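# Print the first 10 images as text: ■ for any nonzero pixel, blank otherwise
for i in range(10):
    my_s = ""
    for k, j in enumerate(digits.data[i]):
        if j > 0:
            my_s += " ■ "
        else:
            my_s += "   "
        # Every 8 pixels complete one row of the 8 x 8 image
        if k % 8 == 7:
            print(my_s)
            my_s = ""
    print("\n")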

Result

       ■  ■  ■  ■       
       ■  ■  ■  ■  ■    
    ■  ■  ■     ■  ■    
    ■  ■        ■  ■    
    ■  ■        ■  ■    
    ■  ■     ■  ■  ■    
    ■  ■  ■  ■  ■       
       ■  ■  ■          


          ■  ■  ■       
          ■  ■  ■       
       ■  ■  ■  ■       
    ■  ■  ■  ■  ■       
       ■  ■  ■  ■       
       ■  ■  ■  ■       
       ■  ■  ■  ■       
          ■  ■  ■

...

       ■  ■  ■  ■       
       ■  ■  ■  ■       
       ■  ■     ■  ■    
       ■  ■  ■  ■  ■    
       ■  ■  ■  ■       
    ■  ■  ■  ■  ■  ■    
    ■  ■  ■  ■  ■  ■    
       ■  ■  ■  ■  ■    


       ■  ■             
    ■  ■  ■  ■  ■       
    ■  ■  ■  ■  ■       
    ■  ■  ■  ■  ■       
       ■  ■  ■  ■  ■    
          ■     ■  ■    
             ■  ■  ■    
       ■  ■  ■  ■   

You can more or less tell that these are handwritten digits.
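The binary view throws away the intensity information, since each value in digits.data is a grayscale level from 0 to 16. A small variation keeps some of it by mapping the levels onto a few characters, so faint and dark strokes can be told apart. The function name `shade` and the character ramp are my own choices, and the demo row below is made up rather than taken from the dataset:

```python
RAMP = " .:■"  # from blank (value 0) to darkest (value 16)


def shade(row, ramp=RAMP, max_value=16):
    """Render a 64-element digits.data row as 8 text lines, 2 characters per pixel."""
    lines = []
    for r in range(8):
        chars = []
        for value in row[r * 8:(r + 1) * 8]:
            # Map 0..max_value onto ramp indices 0..len(ramp)-1, rounding to nearest
            level = min(int(value * (len(ramp) - 1) / max_value + 0.5), len(ramp) - 1)
            chars.append(ramp[level] * 2)
        lines.append("".join(chars))
    return lines


# Made-up row: a vertical stroke in column 3 whose intensity fades and repeats
demo_row = [0] * 64
for r, value in enumerate([16, 12, 8, 4, 16, 12, 8, 4]):
    demo_row[r * 8 + 3] = value

for line in shade(demo_row):
    print(line)
```

With the real data, `for line in shade(digits.data[0]): print(line)` prints the first image this way.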
