[PYTHON] Machine learning template for handwritten digit data

Python textbook to acquire practical skills

Run python3 digits.py ${fileName} with an image of a handwritten digit, and the script predicts which digit it shows.

digits.py


import os, sys, math
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets, model_selection, svm, metrics
import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23
from PIL import Image

#Model data file name
DIGITS_PKL = "digit-clf.pkl"

#Read handwritten digit data
digits = datasets.load_digits()
#Hold-out split (a single split, not k-fold cross-validation)
#Randomly divide the data into training and test sets
data_train, data_test, label_train, label_test = \
    model_selection.train_test_split(digits.data, digits.target)

#Create a predictive model
def create_model():
    #Model building
    clf = svm.SVC(gamma=0.001)
    # clf = svm.LinearSVC()
    # from sklearn.ensemble import RandomForestClassifier
    # clf = RandomForestClassifier()
    #Learning
    clf.fit(data_train, label_train)
    #Save Predictive Model
    joblib.dump(clf, DIGITS_PKL)
    print("Saved the prediction model=", DIGITS_PKL)
    return clf

#Select a prediction model
def select_model():
    #Train and save a model if no model file exists yet
    if not os.path.exists(DIGITS_PKL):
        return create_model()
    #Otherwise load the saved model file
    return joblib.load(DIGITS_PKL)

#Predict numbers from data
def predict_digits(data,clf):
    n = clf.predict([data])
    print("judgment result=", n)

#Convert handwritten digit images to 8x8 grayscale data array
def image_to_data(imagefile):
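    image = Image.open(imagefile).convert('L') #Grayscale conversion
    #Image.ANTIALIAS was removed in Pillow 10; Image.LANCZOS is the same filter
    image = image.resize((8, 8), Image.LANCZOS)
    img = np.asarray(image, dtype=float)
    #Element-wise operation: invert and rescale 0-255 to the 0-16 range of load_digits
    img = np.floor(16 - 16 * (img / 256))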
    #Display the converted image
    plt.imshow(img)
    plt.gray()
    plt.show()

    img = img.flatten()
    print("img=",img)
    return img

#Evaluate the model
def evaluate_model(clf):
    predict = clf.predict(data_test)
    return predict

#Create a report from predictions
def show_report(predict, clf):
    ac_score = metrics.accuracy_score(label_test, predict)
    cl_report = metrics.classification_report(label_test, predict)
    print('Sorter information =', clf)
    print('Correct answer rate =', ac_score)
    print('Report =', cl_report)
    # precision: share of samples predicted as a class that really belong to it,
    # recall: share of samples of a class that were predicted as that class,
    # f1-score: harmonic mean of precision and recall, support: number of samples with that true label

def main():
    #Get command line arguments
    if len(sys.argv) <= 1:
        print("USAGE:")
        print("python3 digits.py imagefile")
        return
    imagefile = sys.argv[1]
    data = image_to_data(imagefile)
    clf = select_model()
    predict_digits(data,clf)
    show_report(evaluate_model(clf),clf)

if __name__ == '__main__':
    main()
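
The scaling step in image_to_data() can be checked in isolation. The sketch below applies the same formula to a few 8-bit grayscale values in plain Python; the script itself applies it to the whole NumPy array at once. The helper name to_digits_scale and the sample values are assumptions made for illustration and do not appear in digits.py.

```python
import math


def to_digits_scale(pixel):
    """Map one 8-bit grayscale value (0 = black, 255 = white) to the
    inverted 0-16 scale, using the same formula as image_to_data()."""
    return math.floor(16 - 16 * (pixel / 256))


# Hypothetical sample values, chosen only for illustration.
samples = [0, 64, 128, 192, 255]
scaled = [to_digits_scale(p) for p in samples]
print(scaled)  # [16, 12, 8, 4, 0]
```

Dark ink on white paper therefore ends up with high values and the background with values near 0, which is the convention of the load_digits data the model was trained on.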

result


img= [ 0.  0.  0.  0.  0.  0.  0.  0.  1.  9.  7.  7.  7.  7.  2.  0.  1.  8.
  0.  1.  0.  0.  0.  0.  1.  6.  0.  0.  0.  0.  0.  0.  1.  9.  5.  6.
  5.  1.  0.  0.  0.  4.  3.  3.  4.  8.  1.  0.  0.  0.  0.  0.  2.  9.
  2.  0.  0.  3.  8.  8.  8.  2.  0.  0.]
judgment result= [5]
Sorter information = SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape=None, degree=3, gamma=0.001, kernel='rbf',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False)
Correct answer rate = 0.993333333333
Report =              precision    recall  f1-score   support

          0       1.00      1.00      1.00        38
          1       1.00      1.00      1.00        48
          2       1.00      1.00      1.00        40
          3       0.98      0.98      0.98        47
          4       1.00      1.00      1.00        54
          5       0.98      0.98      0.98        47
          6       0.98      1.00      0.99        46
          7       1.00      1.00      1.00        42
          8       1.00      1.00      1.00        47
          9       1.00      0.98      0.99        41

avg / total       0.99      0.99      0.99       450
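
The columns of this report can be reproduced by hand, which may help when reading it. The following is a minimal sketch in plain Python, not scikit-learn's implementation; the function per_class_scores and the toy labels are hypothetical and are not taken from the run above.

```python
def per_class_scores(y_true, y_pred, label):
    """Return (precision, recall, f1, support) for one class label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)  # true positives
    fp = sum(1 for t, p in pairs if t != label and p == label)  # false positives
    fn = sum(1 for t, p in pairs if t == label and p != label)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    support = tp + fn  # number of samples whose true label is `label`
    return precision, recall, f1, support


# Hypothetical toy data: one sample of class 3 is misread as class 5.
y_true = [3, 3, 3, 3, 5, 5, 5, 5]
y_pred = [3, 3, 3, 5, 5, 5, 5, 5]
for label in (3, 5):
    print(label, per_class_scores(y_true, y_pred, label))
```

In this toy case class 3 keeps a precision of 1.0 but its recall drops to 0.75, while class 5 keeps a recall of 1.0 but its precision drops to 0.8. A single misclassification lowers recall for the true class and precision for the predicted class.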
