[PYTHON] [Machine learning] Check the performance of the classifier with handwritten character data

Hello. This is Hayashi @ Ienter.

In the previous post, I performed color reduction on an image using scikit-learn's k-means algorithm and OpenCV.

This time, let's use the handwritten character data sample bundled with scikit-learn to run a quick performance check of several classifiers.

Reading handwritten character data

A sample of handwritten character data is included in scikit-learn's datasets module, so we load it first.

shot1.png

The explanatory variable X is an array of image data for the digits 0 to 9, and the target variable Y is an array of the labels 0 to 9 corresponding to each image.

The first entry of X is an array of 64 numbers, as shown here.

shot2.png

This array is actually a flattened 8x8 pixel image, so let's reshape the data and display the first 20 samples. Each one is shown as a grayscale pixel image.
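The loading code survives only as screenshots, so here is a minimal sketch of what this step might look like. It assumes the data comes from `load_digits` in `sklearn.datasets`; the variable names `digits`, `X` and `Y` simply follow the text.

```python
from sklearn.datasets import load_digits

# Load the bundled handwritten digit sample (assumed to be the dataset used here).
digits = load_digits()

X = digits.data    # one row per image, 64 pixel values each
Y = digits.target  # the digit (0 to 9) each image represents

print(X.shape, Y.shape)
print(X[0])  # the first sample: a flat array of 64 numbers
```

Each row of `X` is one image flattened into 64 values, which is why the first entry prints as a single long array.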

shot3.png
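A sketch of how the first 20 samples could be reshaped and drawn, assuming matplotlib is used for plotting. The 2x10 grid, the `gray` colormap and the output file name `digits_first20.png` are illustrative choices, not taken from the original post.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits

digits = load_digits()

# Reshape the first 20 flat 64-value rows back into 8x8 images.
images = digits.data[:20].reshape(-1, 8, 8)
labels = digits.target[:20]

fig, axes = plt.subplots(2, 10, figsize=(10, 2.5))
for ax, image, label in zip(axes.ravel(), images, labels):
    ax.imshow(image, cmap="gray", interpolation="nearest")
    ax.set_title(str(label))
    ax.axis("off")

# Hypothetical output file name; in a notebook, drop the Agg line and call plt.show().
fig.savefig("digits_first20.png")
```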

K-fold cross-validation

This time, we evaluate the accuracy of each classifier using K-fold cross-validation. K-fold cross-validation divides the samples into K blocks, then uses K-1 blocks as training data and the remaining block as test data. The evaluation is repeated K times, with the test block switching from the 1st to the Kth. The idea is illustrated below.

shot4.png

scikit-learn provides a KFold class for cross-validation. It lived in the sklearn.cross_validation module in older releases; current releases provide it in sklearn.model_selection. This time, we prepare a KFold that divides the sample data into 10 parts.

shot5.png
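As a rough illustration of the 10-way split, the sketch below uses `KFold` from `sklearn.model_selection`. Shuffling and the fixed `random_state` are assumptions added for reproducibility; the original settings are not visible in the text.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import KFold

X, Y = load_digits(return_X_y=True)

# Split the samples into 10 blocks; shuffle and random_state are assumed settings.
kfold = KFold(n_splits=10, shuffle=True, random_state=0)

fold_sizes = []
for fold, (train_idx, test_idx) in enumerate(kfold.split(X), start=1):
    # Each round trains on 9 blocks and tests on the remaining one.
    fold_sizes.append(len(test_idx))
    print(f"fold {fold}: train={len(train_idx)} test={len(test_idx)}")
```

Every sample ends up in the test block exactly once across the 10 rounds.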

Classifier to evaluate

Check the performance of the following classifiers.

- LogisticRegression ([Logistic Regression](https://ja.wikipedia.org/wiki/Logistic Regression))
- GaussianNB ([Naive Bayes](https://ja.wikipedia.org/wiki/naive Bayes classifier))
- SVC ([Support Vector Machine](https://ja.wikipedia.org/wiki/Support Vector Machine))
- DecisionTreeClassifier ([Decision Tree](https://ja.wikipedia.org/wiki/Decision Tree))
- RandomForestClassifier ([Random Forest](https://ja.wikipedia.org/wiki/Random Forest))
- AdaBoostClassifier (AdaBoost)
- KNeighborsClassifier ([K-nearest neighbor method](https://ja.wikipedia.org/wiki/K-nearest neighbor method))

For SVC, we check three kernel types: "rbf" (Gaussian kernel), "linear" (linear kernel), and "poly" (polynomial kernel).

Prepare an array whose elements pair each classifier's name with its instance.

shot6.png
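One possible shape for that array is sketched here, assuming every classifier is created with its default settings apart from the SVC kernel. The `(name, instance)` ordering and the display names such as `SVC-rbf` are illustrative; the original hyperparameters are not shown in the text.

```python
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Each element pairs a display name with a classifier instance.
# Default hyperparameters are an assumption, not the author's confirmed settings.
classifiers = [
    ("LogisticRegression", LogisticRegression()),
    ("GaussianNB", GaussianNB()),
    ("SVC-rbf", SVC(kernel="rbf")),
    ("SVC-linear", SVC(kernel="linear")),
    ("SVC-poly", SVC(kernel="poly")),
    ("DecisionTreeClassifier", DecisionTreeClassifier()),
    ("RandomForestClassifier", RandomForestClassifier()),
    ("AdaBoostClassifier", AdaBoostClassifier()),
    ("KNeighborsClassifier", KNeighborsClassifier()),
]

for name, clf in classifiers:
    print(name, type(clf).__name__)
```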

About performance check

The performance check evaluates each classifier on accuracy and analysis speed. For accuracy, each of the 10 prediction tests in the K-fold is scored with accuracy_score from sklearn.metrics, and the scores are averaged. For analysis speed, the time required from training (fit) through prediction (predict) is measured in each test and likewise averaged.

shot7.png

The following result was output.

shot8.png

SVC (support vector machine) with all three kernels and KNeighborsClassifier (K-nearest neighbor method) give good scores.

SVC-rbf has the highest accuracy, but its analysis takes some time. KNeighborsClassifier ranks second in accuracy, yet it runs 4 times faster than SVC-rbf.

Weighing accuracy and speed together, KNeighborsClassifier is probably the best-performing classifier in this test.
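The measurement described above might be sketched as follows. The helper name `evaluate`, the use of `time.perf_counter`, the shuffled split and the reduced set of three classifiers are all assumptions made to keep the example short. Numbers produced by this sketch will depend on the machine and library version, so they need not match the results reported in this post.

```python
import time

import numpy as np
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC


def evaluate(clf, X, Y, kfold):
    """Return (mean accuracy, mean fit+predict seconds) over all folds."""
    scores, times = [], []
    for train_idx, test_idx in kfold.split(X):
        start = time.perf_counter()
        clf.fit(X[train_idx], Y[train_idx])      # training
        predicted = clf.predict(X[test_idx])     # prediction
        times.append(time.perf_counter() - start)
        scores.append(accuracy_score(Y[test_idx], predicted))
    return float(np.mean(scores)), float(np.mean(times))


X, Y = load_digits(return_X_y=True)
kfold = KFold(n_splits=10, shuffle=True, random_state=0)  # assumed settings

# A shortened, illustrative classifier list; the full list can be used the same way.
classifiers = [
    ("GaussianNB", GaussianNB()),
    ("SVC-rbf", SVC(kernel="rbf")),
    ("KNeighborsClassifier", KNeighborsClassifier()),
]

results = {}
for name, clf in classifiers:
    results[name] = evaluate(clf, X, Y, kfold)
    accuracy, seconds = results[name]
    print(f"{name}: accuracy={accuracy:.4f} time={seconds:.4f}s")
```

Timing fit and predict together matches the description of analysis speed; timing them separately would show where each classifier spends its time.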

That's all for this story!
