Released a web service for scoring handwriting using DeepLearning

As a hobby project, I developed a web service in which an AI automatically scores the legibility of characters you handwrite in the browser, and released it about two weeks ago :relaxed:

Letters-AI handwritten character scoring app

The core technologies are DeepLearning image recognition and a front-end implementation with Preact. In this article, I will describe the **service overview, technical details, and impressions** of the application I developed.

Service overview

Function introduction

Since the app is small and lightweight, it is probably quicker to just try it yourself, but here I will introduce the features with screenshots, along with an overview of the technology used.

Handwritten character scoring from the top page

letters_demo_01.gif

- On the top page, a frame for writing characters is displayed, and you can write a character in it.
    - Characters are drawn on an HTML5 Canvas.
    - When you write a character and press the scoring button, candidate characters are displayed.
    - The image drawn on the Canvas is converted to a PNG file and sent to the server.
    - The server runs machine learning inference on the received image.
    - The candidates most likely to be what was written are returned to the client.
- When you select the character you wrote, a score (out of 100 points) is displayed.
    - The score is calculated by applying a formula to the inferred probability so that it yields an intuitive score (a rough sketch of such a mapping follows below).
    - You are more likely to get a high score if you write neat, easy-to-read characters.
- You can change the thickness of the line you draw (the score can change significantly depending on line thickness).
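
As a rough illustration of the probability-to-score mapping mentioned above, a function like the following gives the flavor. This is a sketch only: the exponent is a made-up placeholder, not the formula the service actually uses.

```python
# Toy mapping from an inferred class probability to a 0-100 score.
# The exponent is a placeholder for illustration, not the real formula.
def probability_to_score(p: float, gamma: float = 0.35) -> int:
    """Map a class probability p in [0, 1] to an intuitive score out of 100."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    return round(100 * p ** gamma)

print(probability_to_score(0.9))  # confident recognition -> 96
print(probability_to_score(0.1))  # ambiguous writing -> 45
```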

Display of scoring results of characters written so far

letters_demo_02_2.gif

- You can see a list of the characters you have written (the scored characters).
    - The list of written characters is stored as a log on the server.
    - The list is retrieved using a cookie as the key (see the sketch below).
    - If you switch to another device or browser, or use a private browsing window, the cookie changes, so your history of written characters does not carry over.
- The written characters themselves and the list are completely invisible to other users.
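
As a sketch of how the cookie-keyed history lookup could work on the server, here is a minimal Flask example. Everything here is an assumption for illustration: the endpoint path, cookie name, and in-memory store are hypothetical (the real service logs to server-side storage).

```python
# Minimal sketch of cookie-keyed scoring history (hypothetical names).
import uuid

from flask import Flask, jsonify, make_response, request

app = Flask(__name__)

# Hypothetical in-memory store: cookie id -> list of {"char": ..., "score": ...}.
history_store: dict = {}

@app.route("/api/history")
def history():
    user_id = request.cookies.get("letters_uid")
    records = history_store.get(user_id, []) if user_id else []
    resp = make_response(jsonify(records))
    if user_id is None:
        # First visit: issue a new id. A different browser or a private
        # window gets a fresh cookie, so past history is not visible there.
        resp.set_cookie("letters_uid", uuid.uuid4().hex,
                        max_age=60 * 60 * 24 * 365)
    return resp
```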

Display of high score examples

letters_demo_03.gif

- If you can't get a high score, you can display examples of the kind of characters you should write to get one.
    - Examples that scored highly in the same inference processing that runs on the server are extracted from the training and test data, and each image is displayed together with its inferred score.

Background and purpose of development

The purpose of this development is twofold.

  1. It would be fun if there were an app that scores handwritten characters
  2. I wanted to create a web application that applies DeepLearning

Having developed it, I am satisfied that both goals have been met for now. I don't think there is another service offering the same functionality as this one (though, admittedly, I haven't researched other services very thoroughly...).

Points I was particular about

Lightweight, snappy operation

The app is primarily intended to be used on smartphones and tablets, so I worked hard to make the initial display and interactions feel smooth. Specifically, I took measures such as keeping the front-end files as small as possible.

Ensuring broad coverage of scorable character types

The scoring targets are a total of 3175 characters frequently used in Japanese, spanning hiragana, katakana, kanji, the Latin alphabet, and digits, which covers almost all characters in common use in Japan[^char_types].

Digits alone could easily be covered with the MNIST dataset, and various datasets exist for alphanumeric characters only, but datasets that mix hiragana, katakana, and kanji with uppercase and lowercase alphanumerics are scarce, so preparing the data was a hassle.

[^char_types]: The breakdown is 75 hiragana, 71 katakana, 2967 kanji, 52 Latin alphabet letters, and 10 digits.

Technical details

Architecture overview

Letters_architecture (1).png

Technology / language / tools used

Front end

Preact is a framework that works like a lightweight version of React. It is implemented to be remarkably small while keeping React's main APIs almost as-is[^preact_gzip]. I wasn't using React to its fullest extent anyway, so I had no problems with Preact[^preact_api].

[^preact_gzip]: About 3KB gzipped. Astonishing.
[^preact_api]: The function-component hooks API works in Preact as usual, and there is also a router for Preact.

Back end

Since the machine learning part uses Python 3 + Keras, Python was the natural choice for the backend, which needs to run inference in real time, both for affinity and convenience. Flask was chosen simply because it is light and easy to use, but since there are cases where asynchronous processing is desirable even in a Python application server, I am also considering a move to FastAPI. There is no particular reason for choosing Nginx and uWSGI; application servers like Flask are not optimized for serving front-end traffic in production, so I am simply using tools that are well suited to that role and have a high affinity with Flask.
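
To make that concrete, an inference endpoint might look roughly like the following. This is a minimal sketch, not the actual source: the model file name, the 64x64 input size, and the label list are assumptions.

```python
# Minimal sketch of a Flask endpoint that scores an uploaded PNG.
import io

import numpy as np
from flask import Flask, jsonify, request
from PIL import Image
from tensorflow import keras

app = Flask(__name__)
model = keras.models.load_model("letters_model.h5")  # hypothetical file name
LABELS: list = []  # the 3175 characters in model output order (omitted here)

@app.route("/api/score", methods=["POST"])
def score():
    # Decode the uploaded PNG, convert to grayscale, resize to the model input.
    raw = request.files["image"].read()
    img = Image.open(io.BytesIO(raw)).convert("L").resize((64, 64))
    x = np.asarray(img, dtype="float32")[None, :, :, None] / 255.0
    probs = model.predict(x)[0]
    top5 = probs.argsort()[::-1][:5]  # five most likely characters
    return jsonify([{"char": LABELS[i], "prob": float(probs[i])}
                    for i in top5])
```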

Machine learning part

Both training and inference use Keras, a deep learning framework.
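
The actual network is not described in this article, but as a sketch of the general shape such a Keras classifier could take (the layer sizes and input shape here are assumptions for illustration):

```python
# Sketch of a small CNN classifier for the 3175 character classes.
# Architecture and input size are illustrative assumptions.
from tensorflow import keras
from tensorflow.keras import layers

NUM_CLASSES = 3175

model = keras.Sequential([
    layers.Input(shape=(64, 64, 1)),           # grayscale character image
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_images, train_labels, validation_split=0.1, epochs=10)
```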

Infrastructure such as execution environment

I use GCP across the board (and no cloud services other than GCP). The backend server runs on Google Compute Engine, image files and other assets are stored in Google Cloud Storage, and the load balancer is Google Cloud Load Balancing. I hesitated over Google Cloud Load Balancing because its monthly fee of roughly 2,700 yen (as of March 23, 2020) is relatively expensive, but it manages certificates automatically, and it was reassuring to know I could add instances if traffic ever spiked, so I decided to use it for now.

Deployment follows the flow below, with a Docker container as the release unit.

  1. Build the front end in the development environment
  2. Create a container image with pre-built and executable code
  3. Push the created container image to your private container registry
  4. Create an instance template based on the pushed container image
  5. Update the instance group based on the instance template

That's a lot of katakana loanwords... (laughs) I have no plans to update the training data for now, so the service keeps using a model that has already been trained.
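
Expressed as a script, the deployment flow above looks roughly like this. It is a sketch only: the project ID, zone, and resource names are hypothetical placeholders.

```python
# Sketch of the deploy flow; project, zone, and names are placeholders.
import subprocess

PROJECT, ZONE, TAG = "my-gcp-project", "asia-northeast1-a", "v1"
IMAGE = f"gcr.io/{PROJECT}/letters:{TAG}"

def run(*cmd: str) -> None:
    print("$", " ".join(cmd))
    subprocess.run(cmd, check=True)

run("npm", "run", "build")                              # 1. build the front end
run("docker", "build", "-t", IMAGE, ".")                # 2. build container image
run("docker", "push", IMAGE)                            # 3. push to the registry
run("gcloud", "compute", "instance-templates",          # 4. new instance template
    "create-with-container", f"letters-{TAG}",
    f"--container-image={IMAGE}")
run("gcloud", "compute", "instance-groups", "managed",  # 5. roll out the update
    "rolling-action", "start-update", "letters-group",
    f"--version=template=letters-{TAG}", f"--zone={ZONE}")
```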

Other tools and services that were taken care of

**Inkscape** Free software that provides almost the same functionality as Adobe Illustrator. I used it to create the logo and icons.

**Let's make a favicon.ico!** Very convenient for generating favicons in various sizes and formats in one batch.

**EZGIF.COM - Animated GIF editor and GIF maker** A convenient service for converting the app's demo videos to GIF when uploading them to Qiita or Twitter. I tried several similar services, but I personally found this one the most convenient in terms of conversion flexibility and processing time.

**Wiktionary** To create the list of kanji, I wanted to classify them by the school grade in which they are taught, so I collected that data here. I also collected data such as readings, total stroke counts, and radicals, but I have not yet implemented display, classification, or search features that use them.

Data used to train machine learning models

**ETL Character Database (ETLCDB)** A dataset of handwritten character images (plus a small amount of printed characters). It consists of about 3,200 types of characters used in Japan, totaling 1,115,065 images. The character types span hiragana, katakana, kanji, alphanumerics, and symbols, but lowercase Latin (Roman) letters are not included. ETLCDB is also a somewhat awkward dataset to handle, for the following reasons:

- Each dataset is fixed-length binary data, not a modern format such as JSON or XML.
- The character codes stored internally are JIS X 0201 or CO-59 (a character code created for the six-company newspaper agreement).

I created and published a Python script that extracts images from all of the ETLCDB datasets (https://github.com/choo/etlcdb-image-extractor). It pulls the raw image data out of the binary files and saves the images as PNGs in one directory per character code, so that each code can be treated as a Unicode code point label. The dataset is free to use, but note that the site says to contact them about the conditions for commercial use.
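
For reference, the general pattern for pulling images out of such fixed-length binary records looks like this. The record size, header layout, and image dimensions below are placeholders for illustration, not the actual ETLCDB layout (each ETL file has its own record format).

```python
# Sketch of reading ETLCDB-style fixed-length binary records.
# RECORD_SIZE, the header layout, and image dimensions are placeholders.
import struct

from PIL import Image

RECORD_SIZE = 2052     # hypothetical bytes per record
IMG_W, IMG_H = 64, 63  # hypothetical image size, 4 bits per pixel

def read_records(path):
    n_img_bytes = IMG_W * IMG_H // 2  # 4-bit pixels -> half a byte each
    with open(path, "rb") as f:
        while len(record := f.read(RECORD_SIZE)) == RECORD_SIZE:
            # Hypothetical header: the first two bytes hold the character
            # code (e.g. JIS X 0201), which still needs mapping to Unicode.
            (code,) = struct.unpack(">H", record[:2])
            # In this sketch the 4-bit bitmap sits at the end of the record.
            img = Image.frombytes("F", (IMG_W, IMG_H),
                                  record[-n_img_bytes:], "bit", 4)
            # Scale 0-15 pixel values into 0-240 grayscale.
            yield code, img.point(lambda p: p * 16).convert("L")
```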

**The EMNIST Dataset** A dataset of handwritten characters provided by NIST (the National Institute of Standards and Technology). It contains 814,255 images covering 62 character types: Latin letters and digits. Incidentally, MNIST, which appears in so many deep learning tutorials, is a subset of this dataset. The ETLCDB above covers many character types but has no data for lowercase Latin letters; since I really wanted all alphanumeric characters to be scorable, I added this dataset, which contains all the Latin letters, to the training data.

Source code etc.

The basic source code is available in the GitHub repository. However, some files required to run the application, such as the trained model, are not on GitHub, so the development environment cannot be reproduced exactly as-is.

Impressions

What I felt while playing with the app

Now, this is suddenly an unrelated topic, but can you decipher the meaning of the following sentence? (*Slightly changed from when this was first posted)

** "Uguchi Uriki Nirikiko Yu" **

mushimegane_boy.png

sashimi_maguro_ootoro.png kani_ashi.png

Apologies for padding with irrelevant images so that you can't see the correct answer right away :sweat:

And for those who took the images as a giveaway and read it as a katakana phrase about fatty tuna and crab: in fact, not a single character in it is katakana. The correct answer is as follows; in reality it is just a list of characters that does not form a sentence at all.

- **卜**: the kanji 卜, as in 水卜アナ (announcer Mito)
- **口**: the kanji 口 (mouth), as in 口内炎 (mouth ulcer) and 口角 (corner of the mouth)
- **力**: the kanji 力 (power)
- **二**: the kanji 二 (two), as in 二子玉川 (Futako-Tamagawa) and 二項分布 (binomial distribution)
- **工**: the kanji 工, as in 工事中 (under construction), 工学 (engineering), and the actor 斉藤工
- **夕**: the kanji 夕 (evening), as in 夕方 (evening) and 夕食 (dinner)

(If you copy each one into a Google search, you can confirm that they are all kanji.)

I suspect there are not many people who can identify all of these characters correctly. If so, you can see that **the assumption that "any character, as long as it is properly typeset, can be recognized and read by a human" is not actually true**[^miss_read]. For example, if you were shown the glyph 「工」 on its own and asked to read it aloud, with no other hints the best you could answer would be "it is either the kanji 工 (kou) as in 工事 (construction), or the katakana エ (e)"[^e_font]. In other words, **identifying a single character in isolation can be very difficult**.

[^miss_read]: Assuming it is not simply a misreading...
[^e_font]: Depending on the font the two glyphs may differ, so there may be fonts for which, and people who, can tell the kanji and the katakana apart.

Also, if the question at the beginning of this section had been **"What is written, character by character, in the following string?"** instead of "Can you decipher the meaning of the following sentence?", I think many people would have seen it quite differently.

As is often said, when studying or experimenting with machine learning, and deep learning in particular, I am frequently struck by how **context-dependent human cognition is**. Character recognition is a typical example: when we read a character, we do not identify it from its shape alone. **We rely heavily on what can be called contextual information**, such as what characters are written around it, what words and morphemes it is part of, and what would naturally be written in that position.

In the app developed this time, images are recognized and scored without using any contextual information at all, so there are quite a few characters for which it is hard to get a high score. The characters above are typical examples, but there are a number of others, such as 0 (the digit zero) and O (the Latin letter O), that are hard to recognize in isolation and therefore hard to score well on.

In machine learning development, we often work while thinking about how humans perceive things. Digging into that question for its own sake is fascinating, and I am often amazed at how capable the human brain is. That perspective also offers many hints about questions like what AI can and cannot currently do.

Personally, given that natural language processing can now analyze not just words but contextual information, and that multimodal deep learning seems likely to develop significantly over the next few years, I believe AI will naturally come to make highly accurate inferences that take context into account. Through these advances, I believe so-called "smart AI" will be realized in the not-too-distant future.

The difficulty of deciding when to release in personal development

In personal development, unlike work where a delivery date is fixed by contract, it is very difficult to decide how much to build before releasing. As anyone who has done much software development knows, whether for business or as a hobby, mid-development you discover not only bugs that must be fixed but also countless improvements and features you feel you ought to have. This project is no exception: there are still many features I want to add and points that should be improved...

Under those circumstances, it is very hard to decide at what point you can release. I think the most important thing here is **a strong will to absolutely get it out into the world**.

** "Done is better than perfect" , which seems to be used as a slogan within Facebook, ** "The software is never in perfect condition. That's why we put it out with a sense of speed and keep improving it little by little. It is important to keep going " It seems that the background is the intention [^ facebook]. As typified by this word, it is necessary to give up to some extent when releasing. However, on the other hand, it is often said that it takes a long time for a user who has left once to come back, and it is quite difficult to easily release low quality products to the world and pride as an engineer. It's difficult.

With all that in mind, these days I suspect the best approach, even in personal development, is to **decide a release date in advance and release whatever you have at that point**. (There may be other good ways... if you know one, I would really love to hear it, so please let me know...!)

I have more thoughts on this difficulty of deciding when to release, so I would like to write them up in a separate, more opinion-piece-style article.

[^facebook]: The original text is in Facebook's "The Hacker Way" letter. It is very much worth reading.

Finally

Here in Tokyo, between the request to refrain from going out and the heavy snow, it is hard to leave the house right now, so I hope this gives you something to play with, even just a little. There are still many rough edges and it may be functionally lacking, but I hope you enjoy it!

I have also recently started posting to Twitter more actively, so please follow my Twitter account if you like m(_ _)m
