Two posts ago (1/3): https://qiita.com/tfull_tf/items/6015bee4af7d48176736
Previous post (2/3): https://qiita.com/tfull_tf/items/968bdb8f24f80d57617e
Full code: https://github.com/tfull/character_recognition
Now that the kana recognition model has reached a point where the system's behavior can be checked, we will draw an image in a GUI, feed it to the model, and output the character.
The GUI uses Tkinter, so install it first.
# For macOS (Homebrew)
$ brew install tcl-tk
# For Ubuntu Linux
$ sudo apt install python-tk
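To check that Tkinter is actually available to Python after installing, a quick sanity check (just printing the bundled Tcl/Tk version) is:

import tkinter
print(tkinter.TkVersion)  # e.g. 8.6 if the installation succeeded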
Create a canvas on which a left click/drag draws a white line and a right click/drag draws a black line (the background is black, so this effectively works as an eraser). Two buttons are provided: one to recognize the drawn character and one to erase what was drawn (fill the canvas with black).
import sys
import tkinter
import numpy as np
from PIL import Image, ImageDraw

class Board:
    def __init__(self):
        self.image_size = 256
        self.window = tkinter.Tk()
        self.window.title("Kana input")
        self.frame = tkinter.Frame(self.window, width = self.image_size + 2, height = self.image_size + 40)
        self.frame.pack()
        self.canvas = tkinter.Canvas(self.frame, bg = "black", width = self.image_size, height = self.image_size)
        self.canvas.place(x = 0, y = 0)
        self.canvas.bind("<ButtonPress-1>", self.click_left)
        self.canvas.bind("<B1-Motion>", self.drag_left)
        self.canvas.bind("<ButtonPress-3>", self.click_right)
        self.canvas.bind("<B3-Motion>", self.drag_right)
        self.button_detect = tkinter.Button(self.frame, bg = "blue", fg = "white", text = "recognition", width = 100, height = 40, command = self.press_detect)
        self.button_detect.place(x = 0, y = self.image_size)
        self.button_delete = tkinter.Button(self.frame, bg = "green", fg = "white", text = "Delete", width = 100, height = 40, command = self.press_delete)
        self.button_delete.place(x = self.image_size // 2, y = self.image_size)
        # Pillow image mirroring the canvas, so the drawing can be read back as pixel data
        self.image = Image.new("L", (self.image_size, self.image_size))
        self.draw = ImageDraw.Draw(self.image)

    def press_detect(self):
        # recognize is the function that classifies the image with the machine learning model
        output = recognize(np.array(self.image).reshape(1, 1, self.image_size, self.image_size))
        sys.stdout.write(output)
        sys.stdout.flush()

    def press_delete(self):
        # clear both the canvas and the mirrored Pillow image
        self.canvas.delete("all")
        self.draw.rectangle((0, 0, self.image_size, self.image_size), fill = 0)

    def click_left(self, event):
        ex = event.x
        ey = event.y
        self.canvas.create_oval(
            ex, ey, ex, ey,
            outline = "white",
            width = 8
        )
        self.draw.ellipse((ex - 4, ey - 4, ex + 4, ey + 4), fill = 255)
        self.x = ex
        self.y = ey

    def drag_left(self, event):
        ex = event.x
        ey = event.y
        self.canvas.create_line(
            self.x, self.y, ex, ey,
            fill = "white",
            width = 8
        )
        self.draw.line((self.x, self.y, ex, ey), fill = 255, width = 8)
        self.x = ex
        self.y = ey

    def click_right(self, event):
        ex = event.x
        ey = event.y
        self.canvas.create_oval(
            ex, ey, ex, ey,
            outline = "black",
            width = 8
        )
        self.draw.ellipse((ex - 4, ey - 4, ex + 4, ey + 4), fill = 0)
        self.x = ex
        self.y = ey

    def drag_right(self, event):
        ex = event.x
        ey = event.y
        self.canvas.create_line(
            self.x, self.y, ex, ey,
            fill = "black",
            width = 8
        )
        self.draw.line((self.x, self.y, ex, ey), fill = 0, width = 8)
        self.x = ex
        self.y = ey
With this, you can write and erase characters.
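The snippet above does not show how the window is started; the actual wiring is in the repository, but a minimal sketch (the entry point below is my own assumption, not the repository's code) would be:

if __name__ == "__main__":
    board = Board()
    board.window.mainloop()  # start the Tkinter event loop so the canvas and buttons respond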
After some research, I found that Tkinter's Canvas can draw lines, so writing works, but there is no straightforward way to read back what was drawn as numeric pixel data.
While looking for a solution, I came across the suggestion to hold a Pillow Image internally and, whenever something is drawn on the Canvas, draw the same thing on the Pillow Image. (I hadn't thought of that, so it struck me as a very clever idea.) The Pillow-related lines (self.image and self.draw) in the code above correspond to this. A Pillow Image can be converted to a numpy array in one step with numpy.array, which makes it easy to feed to the machine learning model.
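For reference, the conversion that press_detect performs looks roughly like this (a sketch; any extra preprocessing such as scaling to [0, 1] depends on how the model was trained in the previous posts):

from PIL import Image
import numpy as np

image = Image.new("L", (256, 256))       # grayscale Pillow image, same mode and size as self.image
array = np.array(image)                  # numpy array of shape (256, 256), dtype uint8
batch = array.reshape(1, 1, 256, 256)    # (batch, channel, height, width) expected by the model
print(batch.shape, batch.dtype)          # (1, 1, 256, 256) uint8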
We now have a GUI for entering handwritten characters and a machine learning model for classifying kana images. Next is combining the two.
YouTube: https://www.youtube.com/watch?v=a0MBfVVp7mA
It is not very accurate, but it recognized my input reasonably well. It cannot handle sloppily written characters, but when I wrote あいうえお (aiueo) carefully, it output the corresponding characters correctly.
When I tried こんにちは (hello), ち was frequently recognized as ら or ろ. Curious, I checked the training images for ち and found that the shape of the ち I write differs slightly from the one in the images. Since the model was trained on only three fonts, it apparently cannot cope with input whose shape deviates from them. When I wrote the character in a form close to the training data, it was recognized correctly.
Recognition is effectively instantaneous, producing a result without using a GPU for the machine learning model.
It would be ideal to recognize kanji as well, but the number of classes would grow from 169 to several thousand. Since accuracy is already insufficient for kana alone, I expect performance would drop sharply.
It would take some extra work, but outputting the top n candidates with the highest classification probability would make the system more convenient as an interface.
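As a sketch of that idea, assuming recognize internally produces a probability (or score) vector over the kana classes and there is a list mapping class indices to kana (both names below are my own placeholders), the top-n extraction could look like this:

import numpy as np

def top_n(probabilities, characters, n = 5):
    # indices of the n highest scores, best first
    indices = np.argsort(probabilities)[::-1][:n]
    return [(characters[i], float(probabilities[i])) for i in indices]

# e.g. top_n(output_vector, kana_list, n = 3) might return [("ち", 0.41), ("ら", 0.30), ("ろ", 0.12)]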
Beyond that, if the system also learned the likelihood of character sequences as words and recognized input by combining the character's shape with its likelihood as part of a word, accuracy might improve when entering meaningful words or sentences. (For example, while entering こんにちは, boost the probability that は comes next after こんにち.)
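As a rough sketch of that idea, with shape_prob coming from the image classifier and next_char_prob from a hypothetical model of which character tends to follow the text typed so far (both names are my own placeholders), the two could be combined as a weighted sum of log-likelihoods:

import math

def combined_score(shape_prob, next_char_prob, weight = 0.5):
    # shape_prob: probability that the drawn image is this character, from the image classifier
    # next_char_prob: probability that this character follows the previous input, from a word/sequence model
    # weight balances the two sources; both probabilities are assumed to be in (0, 1]
    return math.log(shape_prob) + weight * math.log(next_char_prob)

# e.g. after こんにち has been entered, a high next_char_prob for は pushes は up the candidate list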