Previous (1/3): https://qiita.com/tfull_tf/items/6015bee4af7d48176736
Next (3/3): https://qiita.com/tfull_tf/items/d9fe3ab6c1e47d1b2e1e
Code: https://github.com/tfull/character_recognition
In building the kana recognition system, we first constructed a CNN model and checked how accurate it would be on MNIST. This time, we prepare kana image data, build a model in the same way, and then improve it.
I don't know of a ready-made kana image dataset, so I generate one automatically. (That said, some datasets do appear to be publicly available.)
I use ImageMagick for the automatic generation. Since the convert command can draw characters onto an image, I first create a black image and then draw a single white character on it.
To prepare multiple images per character, I used the following ways of multiplying the data.
1. Fonts. Writing the character in different fonts yields images of the same character with different letterforms.
You can list the available fonts with the following command and pick out ones that look usable.
convert -list font
One thing to keep in mind is that not all fonts support Japanese, so even if you try to render kana, nothing may be drawn at all.
macOS 10.15, which I mainly worked on, had no suitable fonts, so I generated the images on Ubuntu. The following fonts were available out of the box, so I decided to use them.
font_list = [
    "Noto-Sans-CJK-JP-Thin",
    "Noto-Sans-CJK-JP-Medium",
    "Noto-Serif-CJK-JP"
]
2. Character size. You can generate different images by drawing the character so it fills the canvas or drawing it somewhat smaller. This time, I drew characters while increasing the point size stepwise from about half the canvas size up to just under the full size.
3. Position. If the character is drawn small, blank space remains on the top, bottom, left, and right, so the character can be shifted vertically and horizontally. For example, with shifts of 0, ± margin/2, and ± margin/3 in each direction, you can generate 5 × 5 different images (see the sketch below).
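As an illustration, here is a minimal sketch of that enumeration; shift_offsets is a hypothetical helper (not from the repository), and margin stands for the blank space on each side.

# Hypothetical helper: enumerate the 5 x 5 shift offsets for a given margin
def shift_offsets(margin):
    steps = [-margin // 2, -margin // 3, 0, margin // 3, margin // 2]
    return [(dx, dy) for dx in steps for dy in steps]

print(len(shift_offsets(60)))  # 25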
4. Rotation. convert can also rotate the character, so rotating it slightly clockwise or counterclockwise multiplies the number of images further.
Preparing blurred copies could also increase the number of images, but I skipped it because I doubted the value of out-of-focus images. Methods 1 to 4 already secure a sufficient number of images.
Adding noise such as small dots to the images might not only increase their number but also make the model more robust to noise. I didn't do it because I couldn't find an easy way to add suitable noise, but it could be a good future task.
I generate the character images by combining methods 1 to 4 multiplicatively. Creating them at 256 × 256 px produced more than 4,000 images per character; the counts can be changed by adjusting the parameters used in each method. With 169 characters covering hiragana (U+3041 to U+3093) and katakana (U+30A1 to U+30F6), the data takes up quite a lot of space.
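As a quick check of that count, the two code point ranges can be enumerated directly (a minimal sketch):

# Enumerate the target kana from their Unicode code point ranges
hiragana = [chr(c) for c in range(0x3041, 0x3094)]  # 83 characters
katakana = [chr(c) for c in range(0x30A1, 0x30F7)]  # 86 characters
characters = hiragana + katakana
print(len(characters))  # 169

The image generation itself uses the following functions.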
import subprocess

data_directory = "/path/to/data"
image_size = 256

# Create the black background image (the template, made once)
def make_template():
    subprocess.call([
        "convert",
        "-size", "{s}x{s}".format(s = image_size),
        "xc:black",
        "{}/tmp.png".format(data_directory)
    ])

# Draw one white character onto the template and save it to path
def generate(path, font, pointsize, character, rotation, dx, dy):
    subprocess.call([
        "convert",
        "-gravity", "Center",
        "-font", font,
        "-pointsize", str(pointsize),
        "-fill", "White",
        "-annotate", format_t(rotation, dx, dy), character,
        "{}/tmp.png".format(data_directory), path
    ])

# Build the -annotate geometry argument, e.g. format_t(5, -3, 7) -> "5x5-3+7"
def format_t(rotation, x, y):
    xstr = "+" + str(x) if x >= 0 else str(x)
    ystr = "+" + str(y) if y >= 0 else str(y)
    return "{r}x{r}{x}{y}".format(r = rotation, x = xstr, y = ystr)
The black image is created only once at the start; after that, white character images are generated in a loop while varying the parameters font, pointsize, character, rotation, dx, and dy. A hypothetical sketch of such a driver loop follows.
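This sketch reuses make_template, generate, and font_list from above, plus the characters list and shift_offsets helper sketched earlier; the pointsize and rotation ranges are made-up values, not the ones in the repository.

import os

pointsizes = range(128, 256, 16)  # about half the canvas up to just below it (assumed values)
rotations = [-4, -2, 0, 2, 4]     # slight counterclockwise / clockwise tilt (assumed values)

make_template()  # the black template, created once
for i_character, character in enumerate(characters):
    directory = "{}/{}".format(data_directory, i_character)
    os.makedirs(directory, exist_ok = True)
    count = 0
    for font in font_list:
        for pointsize in pointsizes:
            margin = (image_size - pointsize) // 2
            for rotation in rotations:
                for dx, dy in shift_offsets(margin):
                    generate("{}/{}.png".format(directory, count), font, pointsize, character, rotation, dx, dy)
                    count += 1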
Now that we have the images, we build the model in the same way as for MNIST, but at first it didn't work at all. The cross-entropy loss took the same value for every batch, and when I inspected the layer values during training for debugging, I found large absolute values in the hundreds or thousands, and the output was always the same. Inserting Batch Normalization resolved this and greatly improved accuracy.
import torch.nn as nn

class Model(nn.Module):
    def __init__(self, image_size, output):
        super(Model, self).__init__()
        # Side length of the feature map after two (5x5 conv -> 2x2 max pool) stages
        n = ((image_size - 4) // 2 - 4) // 2
        self.conv1 = nn.Conv2d(1, 4, 5)
        self.relu1 = nn.ReLU()
        self.normal1 = nn.BatchNorm2d(4)
        self.pool1 = nn.MaxPool2d(2, 2)
        self.dropout1 = nn.Dropout2d(0.3)
        self.conv2 = nn.Conv2d(4, 16, 5)
        self.relu2 = nn.ReLU()
        self.normal2 = nn.BatchNorm2d(16)
        self.pool2 = nn.MaxPool2d(2, 2)
        self.dropout2 = nn.Dropout2d(0.3)
        self.flatten = nn.Flatten()
        self.linear1 = nn.Linear(n * n * 16, 1024)
        self.relu3 = nn.ReLU()
        self.normal3 = nn.BatchNorm1d(1024)
        self.dropout3 = nn.Dropout(0.3)
        self.linear2 = nn.Linear(1024, 256)
        self.relu4 = nn.ReLU()
        self.normal4 = nn.BatchNorm1d(256)
        self.dropout4 = nn.Dropout(0.3)
        self.linear3 = nn.Linear(256, output)
        # Note: nn.CrossEntropyLoss applies log-softmax internally,
        # so this final softmax is redundant when training with it
        self.softmax = nn.Softmax(dim = 1)

    def forward(self, x):
        x = self.conv1(x)
        x = self.relu1(x)
        x = self.normal1(x)
        x = self.pool1(x)
        x = self.dropout1(x)
        x = self.conv2(x)
        x = self.relu2(x)
        x = self.normal2(x)
        x = self.pool2(x)
        x = self.dropout2(x)
        x = self.flatten(x)
        x = self.linear1(x)
        x = self.relu3(x)
        x = self.normal3(x)
        x = self.dropout3(x)
        x = self.linear2(x)
        x = self.relu4(x)
        x = self.normal4(x)
        x = self.dropout4(x)
        x = self.linear3(x)
        x = self.softmax(x)
        return x
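A quick sanity check of the tensor shapes (a sketch; the batch size of 2 is arbitrary, but BatchNorm needs more than one sample in training mode):

import torch

model = Model(256, 169)
x = torch.randn(2, 1, 256, 256)  # batch of two 256x256 grayscale images
print(model(x).shape)            # torch.Size([2, 169])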
Training basically follows the same procedure as for MNIST. I used cross-entropy loss and Adam with a learning rate of 0.001.
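A minimal sketch of that setup, assuming the Model class above (the variable names are mine):

import torch.nn as nn
import torch.optim as optim

model = Model(image_size, 169)  # one output class per kana
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr = 0.001)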
Since the images were generated by looping over parameters, training on them in generation order would likely bias the data, so the image numbers are shuffled. On the other hand, I want every character to be trained evenly, so the characters themselves are iterated in order.
If you read images one at a time in a loop, each batch would train on a single image; yet there is so much image data that reading it all at once could exhaust memory. To avoid both problems, I decided to read the data chunk by chunk using yield.
# Yield the pairs of the double loop over a1 and a2, chunk elements at a time
def double_range(a1, a2, chunk = 100):
    records = []
    for x1 in a1:
        for x2 in a2:
            records.append((x1, x2))
            if len(records) >= chunk:
                yield records
                records = []
    if len(records) > 0:
        yield records
This function takes two sequences and yields the pairs of their double loop, chunk elements at a time; it is then consumed by a for loop, as in the pseudo code below.
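For example, sequences of lengths 3 and 4 produce 12 pairs, which with chunk = 5 come out as chunks of 5, 5, and 2:

# Quick demonstration of double_range's chunking
chunks = list(double_range(range(3), range(4), chunk = 5))
print([len(c) for c in chunks])  # [5, 5, 2]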
Pseudo code:

for indices in double_range("shuffled image numbers 1 to N", "indices assigned to characters (0 to 168)"):
    inputs = []
    for i_image, i_character in indices:
        inputs.append("load the i_image-th image of the i_character-th character")
    model.train(inputs)  # training
With this, the loop reads and trains on one batch-sized chunk of images at a time, which keeps memory usage low.
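For concreteness, here is a hypothetical fleshed-out version of the pseudo code, assuming the directory layout from the generation sketch and the model, criterion, and optimizer defined earlier; load_image and n_images are my own names, not the repository's.

import numpy as np
import torch
from PIL import Image

# Load one image as a (1, image_size, image_size) float array scaled to [0, 1]
def load_image(i_character, i_image):
    path = "{}/{}/{}.png".format(data_directory, i_character, i_image)
    pixels = np.asarray(Image.open(path).convert("L"), dtype = np.float32) / 255
    return pixels.reshape(1, image_size, image_size)

n_images = 4236  # images generated per character
for indices in double_range(np.random.permutation(n_images), range(169)):
    inputs = torch.tensor(np.array([load_image(c, i) for i, c in indices]))
    labels = torch.tensor([c for _, c in indices])
    optimizer.zero_grad()
    loss = criterion(model(inputs), labels)
    loss.backward()
    optimizer.step()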
I ran the experiment after generating 4,236 images per character × 169 characters. Using 5% of the data as test data, I trained for 2 epochs and measured test accuracy: it was about 71.4%. At first I had a mistake in the program and used 4236 where 169 should have been, and it is still a mystery why it reached about 80% back then. I'd like to improve performance a bit more, but it looks good enough to build and run a recognition system for now.