I will confirm the subject matter because it is a good study material that I was a little worried about.
I heard on Twitter Japanese "only" unreadable fonts !? I thought it was stupid, but I couldn't read it because it seemed to be really readable! (Http://togetter.com/li/887973)
** "Does the readable and unreadable (= indistinguishable) change depending on the presence or absence of the recognition function for katakana (or a shape close to it)?" **
That's why I decided to check it using a convolutional neural network. Someone may have already tried it, but it's one thing because it's studying.
The explanation of the convolutional neural network is omitted here. I don't know.
A classifier that can only classify English is more accurate than a classifier that can classify katakana + English.
By comparing the following 5 patterns, we will evaluate the effect of the difference of known characters on the accuracy.
No. | Training data | Number of output classes |
---|---|---|
1 | Alphabet + numbers only | 62 (case sensitive) |
2 | No.1 + Hiragana only | 62+48 = 110 |
3 | No.1 + Katakana only | 62+48 = 110 |
4 | No.1 + Hiragana + Katakana | 62+48*2 = 158 |
5 | No.1 + Hebrew letters | 62+28 = 89 |
For No.5, see http://dic.nicovideo.jp/a/%E3%83%98%E3%83%96%E3%83%A9%E3%82%A4%E6%96%87%E5% The "letter" and "end form" described in AD% 97 are treated as separate classes. (In the first place, for Tsukkomi who said "Hebrew letters", the reason is that I added it for comparison because there was a comment in the togetter "I feel like it is a bit like Hebrew letters.")
I thought I'd use free data for OCR, but I didn't find the right one, so I made it myself.
With the script described in Generate a lot of single-character images with Pillow (PIL) (http://qiita.com/lazykyama/items/65bcce351f3d1cf07d8e) I made an image of only one character for the number of target characters, and added random noise to it to inflate the data. In this verification, 30 images with noise per character are dynamically generated and used as learning data.
Also, the test image used was manually cut out from the sample image at http://www.dafont.com/electroharmonix.font.
Both training data and test data use 64x64 grayscale images.
The configuration of the classifier is as shown in the code below. Various parameters are appropriate.
self.__model = chainer.FunctionSet(
conv1=F.Convolution2D(1, 16, 5),
conv2=F.Convolution2D(16, 16, 5),
l3=F.Linear(784, 784),
softmax4=F.Linear(784, class_num))
#Omission
x = chainer.Variable(source)
h = F.max_pooling_2d(F.relu(self.__model.conv1(x)), ksize=2, stride=2)
h = F.max_pooling_2d(F.relu(self.__model.conv2(h)), ksize=3, stride=4)
h = F.dropout(F.relu(self.__model.l3(h)), train=train)
y = self.__model.softmax4(h)
As you can see, we are using Chainer for this implementation. (Thank you for referring to http://qiita.com/hogefugabar/items/312707a09d29632e7288.)
This experiment uses the following parameters for all patterns. (I'm trying various other things)
The results under these conditions are as follows.
No. | Training data | Correct answer rate |
---|---|---|
1 | Alphabet + numbers only | 46.2% (12 / 26) |
2 | No.1 + Hiragana only | 34.6% ( 9 / 26) |
3 | No.1 + Katakana only | 15.4% ( 4 / 26) |
4 | No.1 + Hiragana + Katakana | 26.9% ( 7 / 26) |
5 | No.1 + Hebrew letters | 23.1% ( 6 / 26) |
In the first place, the percentage of correct answers is not very high, but even so, the accuracy when mixed with katakana is terrible. .. .. After all it may be difficult for Japanese people to read.
Finally, the log at the time of the test is described below. (In the log, "capsV" etc. indicates that it is in uppercase. The electroharmonix targeted this time seems to have the same shape in both uppercase and lowercase, so it is treated as the correct answer regardless of which one matches. )
test models.
[PATTERN1]: English only.
2015-10-19 08:08:39,112 [INFO] #data: 26
2015-10-19 08:08:39,186 [INFO] correct: v, answer: capsV => RIGHT
2015-10-19 08:08:39,186 [INFO] correct: g, answer: capsT => WRONG
2015-10-19 08:08:39,186 [INFO] correct: s, answer: capsE => WRONG
2015-10-19 08:08:39,187 [INFO] correct: o, answer: capsD => WRONG
2015-10-19 08:08:39,187 [INFO] correct: i, answer: capsZ => WRONG
2015-10-19 08:08:39,187 [INFO] correct: r, answer: capsR => RIGHT
2015-10-19 08:08:39,187 [INFO] correct: u, answer: capsU => RIGHT
2015-10-19 08:08:39,187 [INFO] correct: a, answer: capsA => RIGHT
2015-10-19 08:08:39,187 [INFO] correct: h, answer: capsH => RIGHT
2015-10-19 08:08:39,188 [INFO] correct: l, answer: capsD => WRONG
2015-10-19 08:08:39,188 [INFO] correct: k, answer: capsN => WRONG
2015-10-19 08:08:39,188 [INFO] correct: n, answer: capsO => WRONG
2015-10-19 08:08:39,188 [INFO] correct: q, answer: capsT => WRONG
2015-10-19 08:08:39,188 [INFO] correct: d, answer: capsD => RIGHT
2015-10-19 08:08:39,188 [INFO] correct: m, answer: capsM => RIGHT
2015-10-19 08:08:39,189 [INFO] correct: t, answer: n => WRONG
2015-10-19 08:08:39,189 [INFO] correct: e, answer: capsE => RIGHT
2015-10-19 08:08:39,189 [INFO] correct: x, answer: x => RIGHT
2015-10-19 08:08:39,189 [INFO] correct: p, answer: capsF => WRONG
2015-10-19 08:08:39,189 [INFO] correct: w, answer: capsQ => WRONG
2015-10-19 08:08:39,189 [INFO] correct: z, answer: capsZ => RIGHT
2015-10-19 08:08:39,189 [INFO] correct: y, answer: capsY => RIGHT
2015-10-19 08:08:39,189 [INFO] correct: c, answer: capsT => WRONG
2015-10-19 08:08:39,190 [INFO] correct: b, answer: capsZ => WRONG
2015-10-19 08:08:39,190 [INFO] correct: f, answer: capsF => RIGHT
2015-10-19 08:08:39,190 [INFO] correct: j, answer: capsZ => WRONG
2015-10-19 08:08:39,190 [INFO] test accuracy: 0.461538461538 (12 / 26)
[PATTERN2]: English and hiragana.
2015-10-19 08:08:39,314 [INFO] #data: 26
2015-10-19 08:08:39,387 [INFO] correct: v, answer: capsV => RIGHT
2015-10-19 08:08:39,387 [INFO] correct: g, answer: capsT => WRONG
2015-10-19 08:08:39,387 [INFO] correct: s, answer: capsB => WRONG
2015-10-19 08:08:39,387 [INFO] correct: o, answer: capsB => WRONG
2015-10-19 08:08:39,388 [INFO] correct: i, answer: capsZ => WRONG
2015-10-19 08:08:39,388 [INFO] correct: r, answer:Ho=> WRONG
2015-10-19 08:08:39,388 [INFO] correct: u, answer: capsU => RIGHT
2015-10-19 08:08:39,388 [INFO] correct: a, answer: l => WRONG
2015-10-19 08:08:39,388 [INFO] correct: h, answer:Wow=> WRONG
2015-10-19 08:08:39,388 [INFO] correct: l, answer: capsU => WRONG
2015-10-19 08:08:39,388 [INFO] correct: k, answer: capsT => WRONG
2015-10-19 08:08:39,389 [INFO] correct: n, answer: capsD => WRONG
2015-10-19 08:08:39,389 [INFO] correct: q, answer: capsT => WRONG
2015-10-19 08:08:39,389 [INFO] correct: d, answer: capsD => RIGHT
2015-10-19 08:08:39,389 [INFO] correct: m, answer: capsM => RIGHT
2015-10-19 08:08:39,389 [INFO] correct: t, answer:Su=> WRONG
2015-10-19 08:08:39,389 [INFO] correct: e, answer: capsE => RIGHT
2015-10-19 08:08:39,390 [INFO] correct: x, answer: x => RIGHT
2015-10-19 08:08:39,390 [INFO] correct: p, answer: capsT => WRONG
2015-10-19 08:08:39,390 [INFO] correct: w, answer:I=> WRONG
2015-10-19 08:08:39,390 [INFO] correct: z, answer: capsZ => RIGHT
2015-10-19 08:08:39,390 [INFO] correct: y, answer: capsY => RIGHT
2015-10-19 08:08:39,390 [INFO] correct: c, answer: capsE => WRONG
2015-10-19 08:08:39,391 [INFO] correct: b, answer: capsT => WRONG
2015-10-19 08:08:39,392 [INFO] correct: f, answer: capsT => WRONG
2015-10-19 08:08:39,392 [INFO] correct: j, answer: capsJ => RIGHT
2015-10-19 08:08:39,392 [INFO] test accuracy: 0.346153846154 (9 / 26)
[PATTERN3]: English and katakana.
2015-10-19 08:08:39,517 [INFO] #data: 26
2015-10-19 08:08:39,591 [INFO] correct: v, answer: capsV => RIGHT
2015-10-19 08:08:39,591 [INFO] correct: g, answer: capsQ => WRONG
2015-10-19 08:08:39,591 [INFO] correct: s, answer:La=> WRONG
2015-10-19 08:08:39,591 [INFO] correct: o, answer:Wow=> WRONG
2015-10-19 08:08:39,591 [INFO] correct: i, answer:ヱ=> WRONG
2015-10-19 08:08:39,592 [INFO] correct: r, answer:Wow=> WRONG
2015-10-19 08:08:39,592 [INFO] correct: u, answer: capsU => RIGHT
2015-10-19 08:08:39,592 [INFO] correct: a, answer:Mu=> WRONG
2015-10-19 08:08:39,592 [INFO] correct: h, answer: r => WRONG
2015-10-19 08:08:39,592 [INFO] correct: l, answer: capsQ => WRONG
2015-10-19 08:08:39,592 [INFO] correct: k, answer:Wow=> WRONG
2015-10-19 08:08:39,592 [INFO] correct: n, answer:Wow=> WRONG
2015-10-19 08:08:39,593 [INFO] correct: q, answer: capsR => WRONG
2015-10-19 08:08:39,593 [INFO] correct: d, answer:Wow=> WRONG
2015-10-19 08:08:39,593 [INFO] correct: m, answer: capsO => WRONG
2015-10-19 08:08:39,593 [INFO] correct: t, answer: l => WRONG
2015-10-19 08:08:39,593 [INFO] correct: e, answer:La=> WRONG
2015-10-19 08:08:39,593 [INFO] correct: x, answer:Me=> WRONG
2015-10-19 08:08:39,593 [INFO] correct: p, answer:A=> WRONG
2015-10-19 08:08:39,593 [INFO] correct: w, answer: capsQ => WRONG
2015-10-19 08:08:39,593 [INFO] correct: z, answer: capsZ => RIGHT
2015-10-19 08:08:39,593 [INFO] correct: y, answer: capsY => RIGHT
2015-10-19 08:08:39,594 [INFO] correct: c, answer: capsE => WRONG
2015-10-19 08:08:39,594 [INFO] correct: b, answer:Mosquito=> WRONG
2015-10-19 08:08:39,594 [INFO] correct: f, answer: 5 => WRONG
2015-10-19 08:08:39,594 [INFO] correct: j, answer: capsT => WRONG
2015-10-19 08:08:39,594 [INFO] test accuracy: 0.153846153846 (4 / 26)
[PATTERN4]: English and hiragana and katakana.
2015-10-19 08:08:39,718 [INFO] #data: 26
2015-10-19 08:08:39,792 [INFO] correct: v, answer:No=> WRONG
2015-10-19 08:08:39,792 [INFO] correct: g, answer: capsQ => WRONG
2015-10-19 08:08:39,792 [INFO] correct: s, answer:La=> WRONG
2015-10-19 08:08:39,793 [INFO] correct: o, answer: capsQ => WRONG
2015-10-19 08:08:39,793 [INFO] correct: i, answer: capsT => WRONG
2015-10-19 08:08:39,793 [INFO] correct: r, answer:A=> WRONG
2015-10-19 08:08:39,793 [INFO] correct: u, answer: capsU => RIGHT
2015-10-19 08:08:39,793 [INFO] correct: a, answer:Mu=> WRONG
2015-10-19 08:08:39,793 [INFO] correct: h, answer:Wow=> WRONG
2015-10-19 08:08:39,793 [INFO] correct: l, answer:Ri=> WRONG
2015-10-19 08:08:39,794 [INFO] correct: k, answer:Wow=> WRONG
2015-10-19 08:08:39,794 [INFO] correct: n, answer:Wow=> WRONG
2015-10-19 08:08:39,794 [INFO] correct: q, answer: capsQ => RIGHT
2015-10-19 08:08:39,794 [INFO] correct: d, answer: capsD => RIGHT
2015-10-19 08:08:39,795 [INFO] correct: m, answer: capsM => RIGHT
2015-10-19 08:08:39,795 [INFO] correct: t, answer:Na=> WRONG
2015-10-19 08:08:39,795 [INFO] correct: e, answer: capsE => RIGHT
2015-10-19 08:08:39,795 [INFO] correct: x, answer:Me=> WRONG
2015-10-19 08:08:39,795 [INFO] correct: p, answer:A=> WRONG
2015-10-19 08:08:39,796 [INFO] correct: w, answer:B=> WRONG
2015-10-19 08:08:39,796 [INFO] correct: z, answer: capsZ => RIGHT
2015-10-19 08:08:39,796 [INFO] correct: y, answer:No=> WRONG
2015-10-19 08:08:39,796 [INFO] correct: c, answer: capsQ => WRONG
2015-10-19 08:08:39,796 [INFO] correct: b, answer: capsZ => WRONG
2015-10-19 08:08:39,796 [INFO] correct: f, answer:Te=> WRONG
2015-10-19 08:08:39,796 [INFO] correct: j, answer: capsJ => RIGHT
2015-10-19 08:08:39,796 [INFO] test accuracy: 0.269230769231 (7 / 26)
[PATTERN5]: English and Hebrew.
2015-10-19 08:08:39,921 [INFO] #data: 26
2015-10-19 08:08:39,994 [INFO] correct: v, answer: capsV => RIGHT
2015-10-19 08:08:39,995 [INFO] correct: g, answer: ם => WRONG
2015-10-19 08:08:39,995 [INFO] correct: s, answer: capsZ => WRONG
2015-10-19 08:08:39,995 [INFO] correct: o, answer: capsH => WRONG
2015-10-19 08:08:39,995 [INFO] correct: i, answer: capsZ => WRONG
2015-10-19 08:08:39,995 [INFO] correct: r, answer: capsK => WRONG
2015-10-19 08:08:39,995 [INFO] correct: u, answer: capsH => WRONG
2015-10-19 08:08:39,996 [INFO] correct: a, answer: capsA => RIGHT
2015-10-19 08:08:39,996 [INFO] correct: h, answer: b => WRONG
2015-10-19 08:08:39,996 [INFO] correct: l, answer: ם => WRONG
2015-10-19 08:08:39,996 [INFO] correct: k, answer: ל => WRONG
2015-10-19 08:08:39,996 [INFO] correct: n, answer: capsH => WRONG
2015-10-19 08:08:39,997 [INFO] correct: q, answer: capsH => WRONG
2015-10-19 08:08:39,997 [INFO] correct: d, answer: capsD => RIGHT
2015-10-19 08:08:39,997 [INFO] correct: m, answer: capsM => RIGHT
2015-10-19 08:08:39,997 [INFO] correct: t, answer: l => WRONG
2015-10-19 08:08:39,997 [INFO] correct: e, answer: z => WRONG
2015-10-19 08:08:39,997 [INFO] correct: x, answer: capsX => RIGHT
2015-10-19 08:08:39,997 [INFO] correct: p, answer: capsT => WRONG
2015-10-19 08:08:39,998 [INFO] correct: w, answer: capsQ => WRONG
2015-10-19 08:08:39,998 [INFO] correct: z, answer: capsZ => RIGHT
2015-10-19 08:08:39,998 [INFO] correct: y, answer: capsV => WRONG
2015-10-19 08:08:39,998 [INFO] correct: c, answer: capsQ => WRONG
2015-10-19 08:08:39,998 [INFO] correct: b, answer: capsM => WRONG
2015-10-19 08:08:39,998 [INFO] correct: f, answer: capsM => WRONG
2015-10-19 08:08:39,998 [INFO] correct: j, answer: 5 => WRONG
2015-10-19 08:08:39,999 [INFO] test accuracy: 0.230769230769 (6 / 26)
Even in the English-only model, "g" is mistaken for "T", so I can't say anything about it. ↑ "g" of electroharmonix
↑ "T" of learning data
However, if it is an English + Katakana model, it is useless because "s" is mistaken for "la" ... ↑ "s" of electroharmonix
↑ "La" of learning data
… Not really. This seems to be wrong.
I forgot to write some, so I will add it.
Leave the set of sauce. https://gist.github.com/lazykyama/f586419cd72d5312288e
Recommended Posts