[PYTHON] Record your Ring Fit Adventure with OCR

This article is the Day 6 entry of the Aratana Advent Calendar.

Do you play Ring Fit Adventure? I want to go all out, but I am worn out in less than 30 minutes. At the end of each session the game shows a record of how much exercise you did, and I take a photo of that screen with my smartphone. (I am a Switch beginner and did not know that it can take screenshots.)

↓ This is one of those photos: rfa.jpg

Maybe I just don't know how because I am new to the Switch, but it seems the record cannot be exported. I wanted numbers showing how much effort I had put in, so this time I will try recording them with OCR, which transcribes text from a photo.

Preparation

brew install tesseract
pip install pyocr

Practice

I use pyocr, a Python wrapper for the open-source OCR tool Tesseract. To start, load the image and try OCR on it.

from PIL import Image

import pyocr
import pyocr.builders


image = Image.open("image_path")

# List the OCR tools pyocr can find and use the first one (Tesseract here).
tools = pyocr.get_available_tools()
tool = tools[0]

# tesseract_layout=6 assumes a single uniform block of text.
result_ocr = tool.image_to_string(
    image,
    lang="jpn",
    builder=pyocr.builders.TextBuilder(tesseract_layout=6))

print(result_ocr.splitlines())
Today's luck total results

Link Consqueeze_

Squat
-Bampaign push
Tony Touch Chess

-Chair pose

Tummy push

_ /② Team ② Times

⑥ 0 times(eom
③ ⑧ times(③⑧4)
③ ① times G 4
② ② times(② ② cause)

⑤ times ⑤ four

-② Cause ② Times

Dashing

Jinging

Peach Age
walking

Ring control down push key

m ⑤ Par

Shadow the painting surface

⑥⑤⑦m(⑥⑤zm)

-M(④⑤⑧m

⑨⑨m(gom)
_②m(②m)
⑯ Autumn ⑯ Gong

-⑤(⑥ Gong

Gorge 0 Rope




It's terrible! Still, the OCR itself ran, so let's call that a success for now...

The spaces between characters bother me, so next I will deal with them. Tesseract seems to have an option that controls these spaces: -c preserve_interword_spaces=1

I found in the pyocr source how to pass such options, so I will use that.

builder = pyocr.builders.TextBuilder(tesseract_layout=6)
builder.tesseract_configs.append('-c')
builder.tesseract_configs.append('preserve_interword_spaces=1')
Today's exercise total result

Linkcon crush_

Squat
-Bapanzai Push
Tony to chest

-Chair pose

Push your stomach

_ /② group ② times

⑥ 0 times(eom
③ ⑧ times(③⑧4)
③ ① times G4
②② times(② ② cause)

⑤ times ⑤ four

-② Cause ② times

Dash

jogging

Raise the peach
walking

Ring control push down keep

m ⑤ par

圓 Screen

⑥⑤⑦m(⑥⑤zm)

-M(④⑤⑧m

⑨⑨m(gom)
_②m(②m)
⑯Autumn ⑯Gong

--- ⑤(⑥ Gong

Gorge 0 Rope binding

 

The spaces between characters are gone.

Now I am concerned about the accuracy of the OCR itself, so let's add some preprocessing. By the way, I switched from splitlines() to split() because the extra spaces are gone.

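import cv2  # pip install opencv-python

# `image` must be an 8-bit grayscale NumPy array here (not a PIL Image).
im_blur = cv2.GaussianBlur(image, (5, 5), 0)
_, image = cv2.threshold(im_blur, 0, 255, cv2.THRESH_OTSU)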

I added noise removal with a Gaussian filter, followed by Otsu binarization.

The first output below is before split() and the second is after split().


Today's total result comfort`Digging the screen
.. Pushing the ring controller ⑦② times(⑦ ② 4)・ Dash ⑥ ⑤ ⑦ m(⑥⑤⑦m)
“Squat ⑥ 0 times(⑥ol)・ Jogging ④③⑧m(④③⑧m)
`Bansazai Push ③⑧ times(③ ⑧ 吏)   *Peach Akage ⑨⑨m(⑨⑨m)
Nee to Chest ③ ① times(③ ① i group)・ Walking ②m(②m)
・ Chair pose ②② times(② ② 4)・ Ringcon push down keep ⑯Autumn⑯)
Pushing in the stomach ⑤ times(⑤ Four).. Ringcon pulling keep ⑥ Autumn(⑥ You)
・ Push down the ring controller ② times(② times)
_The value in parentheses is the cumulative value from the start of play [`.. < close
‥*Leh Talk Shisho Kopu-Mourning Talk e

['Today's total result', 'Comfort', 'Yasu', '`', 'Digging the screen', '。', 'Ring control push', '⑦② times(⑦ ② 4)',
'・', 'dash', '⑥⑤⑦m(⑥⑤⑦m)', '“', 'Squat', '⑥ 0 times(⑥ol)', '・', 'jogging', '④③⑧m(④③⑧m)',
'`', 'Bansazai Push', '③ ⑧ times(③ ⑧ 吏)', '*', 'Peach Akege', '⑨⑨m(⑨⑨m)', 'Ya', 'Knee to chest'
, '③ ① times(③ ① i group)', '・', 'walking', '②m(②m)', '・', 'Chair pose', '②② times(② ② 4)', '・', 'ring
Keep pushing down the computer', '⑯ Autumn ⑯ role)', 'Shu', 'Tummy push', '⑤ times(⑤ Four)', '。', 'Ringcon pulling key
Pub', '⑥ Autumn(⑥ You)', '・', 'Push down the ring controller', '② times(② times)', '_', 'The numbers in parentheses are the accumulation from the start of play.
It is the total price', '〔`。〈', 'close', '‥*', 'Leh talk Shisho Kopu', '-', '-', '・ Mourning talk e']

You can now tell the type and count of each exercise by looking at the output. Tesseract seems to render digits such as 1 as circled characters like ①, so I was going to write the mapping myself, but there was a ready-made function, so I used that instead.

unicodedata.normalize()
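
As a rough illustration, the NFKC form of unicodedata.normalize() folds circled digits into plain ones. I am assuming NFKC here, since the article does not say which form was used, and normalize_ocr_text is a name made up for this sketch.

```python
import unicodedata


def normalize_ocr_text(text):
    """Fold compatibility characters such as circled digits into plain ones.

    Assumption: the NFKC form is used; the article does not name the form.
    """
    return unicodedata.normalize("NFKC", text)


print(normalize_ocr_text("⑥⑤⑦m(⑥⑤⑦m)"))  # 657m(657m)
print(normalize_ocr_text("⑯"))  # 16
```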

After that, extract each exercise and its count.

import difflib

target_list = [
    "Squat", "Push the ring controller", "dash", "jogging", "Banzai Push", "Raise the peach",
    "Knee to chest", "walking", "Chair pose", "Ring control push down keep", "Tummy push",
    "Ringcon pull keep", "Push down the ring controller"
]

# texts is the list produced by split() on the normalized OCR output.
result = []
for target in target_list:
    # Score every token by its similarity to the exercise name.
    s = []
    for line in texts:
        s.append(
            difflib.SequenceMatcher(a=target, b=line).ratio())

    # The count follows the best-matching name; drop the cumulative value in parentheses.
    result.append(texts[s.index(max(s)) + 1].split('(')[0])
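
The loop above picks, for each exercise name, the most similar OCR token and then takes the token right after it as the value. Here is a self-contained sketch of the same idea on a tiny made-up token list. The function name extract_counts and the bounds check are my own additions, not part of the original code.

```python
import difflib


def extract_counts(tokens, targets):
    """Map each target exercise name to the token that follows its best match."""
    counts = {}
    for target in targets:
        # Similarity of the target name to every OCR token (0.0 to 1.0).
        scores = [
            difflib.SequenceMatcher(a=target, b=token).ratio()
            for token in tokens
        ]
        best = scores.index(max(scores))
        # The value is assumed to sit right after the name; skip if there is none.
        if best + 1 < len(tokens):
            counts[target] = tokens[best + 1].split("(")[0]
    return counts


# Shortened, made-up token list in the shape of the split() output.
tokens = ["Sqvat", "60 times(60 times)", "・", "jogging", "438m(438m)"]
print(extract_counts(tokens, ["Squat", "jogging"]))
# {'Squat': '60 times', 'jogging': '438m'}
```

Fuzzy matching is what lets the misread "Sqvat" still resolve to "Squat". The trade-off is that a badly garbled name can match the wrong token, so the results are worth a quick visual check.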


Finally, here is the result of appending a record on each run.


{
    "0": {
        "Squat": "60 times",
        "Push the ring controller": "72 times",
        "dash": "657m",
        "jogging": "438m",
        "Banzai Push": "38 times",
        "Raise the peach": "99m",
        "Knee to chest": "31 times",
        "walking": "2m",
        "Chair pose": "22 times",
        "Ring control push down keep": "16 Autumn 16 roles)",
        "Tummy push": "5 times",
        "Ringcon pull keep": "6 autumn",
        "Push down the ring controller": "Twice"
    },
    "1": {
        "Squat": "60 times",
        "Push the ring controller": "72 times",
        "dash": "657m",
        "jogging": "438m",
        "Banzai Push": "38 times",
        "Raise the peach": "99m",
        "Knee to chest": "31 times",
        "walking": "2m",
        "Chair pose": "22 times",
        "Ring control push down keep": "16 Autumn 16 roles)",
        "Tummy push": "5 times",
        "Ringcon pull keep": "6 autumn",
        "Push down the ring controller": "Twice"
    }
}
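
The article does not show how each run is appended. One way to end up with the structure above, keyed by run number, might be the following sketch. The function name append_record and the file path passed to it are assumptions.

```python
import json
from pathlib import Path


def append_record(path, record):
    """Append one run's results to a JSON file under the next numeric key."""
    path = Path(path)
    # Start from the existing file if there is one, else from an empty dict.
    data = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    data[str(len(data))] = record
    # ensure_ascii=False keeps non-ASCII exercise names readable in the file.
    path.write_text(
        json.dumps(data, ensure_ascii=False, indent=4), encoding="utf-8")
    return data
```

Calling append_record("records.json", counts) twice with the same counts would produce the keys "0" and "1", as in the output above.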

Since I ran it on the same image both times, the two records are identical, but appending works. Some values did not come out cleanly, such as "Ring control push down keep": "16 Autumn 16 roles)", and in places the unit "times" was not recognized, but I'm happy with it.

Summary

This time I left the OCR part entirely to pyocr (Tesseract), so I did not look into fixing misreadings such as the "time" unit coming out as "autumn" above. Next time I want to try improving the recognition accuracy.

What do you do when your first OCR attempt comes back full of strange spaces? Does the accuracy not improve without preprocessing? Working out what to do each time I hit a wall was fun and a good learning experience.

I would like to add a few more features, such as CSV output and totals.
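
As a sketch of what the CSV and totals features could look like, assuming records shaped like the JSON above and assuming that only the leading number of each value matters (the unit is sometimes misrecognized). The names leading_number, totals and to_csv are hypothetical.

```python
import csv
import io
import re


def leading_number(value):
    """Return the leading integer of a value like '60 times', or None."""
    match = re.match(r"\s*(\d+)", value)
    return int(match.group(1)) if match else None


def totals(records):
    """Sum the leading numbers per exercise across all runs."""
    result = {}
    for run in records.values():
        for name, value in run.items():
            number = leading_number(value)
            if number is not None:
                result[name] = result.get(name, 0) + number
    return result


def to_csv(records):
    """Render the records as CSV text with one row per run and exercise."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["run", "exercise", "value"])
    for run_id, run in records.items():
        for name, value in run.items():
            writer.writerow([run_id, name, leading_number(value)])
    return buffer.getvalue()


records = {
    "0": {"Squat": "60 times", "dash": "657m"},
    "1": {"Squat": "60 times", "dash": "657m"},
}
print(totals(records))  # {'Squat': 120, 'dash': 1314}
print(to_csv(records))
```

Values without a leading number are skipped in the totals and written as empty cells in the CSV, so misrecognized entries stay visible instead of being silently counted.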
