[Python] Trying to create Ring Fit data using Amazon Textract [OCR] (with a code review by CodeGuru)

This article is a continuation of the previous one. Around the time I wrote it, I noticed that my company's Advent calendar has an AI theme, so I'd like to actually verify the point I made in that article's summary: "If you use Amazon Textract, it may be more accurate."

This time I would also like to do a code review using CodeGuru, another AWS service.

What is Amazon Textract?

It is a service that makes OCR very easy. At the time of writing, it seems to be available only in the Paris (eu-west-3), London (eu-west-2), Singapore (ap-southeast-1), and Mumbai (ap-south-1) regions. It can be used from the console or via the SDK. First, let's try it from the console.
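Since the list of supported regions changes over time, one way to check it locally is via boto3. This is just a small sketch; the output reflects the endpoint data bundled with your installed botocore version, so it may lag behind the live service:

python


import boto3

# Regions where a Textract endpoint is defined, according to the
# locally installed botocore data (may differ from the live service)
print(boto3.session.Session().get_available_regions('textract'))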

Try using Amazon Textract from the console

First, I want to see what the accuracy is like, so let's try it. This is the console screen.

[Image: キャプチャ.PNG]

Judging from the sample, the accuracy looks pretty good. It's impressive that it reads this much even from handwriting. In the part that says "better app", the tt looks like an H and the a looks like an o, yet they are read correctly. Now let's throw in a Ring Fit image and check. Just drag and drop the image and it is OCRed.

[Image: キャプチャ.PNG]

Hmm, it doesn't seem to support Japanese... However, the numeric values and some other parts are read properly. With appropriate post-processing, I might be able to create more accurate data than last time.

Try using Amazon Textract from Python

Now let's use Textract from Python.

Install awscli and boto3 for use with Python.

console


pip install awscli
pip install boto3

Set up the IAM user that the awscli will use. Use the access key and secret access key issued when you create the user in IAM.

console


aws configure

AWS Access Key ID [None]: your access key
AWS Secret Access Key [None]: your secret access key
Default region name [None]: your region
Default output format [None]: your format

You may not need to set the region and format.
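If you want to confirm that the credentials were picked up correctly, one quick check is to call STS from the CLI, which prints the account and user ARN the configured keys belong to:

console


aws sts get-caller-identity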

The code is based on the documentation (Boto3 Docs 1.16.37 documentation → Textract). It does the following:

  1. Prepare the Textract client
  2. Load the image
  3. Run OCR with Textract (synchronous processing)
  4. Display the result from the returned data

textract.py


import boto3

# Amazon Textract client
textract = boto3.client('textract', region_name="ap-southeast-1")

# read image to bytes
with open('get_data/2020-09-28.png', 'rb') as f:
    data = f.read()

# Call Amazon Textract
response = textract.detect_document_text(
    Document={
        'Bytes': data
    }
)

# Print the detected text (LINE blocks) in blue
for item in response["Blocks"]:
    if item["BlockType"] == "LINE":
        print('\033[94m' + item["Text"] + '\033[0m')

Let's actually run it. It took about two seconds in my environment.

console


python .\textract.py
R
Oti
10+29
38.40kcal
0.89km
Oxian

Looking at the output, it matches the result from the console.

What is returned as the execution result?

Earlier, following the documentation, I extracted the text from the response and printed it, but what does the response actually contain? Let's check its contents.

response.json


{
    "DocumentMetadata": {
        "Pages": 1
    },
    "Blocks": [
        {
            "BlockType": "PAGE",
            "Geometry": {
                "BoundingBox": {
                    "Width": 1.0,
                    "Height": 0.9992592334747314,
                    "Left": 0.0,
                    "Top": 0.0
                },
                "Polygon": [
                    {
                        "X": 6.888638380355447e-17,
                        "Y": 0.0
                    },
                    {
                        "X": 1.0,
                        "Y": 0.0
                    },
                    {
                        "X": 1.0,
                        "Y": 0.9992592334747314
                    },
                    {
                        "X": 0.0,
                        "Y": 0.9992592334747314
                    }
                ]
            },
            "Id": "33a0a9cd-0569-44ed-9f0f-7e88ede1d3d3",
            "Relationships": [
                {
                    "Type": "CHILD",
                    "Ids": [
                        "b9b8fd8e-1f13-4b9a-8bfa-8c8ca4750ae0",
                        "3b71c094-0bac-496e-9e26-1d311b89a66c",
                        "366cdb0a-5d10-4f64-b88b-c1ad79013fc2",
                        "232492f4-3137-49df-ad21-0369622cc56e",
                        "738b30df-4472-4a25-90fe-eaed85e74566",
                        "a73953ed-6038-49fb-af64-bad77e0d1e8f"
                    ]
                }
            ]
        },
        {
            "BlockType": "LINE",
            "Confidence": 87.06179809570312,
            "Text": "R",
            "Geometry": {
                "BoundingBox": {
                    "Width": 0.008603394031524658,
                    "Height": 0.018224462866783142,
                    "Left": 0.7822862863540649,
                    "Top": 0.1344471424818039
                },
                "Polygon": [
                    {
                        "X": 0.7822862863540649,
                        "Y": 0.1344471424818039
                    },
                    {
                        "X": 0.7908896803855896,
                        "Y": 0.1344471424818039
                    },
                    {
                        "X": 0.7908896803855896,
                        "Y": 0.15267160534858704
                    },
                    {
                        "X": 0.7822862863540649,
                        "Y": 0.15267160534858704
                    }
                ]
            },
            "Id": "b9b8fd8e-1f13-4b9a-8bfa-8c8ca4750ae0",
            "Relationships": [
                {
                    "Type": "CHILD",
                    "Ids": [
                        "1efd9875-d6a4-45e4-8fb4-63e68c668ff1"
                    ]
                }
            ]
        },
        ...
        {
            "BlockType": "WORD",
            "Confidence": 87.06179809570312,
            "Text": "R",
            "TextType": "PRINTED",
            "Geometry": {
                "BoundingBox": {
                    "Width": 0.008603399619460106,
                    "Height": 0.018224479630589485,
                    "Left": 0.7822862863540649,
                    "Top": 0.1344471424818039
                },
                "Polygon": [
                    {
                        "X": 0.7822862863540649,
                        "Y": 0.1344471424818039
                    },
                    {
                        "X": 0.7908896803855896,
                        "Y": 0.1344471424818039
                    },
                    {
                        "X": 0.7908896803855896,
                        "Y": 0.15267162024974823
                    },
                    {
                        "X": 0.7822862863540649,
                        "Y": 0.15267162024974823
                    }
                ]
            },
            "Id": "1efd9875-d6a4-45e4-8fb4-63e68c668ff1"
        },
        {
            "BlockType": "WORD",
            "Confidence": 37.553348541259766,
            "Text": "Oti",
            "TextType": "HANDWRITING",
            "Geometry": {
                "BoundingBox": {
                    "Width": 0.03588677942752838,
                    "Height": 0.031930990517139435,
                    "Left": 0.4896482229232788,
                    "Top": 0.2779926359653473
                },
                "Polygon": [
                    {
                        "X": 0.4896482229232788,
                        "Y": 0.2779926359653473
                    },
                    {
                        "X": 0.525534987449646,
                        "Y": 0.2779926359653473
                    },
                    {
                        "X": 0.525534987449646,
                        "Y": 0.30992361903190613
                    },
                    {
                        "X": 0.4896482229232788,
                        "Y": 0.30992361903190613
                    }
                ]
            },
            "Id": "4e07e16b-f78b-4564-bb30-c0e48f6610c6"
        },
        ...
    ],
    "DetectDocumentTextModelVersion": "1.0",
    "ResponseMetadata": {
        "RequestId": "87f05420-f6d9-4e67-911e-64deadd207fb",
        "HTTPStatusCode": 200,
        "HTTPHeaders": {
            "x-amzn-requestid": "87f05420-f6d9-4e67-911e-64deadd207fb",
            "content-type": "application/x-amz-json-1.1",
            "content-length": "6693",
            "date": "Thu, 17 Dec 2020 00:36:14 GMT"
        },
        "RetryAttempts": 0
    }
}

The above is the actual content. Let's go through it alongside the documentation.

DocumentMetadata: Metadata about the document. This time 1 page is reported.
Blocks: The items detected and analyzed. The OCR results live here.
BlockType: The type of recognized item. There are several types; only the ones that appeared this time are summarized below.
  PAGE: Contains a list of the LINE blocks detected on the page; their IDs are stored as children.
  WORD: A detected word. Whether it is handwritten or printed is also recorded.
  LINE: A string of tab-delimited, contiguous detected words; roughly one line of text.
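To make the PAGE / LINE / WORD structure concrete, here is a minimal sketch that resolves the CHILD relationships and prints each LINE together with its WORD children. It assumes the response dict returned by detect_document_text in the earlier script:

python


# Index every block by Id so CHILD references can be resolved
blocks = {b["Id"]: b for b in response["Blocks"]}

for block in response["Blocks"]:
    if block["BlockType"] != "LINE":
        continue
    # Gather the WORD children of this LINE via its CHILD relationship
    child_ids = []
    for rel in block.get("Relationships", []):
        if rel["Type"] == "CHILD":
            child_ids.extend(rel["Ids"])
    words = [blocks[i]["Text"] for i in child_ids
             if blocks[i]["BlockType"] == "WORD"]
    print(block["Text"], "->", words)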

That said, do we need all of this data? For this task, it should be enough to extract what we need from the blocks whose BlockType is WORD. So how do we pull it out?

Try to retrieve only the necessary data

Looking at the returned values, the position of each piece of detected text is included. All Ring Fit captures share the same layout, so the regions of the characters to read should be roughly the same. The Ring Fit values are aligned to the bottom right, so their bottom-right coordinates should be nearly identical across images. So I'll pick up the data near specific coordinates.

Follow the steps below.

  1. From the response above, build records of each word and its lower-right coordinate
  2. Pick out the records whose coordinates match the positions of the values we want
  3. Store each one under its own key

The coordinates are fixed, but I allow an error of 0.01 to absorb slight misalignment. The JSON loaded at runtime is the Textract response data shown above.

textract.py


import json


# Format into records containing only the text and its lower-right coordinate
def get_word(data: dict) -> list:
    words = []
    for item in data["Blocks"]:
        if item["BlockType"] == "WORD":
            words.append({
                "word": item["Text"],
                # Polygon[2] is the lower-right vertex
                "right_bottom": item["Geometry"]["Polygon"][2]
            })
    return words


# Check whether a lower-right coordinate is near one of the expected
# positions (a deviation of up to 0.01 is allowed)
def point_check(x: float, y: float) -> str:
    origin_point = {
        "time": {"x": 0.71, "y": 0.46},
        "kcal": {"x": 0.73, "y": 0.63},
        "km": {"x": 0.73, "y": 0.78}
    }
    for k, v in origin_point.items():
        if abs(x - v["x"]) < 0.01 and abs(y - v["y"]) < 0.01:
            return k


def get_point_data(data: dict) -> dict:
    prepro_data = get_word(data)
    some_data = {}
    for v in prepro_data:
        tmp = point_check(v["right_bottom"]["X"], v["right_bottom"]["Y"])
        if tmp:
            some_data[tmp] = v["word"]
    return some_data


if __name__ == '__main__':
    with open("j.json") as f:
        data = json.load(f)
    d = get_point_data(data)
    print(d)

When I run it ...

console


python .\textract.py
{'time': '10+29', 'kcal': '38.40kcal', 'km': '0.89km'}

The values are picked up properly.

Try with multiple images

Now, I have several Ring Fit images, so let's try them all. I run the image-loading part once per image and check the OCR results for each. (The code was omitted; a rough sketch follows.) The results come after the sketch.
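The do_ocr name below matches the helper imported in the final script further down, but the directory layout and the PNG filter are my assumptions:

python


import os

import boto3


def do_ocr(dir_path: str) -> list:
    # Run detect_document_text on every PNG in dir_path and
    # return the raw responses in filename order
    textract = boto3.client('textract', region_name="ap-southeast-1")
    responses = []
    for name in sorted(os.listdir(dir_path)):
        if not name.endswith('.png'):
            continue
        with open(os.path.join(dir_path, name), 'rb') as f:
            responses.append(textract.detect_document_text(
                Document={'Bytes': f.read()}
            ))
    return responses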

res_list.json


[
    {
        "time": "27",
        "kcal": "48kcal",
        "km": "0.71km"
    },
    {
        "time": "11>12*",
        "kcal": "37.79kcal",
        "km": "O.65km"
    },
    {
        "kcal": "36.62kcal",
        "km": "0.23km"
    },
    ...
]

Some results confuse 0 with the letter o, and the time sometimes cannot be read, but on the whole it reads well. (The time display includes Japanese, so that part can't be helped.) I'll fill missing time values with 0 and use only the first two digits for the rest, and do post-processing that includes replacing o with 0. Outliers (40 minutes or more) are divided by 10, since the time data may not be parsed well. I also added date data so it can be used for the graph from the previous article. (Not shown in the code.)

textract.py


import re


def post_processing(word_point_list: list):
    for data in word_point_list:
        # Fill in a time of 0 when it could not be read
        if "time" not in data:
            data["time"] = "0"
        # Keep only the digits and use at most the first two of them
        re_data = re.sub('[^0-9]', '', data["time"])[:2]
        # Treat 40 minutes or more as an outlier and divide by 10
        data["time"] = float(re_data) if float(re_data) < 40 else float(re_data) / 10
        # Replace misread o/O with 0 and strip the unit letters
        data["kcal"] = float(data["kcal"].replace("o", "0").replace("O", "0").replace("k", "").replace("c", "").replace("a", "").replace("l", ""))
        data["km"] = float(data["km"].replace("o", "0").replace("O", "0").replace("k", "").replace("m", ""))
    return word_point_list

When I run it with this ...

res_list.json


[
    {
        "time": 27.0,
        "kcal": 48.0,
        "km": 0.71,
        "date": "2020-11-09.png "
    },
    {
        "time": 11.0,
        "kcal": 37.79,
        "km": 0.65,
        "date": "2020-11-15.png "
    },
    {
        "kcal": 36.62,
        "km": 0.23,
        "date": "2020-11-16.png ",
        "time": 0.0
    },
    ...
]

The data looks clean now!

Let's actually run it

Now let's run OCR on the images, post-process the results, and draw the graph. See the previous article for the functions used here.

ocr_and_graph.py


import json
import os

from src.textract import do_ocr, get_point_data, post_processing
from src.graph import create_graph

IMPORT_FILE_PATH = "output/ocr_result.json"
OUTPUT_FILE_PATH = "output/graph2.png"

if __name__ == "__main__":
    data = do_ocr("./get_data")
    word_point_list = []
    for word_dict in data:
        word_point_list.append(get_point_data(word_dict))
    word_point_list = post_processing(word_point_list)

    # Make sure the output directory exists before writing anything
    os.makedirs(os.path.dirname(IMPORT_FILE_PATH), exist_ok=True)
    with open("./output/j.json", "w") as f:
        json.dump(word_point_list, f)

    # Create and output the data from the downloaded image files
    with open(IMPORT_FILE_PATH, "w") as f:
        json.dump(word_point_list, f, indent=2)

    create_graph(IMPORT_FILE_PATH, OUTPUT_FILE_PATH)

Let's compare this graph with the one created last time. You can see that the outliers in time and kcal have decreased. There are still outliers in the time data, so it may be better to handle them with more preprocessing or by changing the game's language. The kcal data is almost all correct, though, so I think it is already useful. Moreover, this accuracy was achieved without any image preprocessing, so I found Textract very easy to use.

[Image: graph.png (last time)]
[Image: graph2.png (this time)]

This is the end of the main subject.

Code review with CodeGuru

AWS has a service called CodeGuru. It reviews code, and now that Python is a supported language, I'd like to try it. First, link the code you want reviewed; I did this from GitHub. [Image: キャプチャ.PNG]

After adding it, select the repository and branch to analyze from "Create repository analysis". The analysis took some minutes to run (around 10, I think).

[Image: capture.PNG] [Image: capture.PNG]

Let's look at the results. There seems to be only one finding: apparently the exception handling is too broad, and it's better to specify the expected errors concretely. Looking at the actual code, shown below, a bare except is used and the error type is not specified.

[Image: キャプチャ.PNG]

create_fit_data.py


# Download from the acquired image URLs (the file name is the tweet date)
for data in image_url_list:
    try:
        os.mkdir("get_data")
    except:
        pass
    dst_path = f"get_data/{data['created_at'].strftime('%Y-%m-%d')}.png"
    download_file(data['img_url'], dst_path)
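Following the finding, one possible fix is to drop the try/except entirely and let os.makedirs handle an existing directory. This is just a sketch of the idea, not code produced by CodeGuru; image_url_list and download_file are from the snippet above:

python


import os

# exist_ok=True makes a pre-existing directory a non-error,
# so no blanket except clause is needed
os.makedirs("get_data", exist_ok=True)
for data in image_url_list:
    dst_path = f"get_data/{data['created_at'].strftime('%Y-%m-%d')}.png"
    download_file(data['img_url'], dst_path)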

In this way, CodeGuru seems useful for catching code that invites bugs or is hard to debug. Python is used in many places, so there should be plenty of opportunities to use it. If you develop with others or want to write solid code, CodeGuru may be worth a try.

Summary

This time I used Textract and CodeGuru to redo something like last time's work. Textract is free for up to 1,000 pages per month for the first three months, so even after several runs I was able to build this at no cost. That's very helpful when you're just getting started.

CodeGuru is also free for the first three months; after that, it appears to cost $0.50 per 100 lines of code, up to 1,500,000 lines of code analyzed per month.

Incidentally, the code I wrote this time was about 250 lines, and the number of reviewed lines was shown as 187. Perhaps it reads only the parts it needs.

I really want Textract to support Japanese... Things would be much easier if it did. I'm looking forward to future updates!
