[Python] Try to graph from the image of Ring Fit [OCR]

The other day I decided I needed to exercise, so I jumped on the trend and bought Ring Fit Adventure. About two months in, I feel like I haven't done much (I played every day at first, but now it's about once a week...). This won't do! I need to keep my motivation up! So I decided to visualize my exercise time and activity data.

What I want to do

So, what I want to do is fetch Ring Fit's exercise data and graph it. I wondered if there was a Ring Fit API, but there doesn't seem to be one. Instead, I'll post the result screenshots to Twitter and extract the data from there.

  1. Get Ring Fit tweets from Twitter
  2. Download the images from those tweets
  3. Extract the data from the downloaded images (OCR)
  4. Create a graph from the extracted data

That's the plan.

Environment

The execution environment:

OS: Windows 10
Python: 3.7

Fetching data from Twitter

First, let's fetch the data from Twitter. There are roughly two options: scraping, or using the API. This time I'll use the API, so I applied for Twitter API access. You need a phone number registered to your Twitter account, so register one in advance (I got stuck here). The application form has to be filled out in English, but I had Google Translate handle that and copy-pasted the result. Reliable as always.

Now let's get the images. When you post a screenshot from the Switch, the tweet automatically gets a hashtag with the name of the game. So I'll filter by account and the "#Ring Fit Adventure" hashtag, and fetch data from today back to X days ago.

  1. Filter by account and hashtag
  2. Data acquisition from today to X days ago
  3. DL the image
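The cutoff-date calculation behind step 2 can be sketched in isolation. This is just an illustration of the logic the later function uses; it relies on dateutil's relativedelta (pip install python-dateutil), and the function name here is mine:

```python
from datetime import datetime

from dateutil.relativedelta import relativedelta

def cutoff_date(today: datetime, months_back: int) -> datetime:
    """Datetime `months_back` calendar months before `today`;
    tweets older than this stop the fetch loop."""
    return today - relativedelta(months=months_back)

print(cutoff_date(datetime(2020, 10, 1), 3))  # 2020-07-01 00:00:00
```

relativedelta handles calendar arithmetic (month lengths, year rollover) so we don't have to.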

First, install tweepy to use the Twitter API.

console


pip install tweepy

Initialize tweepy

Initialize tweepy. I keep the necessary credentials in a JSON file and wrote a function that initializes the API from them. Hard-coding the keys would also work, but I load config.json from a path given as an argument and pass the resulting dict in.

config.json


{
    "CONSUMER_KEY": "your key",
    "CONSUMER_SECRET": "your secret key",
    "ACCESS_TOKEN": "your token",
    "ACCESS_TOKEN_SECRET": "your secret token"
}

twitter_tl.py



import tweepy

# Initialize tweepy
def init_twitter_api(config: dict) -> tweepy.API:
    auth = tweepy.OAuthHandler(config['CONSUMER_KEY'], config['CONSUMER_SECRET'])
    auth.set_access_token(config['ACCESS_TOKEN'], config['ACCESS_TOKEN_SECRET'])
    return tweepy.API(auth)

Next, let's fetch the data from Twitter.

Get specific data from TL

I wrote the data acquisition as a single function: pass it the user name, the search string, and how many months back to go, and it returns the data. The first argument is the initialized API object. What it does:

  1. Get the user's TL data
  2. Calculate the cutoff date (how far back to fetch)
  3. Repeat step 4 for each tweet until the cutoff date
  4. If a tweet contains the search string, add its date and image URL to the list

The order of the filtering and the date handling is a bit backwards, but I went with this approach because I couldn't download the images properly when filtering with search operators like since:.

twitter_tl.py



from datetime import datetime

import tweepy
from dateutil.relativedelta import relativedelta

# Fetch image data from the TL, going back end_date months
def get_img_data_from_TL(api: tweepy.API, user_id: str, serch_text: str, end_date: int) -> list:
    image_url_list = []
    print(f"get {user_id}'s TL now...")
    search_results = tweepy.Cursor(api.user_timeline, screen_name=user_id).items()
    today = datetime.today()
    lastmonth = today - relativedelta(months=end_date)
    print(f"get [{serch_text}] until {lastmonth}")
    for i, result in enumerate(search_results):
        try:
            if i % 50 == 0:
                print('.')
            if result.created_at < lastmonth:
                break
            if serch_text in result.text:
                image_url_list.append(
                    {
                        "created_at": result.created_at,
                        "img_url": result.extended_entities["media"][0]["media_url"]
                    }
                )
        except Exception as e:
            print(e)
    return image_url_list

Finally, save the images from their URLs.

Image download

This function takes a URL and a destination path (including the file name) and downloads the image there.

  1. Open the URL
  2. Save the image

It's simple.

twitter_tl.py



import urllib.error
import urllib.request

# Download an image
def download_file(url: str, dst_path: str) -> None:
    try:
        with urllib.request.urlopen(url) as web_file:
            data = web_file.read()
            with open(dst_path, mode="wb") as local_file:
                local_file.write(data)
    except urllib.error.URLError as e:
        print(e)

Now let's actually run it.

Try running it

Running the script generates the image files, as you can see in the screenshot below.

twitter_tl.py


import json
import os

if __name__ == "__main__":
    user_id = "tumugi3205"
    serch_text = "Ring Fit Adventure"
    end_date = 3
    with open("config/config.json") as f:
        conf = json.load(f)
    api = init_twitter_api(conf)
    image_url_list = get_img_data_from_TL(api, user_id, serch_text, end_date)
    os.makedirs("get_data", exist_ok=True)
    for data in image_url_list:
        dst_path = f"get_data/{data['created_at'].strftime('%Y-%m-%d')}.png"
        download_file(data['img_url'], dst_path)

(Screenshot: the downloaded image files)

Now we can fetch the images from Twitter!

Data acquisition from images

Next, let's extract the text data from the images. In Python you can do OCR easily using Tesseract; I set it up following an article on running OCR in Python. We'll use pyocr to call Tesseract from Python (note that pyocr is just a wrapper, so the Tesseract engine itself needs to be installed separately).

console


pip install pyocr

Now let's get the data from the image!

Try OCR

For the time being, I tried OCR as it is.

fit_image_ocr.py


import sys
from typing import Any

import pyocr
from PIL import Image

# Initialize pyocr
def startup_ocr() -> Any:
    tools = pyocr.get_available_tools()
    if len(tools) == 0:
        print("No OCR tool found")
        sys.exit(1)
    return tools[0]

if __name__ == "__main__":
    tool = startup_ocr()
    lang = "jpn"
    print(tool.image_to_string(Image.open("get_data/2020-09-28.png"), lang=lang))

(Screenshot: the raw OCR output)

Hmm, this can't be used as data... It looks like preprocessing is needed first.

Image preprocessing

The image contains three values: exercise time, calories burned, and distance traveled. Their positions are the same in every screenshot, so let's crop out just those regions. The numbers also don't read well with the Japanese language model, so for the time I extract only the minutes portion as plain digits. Since a single-digit time is shown as one character rather than zero-padded like 07, I widened the crop range to allow for the shift (found by trial and error). That gives the function below.

  1. Open the file
  2. Specify the crop ranges
  3. Save the cropped images
  4. Return the output file paths as a dictionary
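As a sketch of the arithmetic in step 2: the kcal and km regions come from dividing the screenshot into a 4-wide by 6-tall grid and taking one cell. The 1280x720 size below is a made-up example, not necessarily the real screenshot size:

```python
# Crop-box arithmetic for one cell of a 4x6 grid, in the
# (left, upper, right, lower) convention used by PIL's Image.crop().
def grid_box(width: int, height: int, col: int, row: int) -> tuple:
    ws = width / 4   # width of one grid column
    hs = height / 6  # height of one grid row
    return (ws * col, hs * row, ws * (col + 1), hs * (row + 1))

# kcal region: third column, fourth row (zero-based col=2, row=3)
print(grid_box(1280, 720, 2, 3))  # (640.0, 360.0, 960.0, 480.0)
```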

fit_image_ocr.py



import os

from PIL import Image

# Extract only the file name from a path
def get_file_name(file_path: str) -> str:
    return file_path.split("/")[-1].split(".")[0]

# Preprocessing
def overview_preprosess(file_path: str) -> dict:
    file_name = get_file_name(file_path)
    img = Image.open(file_path)
    width_section = img.width / 4
    height_section = img.height / 6
    create_path = {}

    rect_dic = {
        "time": (600, 250, 770, 320),
        "kcal": (width_section*2, height_section*3, width_section*3, height_section*4),
        "km": (width_section*2, height_section*4, width_section*3, height_section*5)
    }

    os.makedirs("prepro", exist_ok=True)
    for name, rect in rect_dic.items():
        output_path = f"prepro/{file_name}_{name}.jpg"
        prepro = img.crop(box=rect)
        prepro.save(output_path, format="jpeg")
        create_path[name] = output_path

    return create_path
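As an aside, the file-name helper above can also be written with the standard library's pathlib, which additionally copes with Windows-style backslash paths (the function name here is mine):

```python
from pathlib import Path

def get_file_name_pathlib(file_path: str) -> str:
    # Path.stem is the final path component without its extension
    return Path(file_path).stem

print(get_file_name_pathlib("get_data/2020-09-28.png"))  # 2020-09-28
```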

Now that we have preprocessed, let's do OCR.

OCR

It's simple to do.

  1. Initialization of pyocr
  2. Get data from the file path output earlier
  3. OCR

fit_image_ocr.py


import os

from PIL import Image

# Data acquisition with OCR
def file_ocr(do_dir: str) -> list:
    ocr_list = []
    tool = startup_ocr()
    for filename in os.listdir(do_dir):
        output_path = overview_preprosess(f"{do_dir}/{filename}")
        ocr_text_dict = {}
        for name, path in output_path.items():
            ocr_text_dict[name] = tool.image_to_string(Image.open(path), lang="eng")
        ocr_text_dict["read_file_name"] = filename
        ocr_list.append(ocr_text_dict)
    return ocr_list

OCR really is this easy to run. Let's look at the results.

ocr_result.json


[
  {
    "time": "10+",
    "kcal": "38. AOkcal",
    "km": "O. 89km",
    "read_file_name": "2020-09-28.png"
  },
  {
    "time": "16",
    "kcal": "42. 69kcal",
    "km": "O. 59km",
    "read_file_name": "2020-09-30.png"
  },
  {
    "time": "223",
    "kcal": "7 ] 2 O9kcal",
    "km": "1. 29km",
    "read_file_name": "2020-10-01.png"
  },
  {
    "time": "1445",
    "kcal": "Af Bike",
    "km": "1. 03km",
    "read_file_name": "2020-10-04.png"
  }
  ...
]

No ... this doesn't seem to work either. So let's do the post-processing as well.

Data post-processing

For misreads where the cause is identifiable, replace the offending characters (for example, 1 gets read as ], which can never appear in a number); where the value can't be recovered, mark it as error. It can't be helped.

  1. Replace the characters that are clearly misread
  2. Strip the letters
  3. Keep only the first two digits of the time
  4. If a value has no decimal point, insert one before the last two digits
  5. If the resulting value exceeds 200, mark it as error (I don't exercise that much)
  6. If the resulting value can't be converted to a number, mark it as error
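Steps 1-2 can be sketched as a small standalone helper. The substitution table below is illustrative, drawn from the misreads seen in the results above, and the function name is mine:

```python
import re

def clean_ocr_number(text: str) -> str:
    # Substitute characters Tesseract commonly misread in these images
    for bad, good in (("A", "4"), ("O", "0"), ("o", "0"), ("]", "1")):
        text = text.replace(bad, good)
    # Strip the unit letters (kcal, km) and any stray spaces
    return re.sub(r"[a-zA-Z]", "", text).replace(" ", "")

print(clean_ocr_number("O. 89km"))     # 0.89
print(clean_ocr_number("38. AOkcal"))  # 38.40
```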

fit_image_ocr.py


import re

# Post-processing
def post_processing(ocr_text_dict: dict) -> dict:
    for name, text in ocr_text_dict.items():
        # Fix characters Tesseract commonly misreads
        # (the second replace normalizes a full-width period to ASCII)
        ocr_text_dict[name] = ocr_text_dict[name].replace("A", "4").replace("．", ".").replace("Zu", "2.").replace("o", "0").replace(" ", "").replace("]", "1")
        ocr_text_dict[name] = re.sub("[a-zA-Z]", "", ocr_text_dict[name])

        if name == "time":
            if len(ocr_text_dict[name]) > 2:
                ocr_text_dict[name] = ocr_text_dict[name][:2]
        if len(ocr_text_dict[name]) > 4 and "." not in ocr_text_dict[name]:
            ocr_text_dict[name] = f"{ocr_text_dict[name][:-2]}.{ocr_text_dict[name][-2:]}"

        try:
            ocr_text_dict[name] = float(ocr_text_dict[name])
            if ocr_text_dict[name] > 200:
                raise ValueError
        except ValueError:
            if name != "read_file_name":
                ocr_text_dict[name] = "error"
    return ocr_text_dict

Let's apply this after the earlier OCR step. (read_file_name doesn't need post-processing, so it's added back at the end.) Then...

ocr_result.json


[
  {
    "time": 10.0,
    "kcal": 38.4,
    "km": 0.89,
    "read_file_name": "2020-09-28.png"
  },
  {
    "time": 16.0,
    "kcal": 42.69,
    "km": 0.59,
    "read_file_name": "2020-09-30.png"
  },
  {
    "time": 22.0,
    "kcal": "error",
    "km": 1.29,
    "read_file_name": "2020-10-01.png"
  },
  {
    "time": 14.0,
    "kcal": 4.0,
    "km": 1.03,
    "read_file_name": "2020-10-04.png"
  }
  ...
]

It looks like it's improved a lot! Let's make a graph from this output data.

Creating a graph

Let's easily create a graph using matplotlib.

console


pip install matplotlib

This time I would like to display two graphs of exercise time and calories burned.

  1. Read the data
  2. Format the data for the graph
  3. Set error values to 0
  4. Configure the calories-burned plot
  5. Configure the exercise-time plot
  6. Output
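Step 3 (mapping error to 0 so matplotlib has a purely numeric series) can be sketched on its own; the function name here is mine:

```python
def errors_to_zero(values: list) -> list:
    # "error" entries can't be plotted, so treat them as 0.0
    return [0.0 if v == "error" else float(v) for v in values]

print(errors_to_zero([38.4, "error", 1.29]))  # [38.4, 0.0, 1.29]
```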

You can apparently draw all sorts of graphs with matplotlib, but this time we'll display two line graphs. Omitting the details, the flow is:

  1. Determine the size of the graph (fonts, etc.)
  2. Set up the axes (probably)
  3. Add the data
  4. Add the labels and grid
  5. Save

graph.py


import json

import matplotlib.pyplot as plt

def create_graph(input_path: str, output_path: str) -> None:
    with open(input_path) as f:
        data = json.load(f)

    date = []
    kcal = []
    time = []
    for val in data:
        date.append(val["read_file_name"].replace("2020-", "").replace(".png", "").strip())
        kcal.append(val["kcal"])
        time.append(val["time"])

    # Plot "error" entries as 0
    kcal = [float(str(k).replace("error", "0")) for k in kcal]
    time = [float(str(t).replace("error", "0")) for t in time]

    fig = plt.figure(figsize=(15, 5))
    ax1 = fig.add_subplot(111)
    ln1 = ax1.plot(date, kcal, 'C0', label='kcal')
    ax2 = ax1.twinx()  # second y-axis sharing the same x-axis
    ln2 = ax2.plot(date, time, 'C1', label='time')
    h1, l1 = ax1.get_legend_handles_labels()
    h2, l2 = ax2.get_legend_handles_labels()
    ax1.legend(h1 + h2, l1 + l2, loc='lower right')
    ax1.set_xlabel('date')
    ax1.set_ylabel('kcal')
    ax1.grid(True)
    ax2.set_ylabel('time')
    plt.savefig(output_path)

With the above, we can fetch data from Twitter, run OCR, and create a graph.

Try it out

Let's actually run the whole pipeline.

create_fit_data.py


import json
import os

from src.twitter_tl import init_twitter_api, get_img_data_from_TL, download_file
from src.fit_image_ocr import file_ocr
from src.graph import create_graph

USER_ID = "tumugi3205"
SERCH_TEXT = "Ring Fit Adventure"
END_MONTH = 3

IMPORT_FILE_PATH = "output/ocr_result.json"
OUTPUT_FILE_PATH = "output/graph.png"

if __name__ == "__main__":
    with open("config/config.json") as f:
        CONFIG = json.load(f)

    # Twitter API settings
    api = init_twitter_api(CONFIG)

    # Get tweets containing the search text from the user's TL,
    # along with the URL of the first image in each
    image_url_list = get_img_data_from_TL(api, USER_ID, SERCH_TEXT, END_MONTH)

    # Download each image (the file name is the tweet date)
    os.makedirs("get_data", exist_ok=True)
    for data in image_url_list:
        dst_path = f"get_data/{data['created_at'].strftime('%Y-%m-%d')}.png"
        download_file(data['img_url'], dst_path)

    # Create and save the OCR data from the downloaded images
    ocr_data = file_ocr("./get_data")
    os.makedirs(os.path.dirname(IMPORT_FILE_PATH), exist_ok=True)
    with open(IMPORT_FILE_PATH, "w") as f:
        json.dump(ocr_data, f, indent=2)

    create_graph(IMPORT_FILE_PATH, OUTPUT_FILE_PATH)

graph.png

It worked!

Summary

This time I fetched Ring Fit images from Twitter and converted them into data with OCR to create a graph. Tesseract is free and easy, but its accuracy isn't great; you can see how important pre- and post-processing are. Since only numeric data needs to be extracted, I suspect you could get clean data by preparing template images for each digit and matching against them. (I'm not good at image processing, so I probably won't.)

When I ran OCR through Google Docs, the accuracy was quite good, so if you're willing to pay, OCR via the Google API might be the better option. I also tried AWS's Amazon Textract, but gave up because it doesn't support Japanese. On the preprocessed images it predicted with considerable accuracy, but it's still a paid service, so the hurdle is a bit high. At roughly 0.16 yen per execution, it would probably be cheaper to tile the image data into a table and OCR it in one go (Textract appears to support tabular formats, too). If I ever turn this into a service, I'd consider introducing Amazon Textract.
