Overview

Recently, I have been enjoying playing mahjong with my family on the online mahjong site Tenhou, once a week for about two hours.

In this article, I explain how I analyzed our match results and published them on the web so that everyone in the family can see them.

The process is as follows:

Analyze with Python
Visualize with Dash
Deploy using Heroku and GitHub

The finished app: https://drmahjong.herokuapp.com/

The code is available on GitHub: https://github.com/mottoki/mahjan-score

Getting the data

Tenhou's game data can be obtained from its logs. Download the data with the requests module.

`python.py`


import datetime

import requests

new_date = datetime.datetime.now().strftime('%Y%m%d')
filename = f"sca{new_date}.log.gz"
url = f"https://tenhou.net/sc/raw/dat/{filename}"

# Download the gz file from the URL
r = requests.get(url)
r.raise_for_status()  # stop here instead of saving an error page as the log
with open(filename, "wb") as f:
    f.write(r.content)
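Before parsing, it can help to look at a few raw lines of the downloaded file. A minimal sketch using only the standard library; `log_url` and `read_log_lines` are hypothetical helpers, the URL pattern is the one from the download step, and UTF-8 is assumed as the file encoding (the same default that pd.read_csv uses in the next step).

```python
from __future__ import annotations

import datetime
import gzip
from itertools import islice

# Same location as in the download step above
BASE_URL = "https://tenhou.net/sc/raw/dat/"


def log_url(day: datetime.date) -> tuple[str, str]:
    """Return (filename, url) for the log of a given day (hypothetical helper)."""
    filename = f"sca{day.strftime('%Y%m%d')}.log.gz"
    return filename, BASE_URL + filename


def read_log_lines(path: str, limit: int = 5) -> list[str]:
    """Decompress a .log.gz file and return its first `limit` lines (hypothetical helper)."""
    # Assumption: the log is UTF-8 text
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return [line.rstrip("\n") for line in islice(f, limit)]
```

For example, calling read_log_lines(filename) right after the download shows the '|'-separated layout that the next step splits on.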

Data processing using Python and Pandas

Filter the raw data by the players' names, then use split to build a data frame containing only each player's name and points.

`new_data.py`


import os
import pandas as pd

# filename and new_date come from the download step (python.py)

# Column names: the game date, then the four player names
playercol = ['date', 'Mirataro', 'Shinwan', 'ToShiroh', 'yukoron']

# Convert to a pandas DataFrame (gzip compression is inferred from the .gz extension)
df = pd.read_csv(filename, usecols=[0], on_bad_lines='skip', header=None)
df[len(df.columns)] = new_date

# Keep only the games in which all four family members played
df = df[df[0].str.contains(playercol[1]) &
        df[0].str.contains(playercol[2]) &
        df[0].str.contains(playercol[3]) &
        df[0].str.contains(playercol[4])].copy()

# Split each log line into its '|'-separated fields
df[['one', 'two', 'three', 'four']] = df[0].str.split('|', n=3, expand=True)
df.columns = ['original', 'date', 'room', 'time', 'type', 'name1']
df['date'] = pd.to_datetime(df['date'], format='%Y%m%d')
# name1 starts with a space, so the first field after splitting is empty
df[['empty', 'n1', 'n2', 'n3', 'n4']] = df.name1.str.split(" ", n=4, expand=True)
# Keep only the columns we need
df = df[['date', 'n1', 'n2', 'n3', 'n4']]

# Strip the parentheses around each score (e.g. "name(+52.0)") and build a new DataFrame
new_score = pd.DataFrame(columns=playercol)
for k, (_, row) in enumerate(df.iterrows()):
    new_score.loc[k, 'date'] = row['date']
    for col in df.columns[1:]:
        s = row[col]
        player = s.split('(')[0]
        score = [p.split(')')[0] for p in s.split('(') if ')' in p][0]
        new_score.loc[k, player] = int(float(score.replace('+', '')))

# Load the previously saved scores from the pickle file
pkl_path = os.path.join(os.getcwd(), "players_score.pkl")
old_score = pd.read_pickle(pkl_path)

# Combine the old and new data and save the result
concat_score = pd.concat([old_score, new_score], ignore_index=True)
concat_score.to_pickle(pkl_path)
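The split-based parsing in the loop above can also be done with a single regular expression. A minimal sketch, assuming each player entry looks like name(+52.0) as the code above expects; `parse_scores` is a hypothetical helper, and the sample string is made up for illustration rather than copied from a real Tenhou log.

```python
from __future__ import annotations

import re

# One "name(score)" entry; the score may carry a sign and a decimal part, e.g. "(+52.0)"
SCORE_RE = re.compile(r"(\S+?)\(([+-]?\d+(?:\.\d+)?)\)")


def parse_scores(players_field: str) -> dict[str, int]:
    """Turn ' A(+52.0) B(-20.0) ...' into {'A': 52, 'B': -20, ...} (hypothetical helper)."""
    # int(float(...)) matches the conversion used in the loop above
    return {name: int(float(score)) for name, score in SCORE_RE.findall(players_field)}


# Made-up sample in the assumed format
sample = " Mirataro(+52.0) Shinwan(+8.0) ToShiroh(-20.0) yukoron(-40.0)"
print(parse_scores(sample))
# {'Mirataro': 52, 'Shinwan': 8, 'ToShiroh': -20, 'yukoron': -40}
```

The regex version also ignores any stray whitespace between entries, so the separate "empty" column is not needed.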

Visualize with Dash

I use a library called Dash to visualize the data quickly.

The official Dash tutorial is the easiest introduction (Reference: Dash Documentation & User Guide).

The part of Dash where people tend to get stuck is callbacks. Some articles explain them in detail, such as "Use Python visualization library Dash 2 See Callback", so please refer to those.

1. Front end (what you see on the web)

Explaining all of the code would take too long, so I will explain only the core parts.

Basically, everything assigned to app.layout is what you see on the website.

Items that you do not want to display (for example, "intermediate-value", which holds data used many times) can be hidden on the page with style={'display': 'none'}.


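# Imports for app.py (used by the snippets in this section)
import re
from datetime import datetime as dt

import dash
import numpy as np
import pandas as pd
import plotly.graph_objects as go
from dash import dcc, html
from dash.dependencies import Input, Output

app = dash.Dash(__name__)
server = app.server  # Flask server object, for running under gunicorn on Heroku

# Everything in app.layout is what appears on the web page
app.layout = html.Div([
    # Let users choose the date range reflected in the data
    html.Div([
        html.H2("DR.Mahjong"),
        dcc.DatePickerRange(
            id='my-date-picker-range',
            min_date_allowed=dt(2020, 3, 1),
            max_date_allowed=dt.today(),
            end_date=dt.today()
        ),
        ], className="mytablestyle"),

    # Data used many times: hidden with style={'display': 'none'}
    html.Div(id='intermediate-value', style={'display': 'none'}),

    # Transition of points (graph)
    dcc.Graph(id='mygraph'),

    # Total points (table)
    html.Div([
            html.Div(html.P('Current total points')),
            html.Div(id='totalscore'),
        ], className="mytablestyle"),

])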

2. Converting the data to JSON with a callback

Read the data with pandas read_pickle, filter it by the selected dates, and return it as JSON.

Storing the result in the hidden "intermediate-value" component lets both the graph and the table reuse the same data.

@app.callback(Output("intermediate-value", "children"),
    [Input("my-date-picker-range", "start_date"),
    Input("my-date-picker-range", "end_date")])
def update_output(start_date, end_date):
    players = pd.read_pickle('players_score.pkl')
    if start_date is not None:
        start_date = dt.strptime(re.split('T| ', start_date)[0], '%Y-%m-%d')
        players = players.loc[(players['date'] >= start_date)]
    if end_date is not None:
        end_date = dt.strptime(re.split('T| ', end_date)[0], '%Y-%m-%d')
        players = players.loc[(players['date'] <= end_date)]
    return players.to_json(date_format='iso', orient='split')
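Outside Dash, the round trip the two callbacks rely on can be checked with pandas alone: filter by date, serialize with to_json(orient='split', date_format='iso'), and read it back. A minimal sketch with made-up scores; `filter_by_dates` is a hypothetical helper that mirrors update_output, using pd.Timestamp(...).normalize() in place of re.split to drop the time part. Wrapping the string in io.StringIO avoids the FutureWarning that pandas 2.1 and later emit when read_json receives a literal JSON string.

```python
from io import StringIO

import pandas as pd


def filter_by_dates(players, start_date=None, end_date=None):
    """Keep rows whose 'date' lies between start_date and end_date, inclusive (hypothetical helper)."""
    # The date picker may send 'YYYY-MM-DD' or a full ISO datetime string;
    # normalize() drops the time part, like re.split('T| ', ...) does in update_output
    if start_date is not None:
        players = players.loc[players["date"] >= pd.Timestamp(start_date).normalize()]
    if end_date is not None:
        players = players.loc[players["date"] <= pd.Timestamp(end_date).normalize()]
    return players


# Made-up scores for illustration only
players = pd.DataFrame({
    "date": pd.to_datetime(["2020-03-01", "2020-03-08", "2020-03-15"]),
    "Mirataro": [52, -20, 10],
    "Shinwan": [-52, 20, -10],
})

filtered = filter_by_dates(players, "2020-03-05", "2020-03-15T18:30:00")
as_json = filtered.to_json(date_format="iso", orient="split")

# What the second callback does with the hidden component's contents
restored = pd.read_json(StringIO(as_json), orient="split")
print(restored)
```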

3. Building the graph and table with a callback

Convert the JSON back into a pandas DataFrame, then draw the graph and build the table from it.

Dash graphs are Plotly figures, so the graph is created with go.Figure().

The table is built with html.Table. Dash also has a dedicated table library, dash_table, but this table is simple, so I didn't need it.

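from io import StringIO

# Example line colors, one per player (assumed; the original palette is not shown)
colors = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728']

@app.callback([Output('mygraph', 'figure'),
    Output('totalscore', 'children')],
    [Input('intermediate-value', 'children'),
    # 'datatype' is a selector from the part of the layout not shown above
    Input('datatype', 'value')])
def update_fig(jsonified_df, data_type):
    # Restore the JSON data to a pandas DataFrame
    # (StringIO avoids the FutureWarning pandas >= 2.1 gives for literal JSON strings)
    players = pd.read_json(StringIO(jsonified_df), orient='split')

    # Graphing: cumulative points per player
    fig = go.Figure()
    for i, name in enumerate(players.columns[1:]):
        fig.add_trace(go.Scatter(x=players.date,
                            y=np.array(players[name]).cumsum(),
                            mode='lines',
                            name=name,
                            line=dict(color=colors[i], width=4)))

    fig.update_layout(plot_bgcolor='whitesmoke',
        title='Transition of total points',
        legend=dict(
            orientation="h",
            yanchor="bottom",
            y=1.02,
            xanchor="right",
            x=1,)
    )

    # Calculate total points (drop 'date' first: recent pandas versions
    # raise an error when summing a datetime column)
    summed = players.drop(columns='date').sum()

    # Return the graph and the table
    return fig, html.Table([
        html.Thead(
            html.Tr([html.Th(col) for col in summed.index])
            ),
        html.Tbody(
            html.Tr([html.Td(val) for val in summed])
            ),
        ])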
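The numbers behind the graph and the table are a running total per player and a per-player column total. A minimal pandas sketch with made-up scores (not real results) that shows both computations; as in the callback, the 'date' column is dropped before summing because recent pandas versions raise an error instead of silently skipping a datetime column.

```python
import pandas as pd

# Made-up results for three sessions (illustration only); each row sums to zero
players = pd.DataFrame({
    "date": pd.to_datetime(["2020-03-01", "2020-03-08", "2020-03-15"]),
    "Mirataro": [52, -20, 10],
    "Shinwan": [8, 30, -25],
    "ToShiroh": [-20, 15, 5],
    "yukoron": [-40, -25, 10],
})

scores = players.drop(columns="date")

# Running total per player: what each line in the graph plots
running = scores.cumsum()

# Final total per player: what the table shows
# (Mirataro 42, Shinwan 13, ToShiroh 0, yukoron -55)
totals = scores.sum()

print(running)
print(totals)
```

The last row of running always equals totals, which is a quick sanity check that the graph and the table agree.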

Deploy using Heroku and GitHub

Finally, we will deploy using Heroku and GitHub.

The official documentation (Deploying Dash Apps) explains the Git and Heroku steps in detail, and the method here is almost the same.

The process looks like this:

Sign up for a GitHub account
Create a new repository on GitHub
Set up SSH access to GitHub (optional, but it makes things easier; see: Allow ssh connection to GitHub)
Create the files required for deployment (.gitignore, Procfile, requirements.txt). You also need gunicorn, so install it with pip install gunicorn.

Use Git commands to push the files above, together with app.py and the players_score.pkl data file, to GitHub.

git init
git add .
git commit -m "message"
git remote add origin git@github.com:<username>/<Repository name>
git push origin master

After confirming that the files are on GitHub, create a Heroku account and create a new app with New > Create new app (there is no Japan region, so select United States).
Open the Deploy tab of the new app, set Deployment method to GitHub, and connect it to the repository you created earlier.
Finally, press the black Deploy Branch button under Manual deploy, and Heroku builds and deploys the app automatically.
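The three deployment files are short. A minimal sketch that writes them from Python, assuming the app lives in app.py and exposes the Flask server as server = app.server, which is what the Procfile line web: gunicorn app:server from the Dash deployment guide expects. The package list and the .gitignore entries are assumptions based on the libraries and files used in this article; pin the versions you actually use (for example with pip freeze). `write_deploy_files` is a hypothetical helper.

```python
from __future__ import annotations

from pathlib import Path

# Procfile: tells Heroku to start gunicorn with the `server` object in app.py
# (assumes app.py contains `server = app.server`)
PROCFILE = "web: gunicorn app:server\n"

# requirements.txt: packages used in this article (assumed list, versions unpinned here)
REQUIREMENTS = "\n".join(["dash", "gunicorn", "numpy", "pandas", "plotly", "requests"]) + "\n"

# .gitignore: keep local clutter out of the repo; players_score.pkl must NOT be listed
GITIGNORE = "\n".join(["__pycache__/", "*.pyc", "venv/", "*.log.gz"]) + "\n"


def write_deploy_files(project_dir: str = ".") -> list[Path]:
    """Write Procfile, requirements.txt and .gitignore into project_dir (hypothetical helper)."""
    root = Path(project_dir)
    files = {"Procfile": PROCFILE, "requirements.txt": REQUIREMENTS, ".gitignore": GITIGNORE}
    written = []
    for name, content in files.items():
        path = root / name
        path.write_text(content)
        written.append(path)
    return written
```

Running write_deploy_files() in the project folder before git add . puts all three files in place for the push.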

Finally

How was it?

You can also combine cron with Heroku's Automatic Deploys to update the app automatically with new data from Tenhou (Reference: Automate the process of pushing to GitHub with cron).
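A minimal sketch of the push step that cron could run, assuming the download and processing code above is wrapped in a script (update.py is a hypothetical name), the repository is already connected to GitHub, and Automatic Deploys is enabled on Heroku so that each push redeploys the app. `push_new_scores` and its dry_run flag are hypothetical; with dry_run=True it only returns the git commands it would run.

```python
from __future__ import annotations

import datetime
import subprocess

# Example crontab entry (assumption): every Sunday at 23:30, run the update script
# 30 23 * * 0 cd /path/to/mahjan-score && python update.py


def push_new_scores(repo_dir: str = ".", dry_run: bool = False) -> list[list[str]]:
    """Commit the updated players_score.pkl and push it to GitHub (hypothetical helper)."""
    message = f"Update scores {datetime.date.today():%Y-%m-%d}"
    commands = [
        ["git", "add", "players_score.pkl"],
        ["git", "commit", "-m", message],
        ["git", "push", "origin", "master"],
    ]
    if not dry_run:
        for cmd in commands:
            # check=True stops at the first failing command (e.g. nothing to commit)
            subprocess.run(cmd, cwd=repo_dir, check=True)
    return commands
```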

References

- Create an automatic score management app for Tenhou private rooms with a LINE bot and Python

- Try to analyze online family mahjong using Python (PART 1: Take DATA)