Try to analyze online family mahjong using Python (PART 1: Take DATA)

Overview

Recently, I have been playing on the online mahjong site Tenhou with my family once a week, for about two hours each time.

In this article, I explain how I analyzed our match results and published them on the web so that everyone in the family can see them.

The process is:

  1. Analyze with Python
  2. Visualize using Dash
  3. Deploy using Heroku and Github

The finished app looks like this ↓ https://drmahjong.herokuapp.com/

[Screenshots: drmahjangsc1.png, drmahjangsc2.png, drmahjangsc3.png]

The code is available on Github: https://github.com/mottoki/mahjan-score

Take data

Tenhou game results can be obtained from its logs. Download the data with the requests module.

python.py


import requests
import datetime

new_date = datetime.datetime.now().strftime('%Y%m%d')
url = "https://tenhou.net/sc/raw/dat/"+f"sca{new_date}.log.gz"
filename = f"sca{new_date}.log.gz"

# Download gz file from the url
r = requests.get(url)
r.raise_for_status()  # stop here instead of saving an error page as the .gz file
with open(filename, "wb") as f:
    f.write(r.content)
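
Before processing the file, it can help to look at a few raw lines. The following is a minimal sketch using only the standard library. The helper name peek_log, the demo file, and the UTF-8 encoding are assumptions for illustration and are not part of the original code.

```python
import gzip
import os
import tempfile


def peek_log(path, n=3, encoding="utf-8"):
    """Return the first n lines of a gzip-compressed text log (hypothetical helper)."""
    with gzip.open(path, "rt", encoding=encoding) as f:
        # zip stops after n items, so the rest of the file is never read
        return [line.rstrip("\n") for _, line in zip(range(n), f)]


# Stand-in file so the sketch runs without network access
demo_path = os.path.join(tempfile.mkdtemp(), "demo.log.gz")
with gzip.open(demo_path, "wt", encoding="utf-8") as f:
    f.write("line one\nline two\nline three\nline four\n")

first_lines = peek_log(demo_path, n=2)
print(first_lines)
```

With the real download, the path would be the filename variable from the code above.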

Data processing using Python and Pandas

Filter the raw data down to the lines that contain all four players' names, then use split to build a data frame that holds only the date and each player's points.

new_data.py


import os
import pickle
import pandas as pd

#Player name
playercol = ['date', 'Mirataro', 'Shinwan', 'ToShiroh', 'yukoron']

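#Convert to Pandas dataframe (filename and new_date come from the download step above)
#on_bad_lines='skip' replaces the removed error_bad_lines=False (pandas >= 1.3)
df = pd.read_csv(filename, usecols=[0], on_bad_lines='skip', header=None)
df[len(df.columns)] = new_date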

#Filter by player name
df = df[(df[0].str.contains(playercol[1])) & 
    (df[0].str.contains(playercol[2])) & 
    (df[0].str.contains(playercol[3])) &
    (df[0].str.contains(playercol[4]))]

#Process the data frame (n must be passed by keyword in recent pandas)
df[['one','two','three','four']] = df[0].str.split('|', n=3, expand=True)
df.columns = ['original', 'date', 'room', 'time', 'type', 'name1']
df['date'] = pd.to_datetime(df['date'], format='%Y%m%d')
df[['empty', 'n1', 'n2', 'n3', 'n4']] = df.name1.str.split(" ", n=4, expand=True)
#Use only important columns
df = df[['date', 'n1', 'n2', 'n3', 'n4']]

#Strip the parentheses around each score and build a data frame keyed by player name
new_score = pd.DataFrame(columns=playercol)
k=0
for i, j in df.iterrows():
   dd = j['date']  # access by label; positional j[0] is deprecated on a labeled Series
   new_score.loc[k, 'date'] = dd
   for name in df.columns[1:]:
       s = j[name]
       player = s.split('(')[0]
       score = [p.split(')')[0] for p in s.split('(') if ')' in p][0]
       score = int(float(score.replace('+', '')))
       new_score.loc[k, player] = score
   k += 1

#Load the old data from the pickle file
current_dir = os.getcwd()
old_score = pd.read_pickle(f"{current_dir}/players_score.pkl")

#Combine new and old data
concat_score = pd.concat([old_score, new_score], ignore_index=True)
concat_score.to_pickle(f"{current_dir}/players_score.pkl")
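
The split-and-loop step above can also be sketched with a regular expression that pulls every name(score) pair out of one line. This is a minimal sketch, not the author's method. The sample line is hypothetical, with its layout inferred from the parsing code above, and parse_scores and SCORE_RE are names invented for illustration.

```python
import re

# Hypothetical sample line; the "room | time | type | names" layout is inferred
# from the split('|') calls in the code above, not copied from a real log.
sample = "L1000 | 21:05 | GAMETYPE | Mirataro(+42.0) Shinwan(+5.0) ToShiroh(-17.0) yukoron(-30.0)"

# A name is a run of non-space characters directly followed by "(score)"
SCORE_RE = re.compile(r"(\S+?)\(([+-]?\d+(?:\.\d+)?)\)")


def parse_scores(line):
    """Return {player: integer score} for one log line."""
    names_part = line.split("|", 3)[-1]
    return {name: int(float(score)) for name, score in SCORE_RE.findall(names_part)}


scores = parse_scores(sample)
print(scores)
```

Like the original loop, this truncates the score to an integer with int(float(...)).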

Visualize with Dash

Use the Dash library to visualize the data quickly.

The official Dash tutorial is the easiest place to start. (Reference: Dash Documentation & User Guide)

The part of Dash that tends to trip people up is the callback mechanism. Some articles explain it in detail, such as "Use Python visualization library Dash 2 See Callback", so please refer to those.

1. Front side (what you see on the web)

Explaining all the code would take too long, so I will explain the core part as an example.

Basically, everything inside the first app.layout = is what you see on the website.

Items that you do not want to display (for example, data that is reused many times, such as "intermediate-value") can be hidden on the web by setting style={'display': 'none'}.


#The front end is written inside app.layout
app.layout = html.Div([
    #Allow users to choose the date that will be reflected in the data
    html.Div([
        html.H2("DR.Mahjong"),
        dcc.DatePickerRange(
            id='my-date-picker-range',
            min_date_allowed=dt(2020, 3, 1),
            max_date_allowed=dt.today(),
            end_date=dt.today()
        ),
        ], className="mytablestyle"),

    #Data that is reused many times: hide it with style={'display': 'none'}
    html.Div(id='intermediate-value', style={'display': 'none'}),

    #Transition of points (graph)
    dcc.Graph(id='mygraph'),

    #Comprehensive points (table)
    html.Div([
            html.Div(html.P('Current total points')),
            html.Div(id='totalscore'),
        ], className="mytablestyle"),

])

2. Convert the data to JSON with a callback

Read the data with pandas read_pickle, filter it by date, convert it to JSON, and return it.

This lets the graph and the table reuse the same data without reading the file again.

@app.callback(Output("intermediate-value", "children"),
    [Input("my-date-picker-range", "start_date"),
    Input("my-date-picker-range", "end_date")])
def update_output(start_date, end_date):
    players = pd.read_pickle('players_score.pkl')
    if start_date is not None:
        start_date = dt.strptime(re.split('T| ', start_date)[0], '%Y-%m-%d')
        players = players.loc[(players['date'] >= start_date)]
    if end_date is not None:
        end_date = dt.strptime(re.split('T| ', end_date)[0], '%Y-%m-%d')
        players = players.loc[(players['date'] <= end_date)]
    return players.to_json(date_format='iso', orient='split')
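
The filter-and-serialize step can be tried outside Dash. The following is a minimal sketch under a few assumptions: the sample scores are made up, filter_by_date is a hypothetical helper, and pd.Timestamp is swapped in for the re.split and strptime parsing used above. Wrapping the JSON string in io.StringIO avoids the warning that recent pandas versions raise when read_json receives a literal string.

```python
import io

import pandas as pd


def filter_by_date(players, start_date=None, end_date=None):
    """Keep rows whose 'date' lies in [start_date, end_date]; None means no bound."""
    if start_date is not None:
        players = players[players["date"] >= pd.Timestamp(start_date)]
    if end_date is not None:
        players = players[players["date"] <= pd.Timestamp(end_date)]
    return players


# Made-up scores for illustration only
players = pd.DataFrame({
    "date": pd.to_datetime(["2020-03-01", "2020-03-08", "2020-03-15"]),
    "Mirataro": [42, -10, 5],
    "Shinwan": [-42, 10, -5],
})

# Same serialization settings as the callback above
payload = filter_by_date(players, "2020-03-05", "2020-03-31").to_json(
    date_format="iso", orient="split"
)

# What a downstream callback would do to get the data frame back
restored = pd.read_json(io.StringIO(payload), orient="split")
print(restored)
```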

3. Graph and tabulate the data with a callback

Convert the JSON data back into a Pandas data frame, then draw the graph and build the table.

The graph follows Plotly's style and is built with go.Figure().

The table is built with html.Table. There is also a table library called dash_table, but the table here is simple enough that I did not need it.

@app.callback([Output('mygraph', 'figure'),
    Output('totalscore', 'children')],
    [Input('intermediate-value', 'children'),
    Input('datatype', 'value')])
def update_fig(jsonified_df, data_type):
    #Restore the Jsonized data to Pandas.
    players = pd.read_json(jsonified_df, orient='split')

    #Graphing
    fig = go.Figure()
    for i, name in enumerate(players.columns[1:]):
        fig.add_trace(go.Scatter(x=players.date, 
                            y=np.array(players[name]).cumsum(),
                            mode='lines',
                            name=name,
                            line=dict(color=colors[i], width=4)))

    fig.update_layout(plot_bgcolor='whitesmoke',
        title='Transition of total points',
        legend=dict(
            orientation="h",
            yanchor="bottom",
            y=1.02,
            xanchor="right",
            x=1,)
    )

    #Calculate total points (drop the date column; summing datetimes fails in recent pandas)
    summed = players.drop(columns='date').sum()

    #Return the graph and the table
    return fig, html.Table([
        html.Thead(
            html.Tr([html.Th(col) for col in summed.index])
            ),
        html.Tbody(
            html.Tr([html.Td(val) for val in summed])
            ),
        ])
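
The two calculations behind the graph and the table are a cumulative sum per player and a grand total per player. The following is a minimal sketch with made-up scores that runs without Dash or Plotly.

```python
import pandas as pd

# Made-up scores for illustration only
players = pd.DataFrame({
    "date": pd.to_datetime(["2020-03-01", "2020-03-08", "2020-03-15"]),
    "Mirataro": [42, -10, 5],
    "Shinwan": [-42, 10, -5],
})

scores = players.drop(columns="date")

# Running total per player: the y-values of each line in the graph
running = scores.cumsum()

# Grand total per player: the single row shown in the table
totals = scores.sum()

print(running)
print(totals)
```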

Deploy using Heroku and Github

Finally, we will deploy using Heroku and Github.

The official documentation (Deploying Dash Apps) explains the Git and Heroku steps in detail, and the method here is almost the same.

The process looks like this:

  1. Sign up for a Github account

  2. Create a new repository on Github

  3. Set up an SSH connection to Github. (Optional, but it makes things easier. See: Allow ssh connection to GitHub)

  4. Create the files required for deployment (.gitignore, Procfile, requirements.txt). You also need gunicorn, so install it with pip install gunicorn.

  5. Use Git commands to push the above files, app.py, and the players_score.pkl data file to Github.

    git init
    git add .
    git commit -m "message"
    git remote add origin git@github.com:<username>/<Repository name>
    git push origin master
    
  6. After confirming that the push reached Github, create a Heroku account and create a new app with the New > Create new app button (Japan is not available as a region, so select United States).

  7. Click the Deploy tab of the created app, set Deployment method to Github, and connect to the repository created in step 2.

  8. Finally, press the black Deploy Branch button under Manual deploy, and the deployment runs on its own.
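
For reference, the deployment files from step 4 might look roughly like the following. This is a sketch, not the contents of the author's repository. The Procfile line assumes that app.py exposes the underlying Flask server as server = app.server, as the Dash deployment guide describes, and the package list is an assumption based on the imports used in this article.

```text
# Procfile (assumes app.py contains: server = app.server)
web: gunicorn app:server

# requirements.txt (assumed package list; pin versions as needed)
dash
pandas
numpy
gunicorn

# .gitignore (typical entries; players_score.pkl must NOT be ignored)
venv
*.pyc
.DS_Store
.env
```

Each of the three sections above is a separate file. The data file players_score.pkl has to be committed, because the app reads it at runtime.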

Finally

How was it?

You can also use cron and Heroku's Automatic Deploy to automate updates to new data from Tenhou. (Reference: Automate the process of pushing to Github with cron)

Reference

- Create an automatic grade management app for Tenhou private room with LINE bot and Python
