[PYTHON] Visualize coronavirus infection status with Plotly [For beginners]

Next article: https://qiita.com/Naoya_Study/items/851f4032fb6e2a5cd5ed

As the coronavirus infection spreads, various organizations have released cool dashboards that visualize the infection status.

Example 1: WHO, Novel Coronavirus (COVID-19) Situation

WHO.PNG

Example 2: Ministry of Health, Labour and Welfare, Domestic Cases of Novel Coronavirus Infection

korousho.PNG

Example 3: Toyo Keizai ONLINE, Domestic Status of Novel Coronavirus Infection

toyokeizai.PNG

They look cool! I want to be able to make something like this myself. The ultimate goal is to use Dash, Python's framework specialized for visualization, to create a dashboard like the examples above. This time, as preparation, I will draw the graphs with the visualization library Plotly. Please forgive the messy code.

1. Data used

We will use the data on coronavirus cases in Japan published by Toyo Keizai Online. https://github.com/kaz-ogiwara/covid19/

import requests
import io
import pandas as pd
import re
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
from datetime import datetime as dt
url = 'https://raw.githubusercontent.com/kaz-ogiwara/covid19/master/data/individuals.csv'
res = requests.get(url).content
df = pd.read_csv(io.StringIO(res.decode('utf-8')), header=0, index_col=0)

The data is in this format.

New No.  Old No.  Confirmed year  Confirmed month  Fixed date  Age  sex    Place of residence 1  Place of residence 2
1        1        2020            1                15          30s  Man    Kanagawa Prefecture
2        2        2020            1                24          40s  Man    China (Wuhan City)
3        3        2020            1                25          30s  Woman  China (Wuhan City)
4        4        2020            1                26          40s  Man    China (Wuhan City)
5        5        2020            1                28          40s  Man    China (Wuhan City)
6        6        2020            1                28          60s  Man    Nara Prefecture

As you can see, the data also includes people living in China. This article covers only Japan, so those rows are excluded.

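def Get_Df():

    url = 'https://raw.githubusercontent.com/kaz-ogiwara/covid19/master/data/individuals.csv'
    res = requests.get(url).content
    df = pd.read_csv(io.StringIO(res.decode('utf-8')), header=0, index_col=0)

    # Flag residents of China: "T" if the residence starts with "China" followed by
    # at least three more characters, such as "China (Wuhan City)", otherwise "F".
    # Vectorized matching replaces the row-by-row chained assignment
    # (df['China'][i] = ...), which pandas does not reliably write back.
    pattern = r'China(...)'
    is_china = df['Place of residence 1'].str.match(pattern, na=False)
    df['China'] = np.where(is_china, "T", "F")
    # Keep only the rows for people living in Japan.
    df = df[df["China"] != "T"].reset_index()

    return df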

Index  New No.  Old No.  Confirmed year  Confirmed month  Fixed date  Age  sex    Place of residence 1  Place of residence 2  China
0      1        1        2020            1                15          30s  Man    Kanagawa Prefecture   NaN                   F
1      6        6        2020            1                28          60s  Man    Nara Prefecture       NaN                   F
2      8        8        2020            1                29          40s  Woman  Osaka                 NaN                   F
3      9        10       2020            1                30          50s  Man    Mie Prefecture        NaN                   F
4      11       12       2020            1                30          20s  Woman  Kyoto                 NaN                   F
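
To see what the pattern in Get_Df() actually matches, here is a small standalone sketch using only the standard re module. The sample strings are taken from the tables above; the variable names residences and in_japan are made up for this illustration.

```python
import re

# Same pattern as in Get_Df(): the literal "China" plus a group of any three characters.
pattern = r'China(...)'

# Sample residence strings taken from the tables above.
residences = ["Kanagawa Prefecture", "China (Wuhan City)", "Nara Prefecture"]

# re.match anchors at the start of the string, so only values beginning with "China" match.
in_japan = [r for r in residences if not re.match(pattern, r)]
print(in_japan)
```

Note that the bare string "China" would not match, because the group requires three more characters after it.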

2. Cumulative number of infected people by prefecture (horizontal bar graph)

def Graph_Pref():

    df = Get_Df()
    df_count_by_place = df.groupby('Place of residence 1').count().sort_values('China')
    fig = px.bar(
        df_count_by_place,
        x="China",
        y=df_count_by_place.index,
        # Setting orientation to 'h' makes this a horizontal bar graph.
        orientation='h',
        width=800,
        height=1000,
        )
    fig.update_layout(
        title="Prefectures where infection has been reported",
        xaxis_title="Number of infected people",
        yaxis_title="",
        # Just specifying the template gives the graph a dark theme.
        template="plotly_dark",
        )
    fig.show()

aaa.png

Plotly produces interactive, stylish charts for you without any extra work.
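
It may not be obvious why the 'China' column is used as the number of infected people. groupby(...).count() counts the non-null values in each column, and 'China' is filled in for every row, so its count equals the number of rows per prefecture. The sketch below shows this on a hypothetical sample (the rows are invented for illustration; only the column names come from the article).

```python
import numpy as np
import pandas as pd

# Hypothetical sample with the same column names as the article's data frame.
sample = pd.DataFrame({
    "Place of residence 1": ["Osaka", "Kyoto", "Osaka", "Nara Prefecture", "Osaka", "Kyoto"],
    "Place of residence 2": [np.nan] * 6,  # empty in these rows
    "China": ["F"] * 6,                    # always filled in by Get_Df()
})

# count() counts non-null values per column; sort_values orders by the 'China' count.
counts = sample.groupby("Place of residence 1").count().sort_values("China")
print(counts)
```

The 'Place of residence 2' column counts to 0 here because it is entirely empty, which is why it cannot be used as the case count.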

3. Draw a scatter plot on the map

Next, I would like to plot the number of infected people by prefecture on a map of Japan as a scatter plot. To do so, first obtain the latitude and longitude of each prefecture's capital and combine it with the CSV data from Toyo Keizai Online. The latitude and longitude data for the prefectural capitals comes from the site "Everyone's Knowledge: A Little Convenience Book" (www.benricho.org). Extract only the required latitude and longitude columns and join them using pandas merge.

def Df_Merge():

    df = Get_Df()
    df_count_by_place = df.groupby('Place of residence 1').count().sort_values('China')
    df_latlon = pd.read_excel("https://www.benricho.org/chimei/latlng_data.xls", header=4)
    df_latlon = df_latlon.drop(df_latlon.columns[[0,2,3,4,7]], axis=1).rename(columns={'Unnamed: 1': 'Place of residence 1'})
    df_latlon = df_latlon.head(47)
    df_merge = pd.merge(df_count_by_place, df_latlon, on='Place of residence 1')
    return df_merge

index  Place of residence 1  New No.  Old No.  Confirmed year  Confirmed month  Fixed date  Age  sex  Place of residence 2  China  latitude  longitude
0      Gifu Prefecture       1        1        1               1                1           1    1    0                     1      35.39111  136.72222
1      Ehime Prefecture      1        1        1               1                1           1    1    0                     1      33.84167  132.76611
2      Hiroshima Prefecture  1        1        1               1                1           1    1    0                     1      34.39639  132.45944
3      Saga Prefecture       1        1        1               1                1           1    1    0                     1      33.24944  130.29889
4      Akita                 1        1        1               1                1           1    1    0                     1      39.71861  140.10250
5      Yamaguchi Prefecture  1        1        1               1                1           1    1    0                     1      34.18583  131.47139
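
One thing to watch with pd.merge is that the default inner join silently drops any prefecture whose name is spelled differently in the two tables. The following is a minimal sketch of one way to check for that, using how="left" and indicator=True. Both data frames are hypothetical, and the "Akita" / "Akita Prefecture" mismatch is deliberately constructed for the example.

```python
import pandas as pd

# Hypothetical counts; "Akita" is written without "Prefecture" on purpose.
df_counts = pd.DataFrame(
    {"China": [3, 1, 2]},
    index=pd.Index(["Gifu Prefecture", "Akita", "Saga Prefecture"],
                   name="Place of residence 1"),
)
# Hypothetical coordinate table (coordinates copied from the table above).
df_coords = pd.DataFrame({
    "Place of residence 1": ["Gifu Prefecture", "Akita Prefecture", "Saga Prefecture"],
    "latitude": [35.39111, 39.71861, 33.24944],
    "longitude": [136.72222, 140.10250, 130.29889],
})

# how="left" keeps every prefecture from the counts; indicator=True adds a "_merge" column.
checked = pd.merge(df_counts.reset_index(), df_coords,
                   on="Place of residence 1", how="left", indicator=True)
# Rows marked "left_only" found no coordinates and would vanish in an inner join.
unmatched = checked.loc[checked["_merge"] == "left_only", "Place of residence 1"].tolist()
print(unmatched)
```

If the list is empty, every prefecture in the counts found its coordinates and the plain inner merge is safe.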

Plot on the map using the above data frame.

def Graph_JapMap():
    df_merge = Df_Merge()
    # Hover text in the form "<prefecture> : <count> people"
    # (the 'China' column holds the number of cases per prefecture).
    df_merge['text'] = df_merge['Place of residence 1'] + ' : ' + df_merge['China'].astype(str) + ' people'

    fig = go.Figure(data=go.Scattergeo(
        lat = df_merge["latitude"],
        lon = df_merge["longitude"],
        mode = 'markers',
        marker = dict(
                color = 'red',
                size = df_merge['China']/5+6,
                opacity = 0.8,
                reversescale = True,
                autocolorscale = False
                ),
        hovertext = df_merge['text'],
        hoverinfo="text",
    ))
    fig.update_layout(
        width=700,
        height=500,
        template="plotly_dark",
        title={
            'text': "Infected person distribution",
            'font':{
                'size':25
            },
            'y':0.9,
            'x':0.5,
            'xanchor': 'center',
            'yanchor': 'top'},
        margin = {
            'b':3,
            'l':3,
            'r':3,
            't':3
            },
        geo = dict(
            resolution = 50,
            landcolor = 'rgb(204, 204, 204)',
            coastlinewidth = 1,
            lataxis = dict(
                range = [28, 47],
            ),
            lonaxis = dict(
                range = [125, 150],
            ),
        )
    )
    fig.show()

map.png

What you see here is a static image, but when you run the code yourself, hovering over a marker shows the exact number of infected people, which is cool. Please try it.

4. Changes in the number of infected people (stacked bar graph)

Next is a bar graph of changes in the number of infected people. As before, first transform the data with pandas.

def Df_Count_by_Date():

    df = Get_Df()
    # Build a datetime column from the month and day columns (the year is fixed to 2020).
    df['date'] = pd.to_datetime(pd.DataFrame({
        'year': 2020,
        'month': df['Confirmed month'],
        'day': df['Fixed date'],
    }))

    # After count(), the 'China' column holds the number of new cases per date.
    df_count_by_date = df.groupby("date").count()

    # 'total' is the running sum of new cases;
    # 'gap' is the cumulative number up to the previous day.
    df_count_by_date['total'] = df_count_by_date['China'].cumsum()
    df_count_by_date['gap'] = df_count_by_date['total'] - df_count_by_date['China']

    return df_count_by_date

def Graph_total():

    df_count_by_date = Df_Count_by_Date()

    fig = go.Figure(data=[
        go.Bar(
            name='Cumulative number up to the previous day',
            x=df_count_by_date.index,
            y=df_count_by_date['gap'],
            ),
        go.Bar(
            name='New number',
            x=df_count_by_date.index,
            y=df_count_by_date['China']
            )
    ])
    # Change the bar mode
    fig.update_layout(
        barmode='stack',
        template="plotly_dark",
        title={
            'text': "Changes in the number of patients",
            'font':{
                'size':25
                },
            'y':0.9,
            'x':0.5,
            'xanchor': 'center',
            'yanchor': 'top'
            },
        xaxis_title="Date",
        yaxis_title="Number of infected people",
        )
    fig.show()

total.png
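
The reason the stacked bars add up to the cumulative total is simple arithmetic: the lower segment is the total up to the previous day, and the upper segment is the day's new cases. A tiny sketch with invented daily counts (new_cases is hypothetical, not taken from the data):

```python
from itertools import accumulate

# Hypothetical daily counts of newly confirmed cases.
new_cases = [1, 2, 0, 4]

# Running total up to and including each day.
total = list(accumulate(new_cases))
# Cumulative number up to the previous day: the lower segment of each stacked bar.
gap = [t - n for t, n in zip(total, new_cases)]

print(total)
print(gap)
```

For each day, gap plus the new count equals the total, so the height of every stacked bar is the cumulative number of cases.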

5. Plot on world map

Plotly's scatter_geo identifies countries by their three-letter ISO codes, so I took a table of country codes from the web and merged it with the case data using pandas.
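
The article does not show how df_globe_merge is built, so the following is only a sketch of one way it could be assembled, not the original code. The case counts are the five sample rows shown in this section, df_globe and df_iso are hypothetical names, and the rule size = Confirmed + 1000 is inferred from those sample rows rather than stated anywhere.

```python
import pandas as pd

# Sample counts copied from this section's table (not live data).
df_globe = pd.DataFrame({
    "COUNTRY": ["China", "Italy", "Iran", "South Korea", "Spain"],
    "Confirmed": [81049, 27980, 14991, 8236, 7948],
    "Deaths": [3230, 2158, 853, 75, 342],
})

# Hypothetical lookup table of ISO codes in "alpha-2 / alpha-3" form.
df_iso = pd.DataFrame({
    "COUNTRY": ["China", "Italy", "Iran", "South Korea", "Spain"],
    "ISO CODES": ["CN / CHN", "IT / ITA", "IR / IRN", "KR / KOR", "ES / ESP"],
})

df_globe_merge = pd.merge(df_globe, df_iso, on="COUNTRY", how="left")
# Keep only the three-letter code, which is what the locations argument expects.
df_globe_merge["code"] = df_globe_merge["ISO CODES"].str.split("/").str[1].str.strip()
# Assumption: add an offset so that countries with few cases still get a visible marker.
df_globe_merge["size"] = df_globe_merge["Confirmed"] + 1000.0
print(df_globe_merge)
```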

INDEX  COUNTRY      Confirmed  Deaths  ISO CODES  code  size
0      China        81049      3230    CN / CHN   CHN   82049.0
1      Italy        27980      2158    IT / ITA   ITA   28980.0
2      Iran         14991      853     IR / IRN   IRN   15991.0
3      South Korea  8236       75      KR / KOR   KOR   9236.0
4      Spain        7948       342     ES / ESP   ESP   8948.0

fig = px.scatter_geo(
        df_globe_merge,
        locations="code",
        color='Deaths',
        hover_name="COUNTRY",
        size="size",
        projection="natural earth"
        )
fig.update_layout(
        width=700,
        height=500,
        template="plotly_dark",
        title={
            'text': "Infected person distribution",
            'font':{
                'size':25
            },
            'y':0.9,
            'x':0.5,
            'xanchor': 'center',
            'yanchor': 'top'},
        geo = dict(
            resolution = 50,
            landcolor = 'rgb(204, 204, 204)',
            coastlinewidth = 1,
            ),
        margin = {
            'b':3,
            'l':3,
            'r':3,
            't':3
        })
fig.show()

world.png

You can also fill in each country with a color (a choropleth map).

fig = px.choropleth(
    df_globe_merge,
    locations="code",
    color='Confirmed',
    hover_name="COUNTRY",
    color_continuous_scale=px.colors.sequential.GnBu
    )
fig.update_layout(
        width=700,
        height=500,
        template="plotly_dark",
        title={
            'text': "Infected person distribution",
            'font':{
                'size':25
            },
            'y':0.9,
            'x':0.5,
            'xanchor': 'center',
            'yanchor': 'top'},
        geo = dict(
            resolution = 50,
            landcolor = 'rgb(204, 204, 204)',
            coastlinewidth = 0.1,
            ),
        margin = {
            'b':3,
            'l':3,
            'r':3,
            't':3
        }
    )
fig.show()

map2.png

The color scale is set with the color_continuous_scale argument, here px.colors.sequential.GnBu. For the list of available color scales, see https://plot.ly/python/builtin-colorscales/

While rewriting this for Dash, the visualization with plotly.express did not work, so I also drew the same map using plotly.graph_objects.

fig = go.Figure(
    data=go.Choropleth(
        locations = df_globe_merge['code'],
        z = df_globe_merge['Confirmed'],
        text = df_globe_merge['COUNTRY'],
        colorscale = 'Plasma',
        marker_line_color='darkgray',
        marker_line_width=0.5,
        colorbar_title = 'Number of infected people',
    )
)
fig.update_layout(
    template="plotly_dark",
    width=700,
    height=500,
    title={
        'text': "Infected person distribution",
        'font':{
             'size':25
            },
        'y':0.9,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'},
    geo=dict(
        projection_type='equirectangular'
    )
)

fig.show()

map3.png

It looks almost the same except that the color scale is changed from GnBu to Plasma.

Now that the data transformation and visualization are ready, I would like to bring them into Dash (next time).
