[PYTHON] Try using COVID-19's open data from Yokohama / Tokyo / Osaka

background

I am a salaryman living in Yokohama. It's been two years since I started doing Python. I passed the 2019 # 3 G test and the first AI implementer's test, There is no such thing as "what you can do". When I watch TV reports about COVID-19, I feel very scared, but sometimes Is it fine? I thought, or to put it simply, it made me feel like I was playing on TV. Can't you figure out the situation yourself? I thought that was the trigger.

I narrowed down the scope

That is the reason.

I also saw Kaggle's CORD-19, but ... English ...

what to do

I searched Google for [Tokyo Open Data Corona] and found it.

Government CIO Portal

From here, refer to the data and try graphing first.

1. Create a link csv for open data URL

elements url
tokyo_consultation https://stopcovid19.metro.tokyo.lg.jp/data/130001_tokyo_covid19_combined_telephone_advice_center.csv
tokyo_call https://stopcovid19.metro.tokyo.lg.jp/data/130001_tokyo_covid19_call_center.csv
tokyo_confirm https://stopcovid19.metro.tokyo.lg.jp/data/130001_tokyo_covid19_patients.csv
kanagawa_confirm http://www.pref.kanagawa.jp/osirase/1369/data/csv/patient.csv
kanagawa_consultation http://www.pref.kanagawa.jp/osirase/1369/data/csv/contacts.csv
kanagawa_returnee http://www.pref.kanagawa.jp/osirase/1369/data/csv/querent.csv
osaka_confirm http://www.pref.osaka.lg.jp/attach/23711/00346644/youseisyajyouhou.xlsx

2. Graph a simple cumulative number from Index and date data

Library import

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from datetime import datetime as dt
#↓ I always put
plt.close('all')

Dataframe conversion and processing of data

Since the format is different in each prefecture, each process


'''
##########################################
0.Tokyo Metropolitan Government New Corona Consultation Service Number of consultations [new]
1.Number of consultations at the New Corona Call Center in Tokyo [New]
2.Tokyo_Details of new coronavirus positive patient announcement
3.Kanagawa Prefecture_Number of positive patients and attributes
4.Kanagawa Prefecture_Number of dedicated dial consultations
5.Kanagawa Prefecture_Returnee / Contact Telephone Counseling Center Number of consultations
6.Osaka_Status of positive patient development
##########################################
'''
#List the numbers to be displayed in this list
#Corresponds to the number of patients 2,3,Put 6 in the list
sel_list = [2, 3, 6]

for sel_num in sel_list:
    #Load the csv containing each URL
    url_df = pd.read_csv('opendata.csv')
    if sel_num < 3: #For Tokyo data
        df = pd.read_csv(url_df.loc[sel_num,'url'], encoding='utf-8')
    elif sel_num == 6: #For Osaka data
        df = pd.read_excel(url_df.loc[sel_num,'url'], encoding='shift_jis',header=1)
    else: #For Kanagawa data
        df = pd.read_csv(url_df.loc[sel_num, 'url'], encoding='shift-jis')

    #For Tokyo data
    if sel_num == 2:
        fig1 = plt.figure()
        ax1 = fig1.add_subplot()
        #Since it cannot be displayed in Japanese, change to alphabetical notation
        df = df.rename(columns={'Published_date': 'date', 'No':'Tokyo'})
        df['date'] = pd.to_datetime(df['date'],format='%Y-%m-%d')
        df = df.set_index('date')
        df['Tokyo'].plot(ax=ax1, legend=True)
    
    #For Kanagawa data
    elif sel_num == 3:
        df = df.rename(columns={'Announcement date': 'date'})
        df['date'] = pd.to_datetime(df['date'],format='%Y-%m-%d')
        df = df.set_index('date')
        #Since there is no index number, I will make it
        serial_num = pd.RangeIndex(start=1, stop=len(df.index) + 1, step=1)
        #Change column name to Kanagawa for legend name
        df['Kanagawa'] = serial_num
        df['Kanagawa'].plot(ax=ax1, legend=True)
    
    #For Osaka data
    elif sel_num == 6:
        df = df.rename(columns={'number':'num', 'Press date': 'date'})
        #Convert Excel date data
        excel_time = datetime.date(1899, 12, 30)
        ##Process each line with iteration
        #DataFrame replacement creates a list and overwrites
        new_date = []
        for index, row in df.iterrows():
            new_date.append(excel_time + datetime.timedelta(row['date']))        
        df['date'] = new_date
        
        df['Osaka'] = df['num']
        df = df.set_index('date')
        df['Osaka'].plot(ax=ax1, legend=True) 

3. Where I was worried-Excel time notation-

I was wondering if it was UNIX time notation, but I used Excel-specific date numbers. I found this page and found that it was being read.

Convert "numerical value" of Excel date to datetime of python

I couldn't help myself.

4. Result

Figure_1.png

5. Impressions

I was able to get a cumulative graph. The code is not smart. For the time being, that's all for today. Python has many sites that tell you exactly what you want to do, so There is a sense of security that you will be able to do something when you are in trouble. However, I didn't know the datetime notation in Excel.

Recommended Posts

Try using COVID-19's open data from Yokohama / Tokyo / Osaka
Tokyo Corona: Try to make a simple prediction from open data with the exponential function curve_fit
Get data from Twitter using Tweepy
Try using Amazon DynamoDB from Python
Try using scanpy's data integration function (sc.tl.ingest)
Try using django-import-export to add csv data to django
Send data from Raspberry Pi using AWS IOT
Encrypt / decrypt data from golang using Azure KeyVault