I tried to display the spread of coronavirus infections on a seaborn heat map

Purpose

Visualize the spread of coronavirus in each prefecture. The goal is a heat map of cumulative cases by prefecture and date.

Method

Use Python with seaborn.
Number of infected people data (up to 4/5): https://toyokeizai.net/sp/visual/tko/covid19/
Prefecture name data: https://gist.github.com/mugifly/d6e68a516de4a008687c
Everything is collected in this repository: https://github.com/kyasby/colona.git

Import library

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

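%matplotlib inline is a Jupyter magic command that shows plots inside the notebook. numpy is imported with cumsum() in mind, although the code below ends up calling the cumsum() method that pandas provides.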

Import the CSV of infected people

# Load the case data
df = pd.read_csv("COVID-19.csv")
# Keep only the columns needed below
df = df[["Consultation prefecture", "Prefecture of residence", "Number of people", "Date of onset", "Fixed date"]]
# Shorten the two prefecture column names
df = df.rename(columns={"Consultation prefecture":"hsp", "Prefecture of residence":"house"})
df

Extract the required columns, then rename them. Passing a list of column names to df[] extracts only those columns.

You can change column names and index names by passing dictionaries, as in df.rename(columns={"old_column_name":"new_name"}, index={"old_index_name":"new_name"}).

hsp stands for hospital.
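To make the two steps concrete, here is a minimal sketch on a toy DataFrame. The column names mimic the CSV above, but the values and the "Unused column" are made up for illustration.

```python
import pandas as pd

# Toy data: column names mimic the article's CSV, values are invented.
toy = pd.DataFrame({
    "Consultation prefecture": ["Tokyo", "Osaka"],
    "Prefecture of residence": ["Tokyo", "Osaka"],
    "Number of people": [1, 1],
    "Unused column": ["x", "y"],
})

# A list inside [] selects several columns at once.
selected = toy[["Consultation prefecture", "Number of people"]]

# rename returns a new DataFrame; keys that match no column are ignored by default.
renamed = selected.rename(
    columns={"Consultation prefecture": "hsp", "no such column": "ignored"}
)

print(list(renamed.columns))
```

Because unmatched keys are silently skipped, a typo in a column name will not raise an error, so it is worth checking the resulting column names.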

Creating the date column

for i, row in df.iterrows():
    if type(row["Date of onset"])==float:
        df.at[i, "Date of onset"] = row["Fixed date"]
    else:
        pass
df = df.rename(columns = {"Date of onset":"Date"})

I want the horizontal axis of the heat map to be the date, so I build a date column. However, as shown below, "Date of onset" contains NaN in some rows, so in those rows I replace it with "Fixed date" (the date the case was confirmed).

hsp	house	Number of people	Date of onset	Fixed date
Kanagawa Prefecture	Kanagawa Prefecture	1	1/3/2020	1/15/2020
Tokyo	People's Republic of China	1	1/14/2020	1/24/2020
Tokyo	People's Republic of China	1	1/21/2020	1/25/2020
Osaka	Osaka	1	1/20/2020	1/29/2020
unknown	People's Republic of China	1	1/29/2020	1/30/2020
Chiba	People's Republic of China	1	NaN	1/30/2020

Finally, change the column name to "Date".

To detect NaN I wrote type(row["Date of onset"]) == float. This works because NaN is a float while the dates are strings, but please let me know if there is a better way to write it.
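One possible alternative to the row-by-row loop is fillna, which accepts another Series and fills each missing value from the same row. This is only a sketch on toy data, assuming the column names used above; it is not what the original code does.

```python
import numpy as np
import pandas as pd

# Toy rows: column names follow the article, values are invented.
toy = pd.DataFrame({
    "Date of onset": ["1/29/2020", np.nan],
    "Fixed date": ["1/30/2020", "1/30/2020"],
})

# Where "Date of onset" is missing, take "Fixed date" from the same row.
toy["Date"] = toy["Date of onset"].fillna(toy["Fixed date"])

print(toy["Date"].tolist())
```

This avoids both iterrows and the type check, since fillna only touches the missing entries.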

Import CSV of prefectures

jpn = pd.read_csv("japan.csv", header=None)
todofuken = jpn[0]

Replace part of hsp

df["hsp"].value_counts()

Checking the contents of "hsp" with value_counts() shows that it also contains values such as "Haneda Airport" and "Unknown", which are not prefectures.

[Screenshot: output of df["hsp"].value_counts()]

df["hsp"]= df["hsp"].apply(lambda x : "Other" if x not in list(todofuken) else x)

Use apply with a lambda function to partially rewrite the contents of df["hsp"]. If a value is not in the prefecture name list it becomes "Other"; if it is in the list, the prefecture is kept as it is. Note that a conditional expression inside a lambda needs its else part; leaving it out is a syntax error, so please be careful.
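The same replacement can also be written without apply, using isin and where. The sketch below uses a short made-up list as a stand-in for todofuken.

```python
import pandas as pd

# Stand-in for todofuken: only two prefectures, for illustration.
prefectures = pd.Series(["Tokyo", "Osaka"])
hsp = pd.Series(["Tokyo", "Haneda Airport", "Osaka", "Unknown"])

# where keeps values where the condition is True and replaces the rest.
cleaned = hsp.where(hsp.isin(prefectures), "Other")

print(cleaned.tolist())
```

isin builds the membership test once for the whole column, whereas the lambda version converts todofuken to a list again for every row.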

So far, df looks like this.

hsp	house	Number of people	Date	Fixed date
Kanagawa Prefecture	Kanagawa Prefecture	1	1/3/2020	1/15/2020
Tokyo	People's Republic of China	1	1/14/2020	1/24/2020
Tokyo	People's Republic of China	1	1/21/2020	1/25/2020
Aichi prefecture	People's Republic of China	1	1/23/2020	1/26/2020
Aichi prefecture	People's Republic of China	1	1/22/2020	1/28/2020
Nara Prefecture	Nara Prefecture	1	1/14/2020	1/28/2020
Hokkaido	People's Republic of China	1	1/26/2020	1/28/2020
Osaka	Osaka	1	1/20/2020	1/29/2020

Pivot

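# aggfunc="sum" adds up the people per prefecture and date (the default is the mean)
pvt = df.pivot_table(index="hsp", columns="Date", values="Number of people", aggfunc="sum").fillna(0)
pvt = pvt.rename(index = dict(zip(jpn[0], jpn[2]))).rename(index={"Other":"others"})

pandas has a method called pivot_table that literally creates a pivot table, so you don't need Excel. Because pivot_table takes the mean by default, aggfunc="sum" is passed so that each cell holds the number of people for that prefecture and date. Combinations with no cases come out as NaN, so they are filled with 0.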
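One thing to watch, assuming the dates are stored as strings such as "1/14/2020" as in the tables above: pivot_table sorts the column labels, and strings sort character by character rather than chronologically. The sketch below, on made-up data, shows the difference after parsing with pd.to_datetime. The aggfunc="sum" argument is my choice for the toy data.

```python
import pandas as pd

# Toy data: values are invented for illustration.
toy = pd.DataFrame({
    "hsp": ["Tokyo", "Tokyo", "Osaka"],
    "Date": ["1/3/2020", "1/14/2020", "1/14/2020"],
    "Number of people": [1, 2, 1],
})

# As text, "1/14/2020" sorts before "1/3/2020".
as_text = toy.pivot_table(
    index="hsp", columns="Date", values="Number of people", aggfunc="sum"
)

# Parsed as dates, the columns come out in chronological order.
toy["Date"] = pd.to_datetime(toy["Date"], format="%m/%d/%Y")
as_dates = toy.pivot_table(
    index="hsp", columns="Date", values="Number of people", aggfunc="sum"
).fillna(0)

print(list(as_text.columns))
print(list(as_dates.columns))
```

Column order matters here because a cumulative sum along the columns is only meaningful when the dates run from oldest to newest.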

Then, rename the index from kanji to Roman letters, for example from 北海道 to hokkaido. In my environment, if an index name or column name contains Japanese, the characters are not displayed in the plot. It seems this can be solved by installing something, but here I work around it by renaming. (Please let me know if there is a better way.)

jpn[0] contains the names of prefectures in kanji, such as 北海道 and 青森. jpn[2] contains the names of prefectures in Roman letters, such as hokkaido and aomori.

Pair them with the zip function, turn the pairs into a dictionary with the dict function, and pass it to the rename method. Also, change "Other" to "others".
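The zip, dict and rename chain can be sketched on its own like this. The two Series are stand-ins for jpn[0] and jpn[2], with only two entries whose exact spelling is assumed, not taken from the real file.

```python
import pandas as pd

# Stand-ins for jpn[0] (kanji) and jpn[2] (Roman letters); spelling is assumed.
kanji = pd.Series(["北海道", "青森県"])
romaji = pd.Series(["hokkaido", "aomori"])

# zip pairs the two columns element by element; dict turns the pairs into a lookup.
mapping = dict(zip(kanji, romaji))

toy = pd.DataFrame({"1/3/2020": [1, 0, 2]}, index=["北海道", "青森県", "Other"])

# Index labels missing from the mapping (here "Other") are left unchanged,
# so the second rename handles that label separately.
toy = toy.rename(index=mapping).rename(index={"Other": "others"})

print(list(toy.index))
```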

Up to this point, the data frame looks like this. [Screenshot: pvt after pivoting and renaming]

Cumulative number

for i in range(len(pvt)):
    pvt.iloc[i]=pvt.iloc[i].cumsum()

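Take pvt one row at a time and call the cumsum() method on it to turn the daily numbers into cumulative numbers of people. (Each row is a pandas Series, so this is the pandas cumsum() rather than the numpy function.)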
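As a side note, the same result can be obtained without the loop by calling cumsum on the whole DataFrame with axis=1. A minimal sketch on made-up numbers:

```python
import pandas as pd

# Toy table: rows are prefectures, columns are days, values are invented.
toy = pd.DataFrame(
    [[1, 0, 2], [0, 3, 1]],
    index=["tokyo", "osaka"],
    columns=["day1", "day2", "day3"],
)

# axis=1 accumulates from left to right within each row.
cumulative = toy.cumsum(axis=1)

print(cumulative)
```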

In this way, the table now holds the cumulative number of people. [Screenshot: pvt after taking the cumulative sum]

View and save

plt.figure(figsize=(20,10))
sns.heatmap(pvt.iloc[:,-60:] , linewidths=0, cmap='Spectral', cbar=True, xticklabels=5)
plt.savefig("colona.png")

I decided to display only the last 60 days, because on the earlier dates there were (fortunately) few infected people and showing them is not meaningful. You can select a range with : (slice). For example, 10:20 means positions from 10 up to, but not including, 20, and -60: means the last 60.
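Here is a small sketch of how slices behave with iloc, using a made-up one-row table with five columns instead of the real data.

```python
import pandas as pd

# Toy table with five columns; names and values are invented.
toy = pd.DataFrame([[0, 1, 2, 3, 4]], columns=["d0", "d1", "d2", "d3", "d4"])

# 1:3 selects positions 1 and 2; the end position 3 is excluded.
middle = toy.iloc[:, 1:3]

# -2: selects the last two columns, the same idea as -60: in the article.
last_two = toy.iloc[:, -2:]

print(list(middle.columns))
print(list(last_two.columns))
```

If the table has fewer columns than the number requested, a negative slice simply returns all of them instead of raising an error.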

I was able to display the heat map like this.
