[PYTHON] I tried to display the spread of coronavirus infections as a seaborn heatmap

Purpose

Visualize the spread of the coronavirus in each prefecture. The goal is the following heatmap: colona.png

Method

Use Python's seaborn.

Number of infected people data (up to 4/5): https://toyokeizai.net/sp/visual/tko/covid19/
Prefecture name data: https://gist.github.com/mugifly/d6e68a516de4a008687c
Everything used in this article is collected here: https://github.com/kyasby/colona.git

Import library

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

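%matplotlib inline is a Jupyter magic command that draws plots inside the notebook. numpy is imported with cumsum() in mind, although the code below calls the pandas cumsum() method, so the import is not strictly required.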

Import csv number of infected people

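# Load the case data and keep only the columns needed for the heatmap
df = pd.read_csv("COVID-19.csv")
df = df[["Consultation prefecture", "Prefecture of residence", "Number of people", "Date of onset", "Fixed date"]]
# Shorten the two prefecture column names (the age and sex columns were not kept, so they need no entry)
df = df.rename(columns={"Consultation prefecture":"hsp", "Prefecture of residence":"house"})
df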

Extract the required columns, then rename them. Passing a list of column names to df[] extracts only those columns.

You can change column names and index names by passing dictionaries, as in df.rename(columns={"old_column_name":"new_name"}, index={"old_index_name":"new_name"}).
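As a small illustration of the two steps above, the following sketch selects columns with a list and then renames a column and an index label. The frame `toy` and the names `count`, `n`, `row_a` are made up for this example and do not come from the article's data.

```python
import pandas as pd

# Hypothetical toy frame; column and index names are illustrative only
toy = pd.DataFrame(
    {"Consultation prefecture": ["Tokyo", "Osaka"], "count": [1, 2], "unused": [0, 0]},
    index=["row_a", "row_b"],
)

# Passing a list to [] keeps only the listed columns
toy = toy[["Consultation prefecture", "count"]]

# rename takes dictionaries of {old_name: new_name} for columns and for the index
toy = toy.rename(
    columns={"Consultation prefecture": "hsp", "count": "n"},
    index={"row_a": "first"},
)
print(toy)
```

Labels that are not listed in the dictionaries (here `row_b`) are left unchanged.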

Date making

# If the onset date is missing (NaN), use the confirmation date instead
for i, row in df.iterrows():
    if type(row["Date of onset"])==float:
        df.at[i, "Date of onset"] = row["Fixed date"]
df = df.rename(columns = {"Date of onset":"Date"})

I want the horizontal axis of the heatmap to be the date, so I prepare a date column. However, as shown below, "Date of onset" contains NaN. In that case, it is replaced with "Fixed date" (the date the case was confirmed).

hsp                  house                       Number of people  Date of onset  Fixed date
Kanagawa Prefecture  Kanagawa Prefecture         1                 1/3/2020       1/15/2020
Tokyo                People's Republic of China  1                 1/14/2020      1/24/2020
Tokyo                People's Republic of China  1                 1/21/2020      1/25/2020
Osaka                Osaka                       1                 1/20/2020      1/29/2020
unknown              People's Republic of China  1                 1/29/2020      1/30/2020
Chiba                People's Republic of China  1                 NaN            1/30/2020

Finally, change the column name to "Date".

To detect NaN I wrote type(row["Date of onset"]) == float. This works because NaN is a float, while the dates read from the CSV are strings. Please let me know if there is a better way to write it.
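One possible alternative, not the author's code, is to let pandas fill the missing values in a single step with `fillna`, which avoids the row-by-row loop and the type check. The frame `sample` below is a hypothetical stand-in modeled on the rows shown above.

```python
import numpy as np
import pandas as pd

# Hypothetical sample modeled on the rows shown above
sample = pd.DataFrame({
    "Date of onset": ["1/29/2020", np.nan],
    "Fixed date": ["1/30/2020", "1/30/2020"],
})

# Where "Date of onset" is NaN, take the value from "Fixed date" on the same row
sample["Date of onset"] = sample["Date of onset"].fillna(sample["Fixed date"])
sample = sample.rename(columns={"Date of onset": "Date"})
print(sample)
```

Passing a Series to `fillna` aligns it on the index, so each gap is filled from its own row.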

Import CSV of prefectures

# japan.csv is read without a header row; column 0 holds the prefecture names
jpn = pd.read_csv("japan.csv", header=None)
todofuken = jpn[0]

Replace part of hsp

df["hsp"].value_counts()

Checking the contents of "hsp" this way shows that it also contains values that are not prefectures, such as "Haneda Airport" and "Unknown".

(Screenshot 2020-04-06 16.04.52.png)

df["hsp"]= df["hsp"].apply(lambda x : "Other" if x not in list(todofuken) else x)

Use apply with a lambda function to partially rewrite the contents of df["hsp"]: if a value is not in the prefecture name list, it becomes "Other"; otherwise the prefecture name is kept as is. Note that a conditional expression inside a lambda needs its else branch; Python raises a syntax error if else is omitted.
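The same replacement can also be sketched without apply, using `isin` and `where`. This is an alternative formulation rather than the author's code, and the two Series below are hypothetical stand-ins for `todofuken` and `df["hsp"]`.

```python
import pandas as pd

# Hypothetical stand-ins for the article's prefecture list and "hsp" column
todofuken = pd.Series(["Tokyo", "Osaka", "Chiba"])
hsp = pd.Series(["Tokyo", "Haneda Airport", "Osaka", "Unknown"])

# Keep values found in the prefecture list, replace everything else with "Other"
cleaned = hsp.where(hsp.isin(todofuken), "Other")
print(cleaned.tolist())
```

`where` keeps each value for which the condition is True and substitutes the second argument elsewhere, so no explicit else is needed.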

So far, df looks like this.

hsp                  house                       Number of people  Date       Fixed date
Kanagawa Prefecture  Kanagawa Prefecture         1                 1/3/2020   1/15/2020
Tokyo                People's Republic of China  1                 1/14/2020  1/24/2020
Tokyo                People's Republic of China  1                 1/21/2020  1/25/2020
Aichi prefecture     People's Republic of China  1                 1/23/2020  1/26/2020
Aichi prefecture     People's Republic of China  1                 1/22/2020  1/28/2020
Nara Prefecture      Nara Prefecture             1                 1/14/2020  1/28/2020
Hokkaido             People's Republic of China  1                 1/26/2020  1/28/2020
Osaka                Osaka                       1                 1/20/2020  1/29/2020

Pivot

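# aggfunc="sum" adds up the people per prefecture and date (pivot_table averages by default)
pvt = df.pivot_table(index="hsp", columns="Date", values="Number of people", aggfunc="sum").fillna(0)
pvt = pvt.rename(index = dict(zip(jpn[0], jpn[2]))).rename(index={"Other":"others"})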

pandas has a method called pivot_table that literally creates a pivot table (no Excel needed). Prefecture and date combinations with no cases come out as NaN, so they are filled with 0.
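Two details of this step are worth checking on a small example. pivot_table aggregates with the mean by default, and date strings such as "1/14/2020" sort alphabetically rather than chronologically when they become column labels. The sketch below uses a hypothetical frame `cases` and assumes the dates are M/D/YYYY strings, as in the rows shown above.

```python
import pandas as pd

# Hypothetical sample: two cases in Tokyo on the same day, one earlier case in Osaka
cases = pd.DataFrame({
    "hsp": ["Tokyo", "Tokyo", "Osaka"],
    "Date": ["1/14/2020", "1/14/2020", "1/3/2020"],
    "Number of people": [1, 1, 1],
})

# Parse the M/D/YYYY strings so that the columns sort chronologically
cases["Date"] = pd.to_datetime(cases["Date"], format="%m/%d/%Y")

# aggfunc="sum" adds up the people per prefecture and day instead of averaging them
table = cases.pivot_table(
    index="hsp", columns="Date", values="Number of people", aggfunc="sum"
).fillna(0)
print(table)
```

With string dates, "1/14/2020" would be placed before "1/3/2020", which matters for any later step that relies on column order, such as a cumulative sum or taking the last 60 columns.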

Then rename the index from Japanese names to Roman letters, for example from 北海道 to hokkaido. In my environment, index or column names that contain Japanese are not rendered in the plot. This can apparently be solved by installing additional software, but here I work around it by renaming. (Please let me know if there is a better way.)

jpn[0] contains the prefecture names in kanji, such as 北海道 (Hokkaido) and 青森 (Aomori). jpn[2] contains the prefecture names in Roman letters, such as hokkaido and aomori.

Pair them with the zip function, turn the pairs into a dictionary with the dict function, and pass the dictionary to the rename method. Also, change "Other" to "others".
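The zip and dict step can be tried on its own as follows. The small `jpn` frame here is a hypothetical stand-in with the same column positions (0 for kanji, 2 for Roman letters); the contents of column 1 and the `counts` frame are invented for the example.

```python
import pandas as pd

# Hypothetical stand-in for the prefecture table: column 0 is kanji, column 2 is Roman letters
jpn = pd.DataFrame({0: ["北海道", "青森県"], 1: [1, 2], 2: ["hokkaido", "aomori"]})
counts = pd.DataFrame({"n": [3, 1, 2]}, index=["北海道", "青森県", "Other"])

# zip pairs the two columns element by element; dict turns the pairs into a mapping
mapping = dict(zip(jpn[0], jpn[2]))

# Index labels found in the mapping are replaced, all others are left as they are
counts = counts.rename(index=mapping).rename(index={"Other": "others"})
print(counts.index.tolist())
```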

Up to this point, the data frame looks like this. (Screenshot 2020-04-06 0.42.59.png)

Cumulative number

for i in range(len(pvt)):
    pvt.iloc[i]=pvt.iloc[i].cumsum()

Extract the rows of pvt one at a time and apply cumsum() to turn the daily counts into cumulative counts. Each pvt.iloc[i] is a pandas Series, so this calls the pandas cumsum() method.

In this way, the table now holds the cumulative number of people. (Screenshot 2020-04-06 0.46.14.png)
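As a possible shortcut, the row-by-row loop can be replaced by a single call to `cumsum(axis=1)` on the whole DataFrame. The frame `daily` below is a hypothetical example, not the article's pvt.

```python
import pandas as pd

# Hypothetical daily counts: rows are prefectures, columns are dates in order
daily = pd.DataFrame(
    [[1, 0, 2], [0, 3, 1]],
    index=["tokyo", "osaka"],
    columns=["day1", "day2", "day3"],
)

# axis=1 accumulates along each row, i.e. from left to right across the dates
cumulative = daily.cumsum(axis=1)
print(cumulative)
```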

View and save

plt.figure(figsize=(20,10))
sns.heatmap(pvt.iloc[:,-60:] , linewidths=0, cmap='Spectral', cbar=True, xticklabels=5)
plt.savefig("colona.png")

I decided to display only the last 60 days, because the earlier dates have (fortunately) few infected people and showing them is not informative. This is done with a slice (:). For example, 10:20 selects positions 10 through 19 (10 is included, 20 is not), and -60: selects the last 60 columns.
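The slicing behavior can be checked on a small table. The frame `wide` is a hypothetical example with 100 columns standing in for dates.

```python
import pandas as pd

# Hypothetical table with 100 date columns labeled 0..99
wide = pd.DataFrame([list(range(100))], index=["tokyo"])

middle = wide.iloc[:, 10:20]   # positions 10 to 19: start included, stop excluded
recent = wide.iloc[:, -60:]    # the last 60 columns, as in the heatmap call

print(middle.shape, recent.shape)
```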

I was able to display the heat map like this.

colona.png
