On January 16, 2020, COVID-19 (the disease), caused by the novel coronavirus SARS-CoV-2 (the virus), was confirmed in Japan for the first time. Sadly, the disease has since killed many people, from ordinary citizens to celebrities. Even now, more than half a year later, the epidemic has not subsided, and masks are a necessity when going out. In this post, I briefly analyze and summarize the coronavirus situation in Japan. I hope this analysis raises my own awareness and improves my analysis skills.
For this analysis, I used the CSV data published by Jag Japan Co., Ltd. Thank you very much. The link is below.
About "Map of the number of people infected with the new coronavirus"

COVID-19.ipynb
import collections
import matplotlib.pyplot as plt
import pandas as pd
COVID-19.ipynb
pd.set_option('display.max_columns', None)
df = pd.read_csv('COVID-19.csv')
df
In JupyterLab, a DataFrame with many columns is displayed with some columns omitted, so the first line tells pandas to display all of them.
COVID-19.ipynb
age = df['Age'].value_counts(ascending=True)
age
Execution result
90 or more 1
90s 1
100         2
80s 7
Teen 9
70s 10
60s 12
90         14
50s 25
30s 33
40s 33
80         44
20s 49
10         66
70         69
60        128
40        167
50        179
30        203
20        310
90       1040
Unknown 1145
0-10     1335
80       2645
10       2952
70       3751
60       4531
50       7355
40       8315
30      10551
20      18009
Name: Age, dtype: int64
The same age group is written in several different ways (for example "20s" and "20"), so I will unify the notation using df.replace().
COVID-19.ipynb
df = df.replace({'Age': {'0-10': 'under10', 'Teen': '10', '20s': '20', '30s': '30', '40s': '40', '50s': '50', '60s': '60', '70s': '70', '80s': '80', '90s': '90', 'Unknown': 'unknown', '90 or more': '90~'}})
age2 = df['Age'].value_counts()
age2
Output result
20              18009
30              10551
40               8315
50               7355
60               4531
70               3751
10               2952
80               2645
under10          1335
unknown          1145
90               1040
20                359
30                236
50                204
40                200
60                140
70                 79
10                 75
80                 51
90                 15
100                 2
90~                 1
Name: Age, dtype: int64
This made the output shorter than before. (I tried several ways to merge the rows that show the same number, such as the two separate "20" rows, but couldn't get it to work, so I'll leave that as a future task.) The table is still a little hard to read, so I'll visualize it as a graph.
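The duplicated rows (two separate 20 rows, two 30 rows, and so on) are most likely caused by mixed types in the Age column: when pandas reads a large CSV in chunks, it can infer integers for some chunks and strings for others, so the integer 20 and the string '20' are counted as different values. Assuming that is the cause, one fix is to convert the column to strings before unifying the labels. A minimal sketch, with made-up sample values:

```python
import pandas as pd

# Made-up sample mimicking the mixed column: 20 (int) and '20' (str) are different values
age = pd.Series([20, 20, '20', '20s', 30, '30s', '0-10', 'Unknown'], dtype=object)
print(age.value_counts())  # 20 and '20' show up as two separate rows

# Convert every value to str first, then unify the labels with replace()
unified = age.astype(str).replace({'0-10': 'under10', '20s': '20', '30s': '30', 'Unknown': 'unknown'})
counts = unified.value_counts()
print(counts)  # each age group now appears only once
```

On the real data, the same idea would be `df['Age'] = df['Age'].astype(str)` before the replace() call, or reading the column as text from the start with `pd.read_csv('COVID-19.csv', dtype={'Age': str})`.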
COVID-19.ipynb
plt.title('Age of infected person')
age2.plot.bar()
Turning the counts into a graph makes them easier to understand at a glance. The graph shows that younger age groups, such as people in their 20s, 30s and 40s, account for more infections, and the number of infected people in their 20s clearly stands out.
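`value_counts()` orders the bars by count rather than by age, so the age trend is easier to read if the bars follow age order instead. A small sketch, assuming the labels have already been unified to strings (integer labels would be dropped by the reindex); `AGE_ORDER` and `order_by_age` are hypothetical names for illustration:

```python
import pandas as pd

# Hypothetical display order for the unified (string) age labels
AGE_ORDER = ['under10', '10', '20', '30', '40', '50', '60', '70', '80', '90', '100', '90~', 'unknown']

def order_by_age(counts: pd.Series) -> pd.Series:
    """Reorder value_counts() output by age group, keeping only labels that occur."""
    present = [label for label in AGE_ORDER if label in counts.index]
    return counts.reindex(present)

# Made-up sample counts
sample = pd.Series(['20', '30', '20', 'under10', 'unknown', '20']).value_counts()
ordered = order_by_age(sample)
print(ordered)
# In the notebook (assumption): order_by_age(age2).plot.bar()
```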
COVID-19.ipynb
df = df.replace({'sex':{'male':'male', 'Female':'female', 'unknown':'unknown'}})
sex = df['sex'].value_counts()
plt.xlabel('Sex')
plt.ylabel('Number of people')
plt.title('Infected_sex')
#print(sex) #Display when you want to know the detailed number of infected people by gender
sex.plot.bar()

Checking the graph, I found that more men than women were infected. I don't think susceptibility to infection itself depends on sex; rather, men and women may go out for different purposes and behave differently while out, so with more detailed data it might be possible to determine how the number of infections relates to sex.
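To put a number on "higher", the counts can be turned into shares with `value_counts(normalize=True)`. A minimal sketch on made-up data; in the notebook the same call would apply to `df['sex']`:

```python
import pandas as pd

# Made-up sample; in the notebook this would be df['sex']
sex_col = pd.Series(['male', 'male', 'male', 'female', 'unknown', 'female', 'male', 'male'])

# normalize=True returns proportions instead of raw counts
share = sex_col.value_counts(normalize=True)
print((share * 100).round(1))  # percentage per category
```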
COVID-19.ipynb
fixed_date = df['Fixed date']
fixed_date = collections.Counter(fixed_date)  #Count confirmed cases per date, in order of first appearance
#fixed_date #Since there is a lot of output, the execution result is omitted.
date = list(fixed_date.keys())
value = list(fixed_date.values())
plt.plot(date, value)
plt.xticks([0, 70, 180], rotation=45)  #Label only a few of the dates
plt.xlabel('date')
plt.ylabel('value')
plt.title('Changes in infected people')
plt.show()
The graph shows that positive cases were confirmed from January; the number rose sharply around April and subsided temporarily, then increased again in July and peaked around August. Graphing the data let us confirm the second wave of the new coronavirus. Toward the end of the graph, the number of newly confirmed cases drops sharply, so I hope this trend continues.
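The plot above treats the date strings as categories in the order they appear in the CSV, so days with no new cases are skipped and the spacing between labels is not proportional to time. A sketch that parses the dates instead, assuming the 'Fixed date' values can be read by `pd.to_datetime` (a `format=` argument may be needed, depending on how the dates are written). It also adds a 7-day rolling mean, which smooths day-to-day swings and makes the waves easier to see. `daily_counts` is a hypothetical helper name.

```python
import pandas as pd

def daily_counts(dates, window=7):
    """Count cases per calendar day and compute a rolling mean over `window` days."""
    parsed = pd.Series(pd.to_datetime(dates))
    daily = parsed.value_counts().sort_index()
    # Insert days with no confirmed cases as 0 so the time axis is continuous
    daily = daily.asfreq('D', fill_value=0)
    return daily, daily.rolling(window).mean()

# Made-up sample dates
daily, smoothed = daily_counts(['2020-01-16', '2020-01-18', '2020-01-18'], window=2)
print(daily)
# In the notebook (assumption): daily, smoothed = daily_counts(df['Fixed date'])
# then daily.plot() and smoothed.plot() to draw both lines
```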
The CSV also contains X and Y coordinates, so I will plot them on a map. For this part, I referred to this article.
COVID-19.ipynb
#Install geopandas and descartes first (run these in a terminal, not in a notebook cell):
#  pipenv install geopandas
#  pipenv install descartes
import geopandas as gpd
#Draw the original map data
map_1 = gpd.read_file('./land-master(qiita)/japan.geojson')
map_1.plot(figsize=(10,10), edgecolor='#444', facecolor='white', linewidth = 1);

COVID-19.ipynb
#Overlay the X/Y coordinates from the CSV on the map
map_1.plot(figsize=(10,10), edgecolor='#444', facecolor='white', linewidth = 1);
plt.scatter(df['X'],df['Y'])
plt.show()
Looking closely, the plotted points are clustered in the upper-right part of the figure, so let's zoom in.
COVID-19.ipynb
map_1.plot(figsize=(10,10), edgecolor='#444', facecolor='white', linewidth = 1);
plt.xlim([120,150]) #Longitude range to zoom in on (adjust as needed)
plt.ylim([30,46]) #Latitude range to zoom in on (adjust as needed)
plt.scatter(df['X'],df['Y'])
plt.show()
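Setting xlim/ylim only changes the view; every point is still in the data. If you also want to know how many cases fall inside the zoomed area, the same window can be applied to the DataFrame with `Series.between`. A sketch assuming, as the axis ranges suggest, that X is longitude and Y is latitude; `in_window` and the sample coordinates are hypothetical.

```python
import pandas as pd

def in_window(df, lon=(120, 150), lat=(30, 46)):
    """Return rows whose X (longitude) and Y (latitude) lie inside the given ranges (inclusive)."""
    mask = df['X'].between(*lon) & df['Y'].between(*lat)
    return df[mask]

# Hypothetical sample coordinates
sample = pd.DataFrame({'X': [139.7, 130.4, 160.0, 141.3], 'Y': [35.7, 33.6, 35.0, 43.1]})
inside = in_window(sample)
print(len(inside), 'of', len(sample), 'points are inside the window')
```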
The points were plotted correctly. This map shows that the coronavirus has spread nationwide: besides the Kanto region, the Kyushu region as a whole also has many infected people. It's very frightening to think there may be a risk of infection wherever you go.
This is my first post on Qiita, so there are probably points I haven't covered well, but I really enjoyed analyzing the data and writing this article. It's a simple analysis, but I'm glad I could try something new for myself by plotting coordinates on a map. In the future, I would like to take on a deeper analysis of the coronavirus. It's a difficult time, so please take care of yourselves.