[PYTHON] Let's take a look at the infection tendency of the new coronavirus COVID-19 in each country and the medical response status (additional information).

Updated (May 20, 2020)

If stay-at-home requests can reduce the number of infected people to a traceable level, then by tracing infected people and their close contacts we can live "with the new coronavirus (with COVID-19)" until a vaccine is developed. I had been hoping that we could then return, little by little, to what we call "everyday life", and when I met a Taiwanese person yesterday, he said that he was already returning to everyday life. There is a lot to learn from Taiwan as a success story.

Screenshot 2020-05-20 9.48.34.png

Tsai Ing-wen @iingwen

1️⃣: Donating 10 million face masks to countries in need. 2️⃣: Increasing production of quinine. 3️⃣: Sharing our use of technology to trace & investigate outbreaks.

The situation in the United States is as follows.

Screenshot 2020-05-20 9.48.43.png

The peak appears to have passed, but the number of newly infected people is declining only gradually, so it will likely take some time.

Updated (March 26, 2020)

The code on GitHub has been updated because the data file names have changed.

Updated (May 20, 2020)

Infection situation in Japan

Screenshot 2020-05-20 9.47.09.png

Screenshot 2020-05-20 9.46.58.png

Introduction

The new coronavirus COVID-19 has spread all over the world, and its spread is reported daily, even hourly, in the news. However, most reports focus on the infection situation in Japan, and coverage of the rest of the world seems limited to the few countries where the number of infected people is rising sharply. Seeing the dire situation in those countries, it is tempting to feel that Japan is still relatively safe and that the infection here has stalled or is subsiding. To find out what the actual situation is, I examined the infection situation in each country, including Japan, using publicly available information. The code I created is on GitHub, so please download it if you like.

Infection status of new coronavirus

This investigation was prompted by The New York Times article "Which Country Has Flattened the Curve for the Coronavirus?" (reported March 19, 2020). Some of its figures are introduced here.

Screenshot 2020-03-23 22.34.13.png Trends in newly infected people in China and South Korea (© The New York Times)

Looking at the 7-day moving average of the number of newly infected people in China and South Korea, measures such as movement restrictions have produced good results, and the number of newly infected people has decreased significantly in both countries. On the other hand, Singapore, Hong Kong and Taiwan, also in Asia, showed a declining trend in mid-February thanks to their response, but the numbers have gradually increased since mid-March.

Screenshot 2020-03-23 22.34.24.png Trends in new infections in Singapore, Hong Kong and Taiwan (© The New York Times)

In Italy, where serious damage has been reported, the number of infected people has increased significantly.

Screenshot 2020-03-23 22.34.36.png Trends in the number of new infections in Italy (© The New York Times)

These figures were created using data published by Johns Hopkins University. The article is dated March 19; how has the situation changed since then as a result of the measures taken by each country, and is it improving or worsening? To grasp the current situation correctly from the data released daily, we will reproduce similar results ourselves.

Understand the current infection status

The input data is "time_series_covid19_confirmed_global.csv", which is updated daily in the data published by Johns Hopkins University. From it we obtain the daily number of newly infected people in each country and its 7-day moving average. The number of newly infected people is used here instead of the cumulative number of infected people. Visualizing the daily changes makes it possible to see at a glance how the situation is changing as a result of each country's measures and how the trends compare across countries.

Google Colaboratory is used as the code execution environment. If you have a Google account, please upload the file to your Drive and run it yourself. Working with the data personally greatly increases one's interest in it.

Here, we introduce representative pieces of the code and their outputs. Please check the entire code on GitHub.

First, execute the following code to download the data. The data is updated daily at around 9:00 am Japan time (midnight UTC). Please check the Johns Hopkins University repository on GitHub for the data source. If, for example, there is an error in the reported data for Japan, feedback can be given by proposing a correction through a pull request.

# Download the COVID-19 data with git clone.
!git clone https://github.com/CSSEGISandData/COVID-19.git

After that, check the data by executing the following.

import pandas as pd

path = '/content/COVID-19/csse_covid_19_data/csse_covid_19_time_series/'
df = pd.read_csv(path + 'time_series_covid19_confirmed_global.csv')

Now, check the registered country / region information.

country = df['Country/Region'].unique()
print(country)

print('Number of country/region: ' + str(len(country)))

The execution result is as follows. There is information on 170 countries / regions in total.

['Afghanistan' 'Albania' 'Algeria' 'Andorra' 'Angola'
 'Antigua and Barbuda' 'Argentina' 'Armenia' 'Australia' 'Austria'
 'Azerbaijan' 'Bahamas' 'Bahrain' 'Bangladesh' 'Barbados' 'Belarus'
 'Belgium' 'Benin' 'Bhutan' 'Bolivia' 'Bosnia and Herzegovina' 'Brazil'
 'Brunei' 'Bulgaria' 'Burkina Faso' 'Cabo Verde' 'Cambodia' 'Cameroon'
 'Canada' 'Central African Republic' 'Chad' 'Chile' 'China' 'Colombia'
 'Congo (Brazzaville)' 'Congo (Kinshasa)' 'Costa Rica' "Cote d'Ivoire"
 'Croatia' 'Cruise Ship' 'Cuba' 'Cyprus' 'Czechia' 'Denmark' 'Djibouti'
 'Dominican Republic' 'Ecuador' 'Egypt' 'El Salvador' 'Equatorial Guinea'
 'Eritrea' 'Estonia' 'Eswatini' 'Ethiopia' 'Fiji' 'Finland' 'France'
 'Gabon' 'Gambia' 'Georgia' 'Germany' 'Ghana' 'Greece' 'Guatemala'
 'Guinea' 'Guyana' 'Haiti' 'Holy See' 'Honduras' 'Hungary' 'Iceland'
 'India' 'Indonesia' 'Iran' 'Iraq' 'Ireland' 'Israel' 'Italy' 'Jamaica'
 'Japan' 'Jordan' 'Kazakhstan' 'Kenya' 'Korea, South' 'Kuwait'
 'Kyrgyzstan' 'Latvia' 'Lebanon' 'Liberia' 'Liechtenstein' 'Lithuania'
 'Luxembourg' 'Madagascar' 'Malaysia' 'Maldives' 'Malta' 'Mauritania'
 'Mauritius' 'Mexico' 'Moldova' 'Monaco' 'Mongolia' 'Montenegro' 'Morocco'
 'Namibia' 'Nepal' 'Netherlands' 'New Zealand' 'Nicaragua' 'Niger'
 'Nigeria' 'North Macedonia' 'Norway' 'Oman' 'Pakistan' 'Panama'
 'Papua New Guinea' 'Paraguay' 'Peru' 'Philippines' 'Poland' 'Portugal'
 'Qatar' 'Romania' 'Russia' 'Rwanda' 'Saint Lucia'
 'Saint Vincent and the Grenadines' 'San Marino' 'Saudi Arabia' 'Senegal'
 'Serbia' 'Seychelles' 'Singapore' 'Slovakia' 'Slovenia' 'Somalia'
 'South Africa' 'Spain' 'Sri Lanka' 'Sudan' 'Suriname' 'Sweden'
 'Switzerland' 'Taiwan*' 'Tanzania' 'Thailand' 'Togo'
 'Trinidad and Tobago' 'Tunisia' 'Turkey' 'Uganda' 'Ukraine'
 'United Arab Emirates' 'United Kingdom' 'Uruguay' 'US' 'Uzbekistan'
 'Venezuela' 'Vietnam' 'Zambia' 'Zimbabwe' 'Dominica' 'Grenada'
 'Mozambique' 'Syria' 'Timor-Leste' 'Belize' 'Laos' 'Libya']
Number of country/region: 170

Some countries, such as China and the United States, are recorded at the province or state level, so the following code is executed to aggregate them into country-level data. Since the latitude / longitude information is unnecessary here, it is deleted as well.

df1 = df.drop(columns=['Province/State', 'Lat', 'Long']).groupby('Country/Region', as_index=False).sum()
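To make the aggregation step concrete, here is a minimal sketch on a made-up table that mimics the wide layout of the CSSE file. The name `toy` and all values are illustrative assumptions, not real data.

```python
import pandas as pd

# Made-up rows in the same wide layout as the CSSE file (values are illustrative).
toy = pd.DataFrame({
    'Province/State': ['Hubei', 'Beijing', None],
    'Country/Region': ['China', 'China', 'Japan'],
    'Lat': [30.9, 40.1, 36.0],
    'Long': [112.2, 116.4, 138.0],
    '1/22/20': [444, 14, 2],
    '1/23/20': [444, 22, 2],
})

# Drop the columns that should not be summed, then add up the provinces of each country.
by_country = (
    toy.drop(columns=['Province/State', 'Lat', 'Long'])
       .groupby('Country/Region', as_index=False)
       .sum()
)
print(by_country)
```

The two Chinese provinces collapse into a single "China" row, while countries that already have one row are left unchanged.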

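Because the columns of the current data are dates, the rows and columns are swapped so that each row is a date and each column is a country. This makes it possible to graph the trend for a target country.

df1 = df1.drop(columns=['Province/State', 'Lat', 'Long'], errors='ignore').set_index('Country/Region').T  # rows: dates, columns: countries
df1.index = pd.to_datetime(df1.index)  # parse the date labels so they can be plotted on a time axis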

Next, since the downloaded data is the cumulative number of infected people, it is converted to the number of newly infected people by taking the difference from one day to the next.

df2 = df1.diff(1)
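As a small worked example of this differencing step, the sketch below uses a made-up cumulative series for a single country. The values are illustrative, and filling the first day with 0 is an assumption made only for this example.

```python
import pandas as pd

# Made-up cumulative counts for one country (illustrative values).
cumulative = pd.Series(
    [10, 12, 12, 17, 25],
    index=pd.date_range('2020-03-01', periods=5),
    name='Example',
)

# Daily new cases are the day-over-day differences of the cumulative total.
daily_new = cumulative.diff(1)

# The first day has no previous value, so diff() leaves it as NaN.
# Treating it as 0 is an assumption made for this illustration.
daily_new = daily_new.fillna(0).astype(int)
print(daily_new.tolist())  # [0, 2, 0, 5, 8]
```

Note that if a source later revises its cumulative figures downward, the difference can become negative on that day.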

Then, execute the following to calculate the moving average for 7 days.

# Calculate the average over the past seven days for every country.

for name in list(df2.columns):
    df2[name + '_7-dayAverage'] = df2[name].rolling(7).mean().round(1)
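The moving-average step can also be sketched as a small function. The helper name `add_rolling_average` and the sample values are hypothetical; it builds all average columns at once and appends them in a single concatenation, which is one possible alternative to adding columns in a loop.

```python
import pandas as pd

def add_rolling_average(daily, window=7):
    """Return `daily` with one extra '<name>_<window>-dayAverage' column per country."""
    averages = daily.rolling(window).mean().round(1)
    averages.columns = [f'{name}_{window}-dayAverage' for name in daily.columns]
    # Concatenating once avoids inserting columns one by one.
    return pd.concat([daily, averages], axis=1)

# Made-up daily counts for two hypothetical countries.
daily = pd.DataFrame(
    {'A': [1, 2, 3, 4, 5, 6, 7, 8], 'B': [0, 0, 7, 0, 0, 0, 0, 7]},
    index=pd.date_range('2020-03-01', periods=8),
)
result = add_rolling_average(daily)
print(result[['A_7-dayAverage', 'B_7-dayAverage']].tail(2))
```

The first six rows of each average column are NaN because a full 7-day window is not yet available. The average columns come after all the original columns, so the average for the column at position `i` is at position `i + number of countries`.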

Now, let's take Japan as an example to find the trend of the number of newly infected people.

# Visualize the infection trend in Japan.
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import matplotlib.ticker as ticker

# Get the column number
id_japan = df2.columns.get_loc('Japan')  # Change this to another country to see its trend.

#Graphing
fig, ax  = plt.subplots(figsize=(5, 5))

ax.bar(x = df2.index, height = df2[str(df2.columns[id_japan])], color = 'mistyrose', label = "New cases")
ax.plot(df2.index, df2[str(df2.columns[id_japan + len(country)])], color = 'red',label = "7-day average")
ax.set_xlabel("Date")
ax.set_ylabel("Confirmed cases")
plt.rcParams["font.size"] = 10
ax.xaxis.set_major_locator(ticker.MultipleLocator(30.00))
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b'))
plt.title(str(df2.columns[id_japan]),fontweight="bold")
plt.legend(bbox_to_anchor=(0, 1), loc='upper left', borderaxespad=1, fontsize=10)
plt.show()
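The same kind of chart can be wrapped in a reusable function. The sketch below is an illustration under assumptions: the function name `plot_new_cases` is hypothetical, the data is made up, and `mdates.MonthLocator` is swapped in for the `MultipleLocator(30)` used in the listing so that ticks fall on month boundaries.

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

def plot_new_cases(daily, average, title):
    """Draw daily new cases as bars and their moving average as a line."""
    fig, ax = plt.subplots(figsize=(5, 5))
    ax.bar(daily.index, daily.values, color='mistyrose', label='New cases')
    ax.plot(average.index, average.values, color='red', label='7-day average')
    ax.set_xlabel('Date')
    ax.set_ylabel('Confirmed cases')
    ax.xaxis.set_major_locator(mdates.MonthLocator())
    ax.xaxis.set_major_formatter(mdates.DateFormatter('%b'))
    ax.set_title(title, fontweight='bold')
    ax.legend(loc='upper left', fontsize=10)
    return fig, ax

# Made-up data, only to show the call.
dates = pd.date_range('2020-02-01', periods=40)
daily = pd.Series(range(40), index=dates)
fig, ax = plot_new_cases(daily, daily.rolling(7).mean(), 'Example')
```

Passing the two series explicitly means the caller can select them by column name, for example the daily column for a country and its matching average column, instead of relying on column positions.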

Screenshot 2020-03-25 10.31.35.png

As this shows, the number was already on an increasing trend in February, and the rate of increase has risen further in March. According to the report of the Ministry of Health, Labor and Welfare, the incubation period of the new coronavirus is 1 to 12.5 days (mostly 5 to 6 days). Following the Government announcement on February 27 that "the next one to two weeks are extremely important," measures to limit movement were taken, such as temporary school closures, but the number is still increasing. Next, let's look at the situation in South Korea.

Screenshot 2020-03-25 10.38.41.png

In February, many infections were reported in Daegu and the neighboring Gyeongsangbuk-do region of South Korea, but the number of newly infected people has since decreased significantly thanks to measures such as movement restrictions. I think the measures taken in South Korea and their results will be very instructive. For details on the measures taken in South Korea, refer to this article.

Next, let's take a look at Italy, where the spread of the new coronavirus is reported every day.

Screenshot 2020-03-25 10.44.32.png

Major movement restrictions have been in place throughout the country since March 10. About two weeks have passed, which is long enough to cover the incubation period of the new coronavirus, but unfortunately the number is still rising. However, the rate of increase seems to be slowing a little, which gives some hope.

Infection status in each country (as of March 25, 2020)

So far, we have looked at the trends of newly infected people in some typical countries, but finally we will graph the trends in each country in the world.

#Graphing infection trends around the world

fig, ax  = plt.subplots(dpi=100, figsize=(60, 120))
plt.subplots_adjust(wspace=0.4, hspace=0.6)
plt.gca().spines['right'].set_visible(False)
plt.gca().spines['top'].set_visible(False)
plt.gca().spines['left'].set_visible(False)
plt.gca().spines['bottom'].set_visible(False)
plt.tick_params(labelbottom=False)
plt.tick_params(bottom=False)

for i in  range(len(country)):
    ax = fig.add_subplot(20, 10, i+1)

    ax.bar(x = df2.index, height = df2[str(df2.columns[i])], color = 'mistyrose', label = "New cases")
    ax.plot(df2.index, df2[str(df2.columns[i + len(country)])], color = 'red',label = "7-day average")

    ax.set_xlabel("Date")
    ax.set_ylabel("Confirmed cases")
    plt.rcParams["font.size"] = 10
    ax.xaxis.set_major_locator(ticker.MultipleLocator(30.00))
    ax.xaxis.set_major_formatter(mdates.DateFormatter('%b'))
    plt.title(str(df2.columns[i]),fontweight="bold")
    plt.legend(bbox_to_anchor=(0, 1), loc='upper left', borderaxespad=1, fontsize = 7)
 
# general title
plt.suptitle("Where Countries Are on the Curve", fontsize=13, fontweight=0, color='black', style='italic', y=1.02)

import datetime

dt_today = datetime.date.today()
plt.savefig(str(dt_today) + "_COVID-19_timeseries.png")  # Comment out or delete if you only want to display the figure.
plt.savefig(str(dt_today) + "_COVID-19_timeseries.jpg")  # Comment out or delete if you only want to display the figure.
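The listing uses a fixed 20 x 10 grid of subplots. As a minimal sketch, the number of rows could instead be derived from the number of countries; the helper name `grid_shape` is hypothetical.

```python
import math

def grid_shape(n_panels, n_cols=10):
    """Return (rows, cols) of the smallest grid with `n_cols` columns that holds `n_panels` panels."""
    if n_panels < 1 or n_cols < 1:
        raise ValueError('n_panels and n_cols must be positive')
    return math.ceil(n_panels / n_cols), n_cols

print(grid_shape(170))  # (17, 10)
```

The result could then be passed on as `fig.add_subplot(rows, cols, i + 1)`, so the grid grows automatically if more countries are added to the data.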

The resulting image is a bit large, but it is shown below (as of March 25, 2020).

2020-03-25_COVID-19_timeseries.png

If you execute the code, you can see the information for every country, so please try it for yourself. From these results, I feel that COVID-19 infections are on the rise worldwide and that the pandemic is moving into a new phase.

The code and results are on GitHub, so please download and use them if you like.

In conclusion

Impressed by the article in The New York Times, I tried to visualize the trend in the number of newly infected people in each country. Handling the data myself has made the situation feel personal and increased my interest in it. These days, I run the code every morning, check the situation for myself, think about what it means, and then watch the news.

Please understand that the discussions in this article have been made by an amateur individual and do not guarantee anything.

We hope that the information and code shared here can contribute to the response to the infection of the new coronavirus, and we would like to thank all the people who are working daily to reduce the infection of the new coronavirus.

We would appreciate it if you could give us your suggestions and opinions. I hope we can have a discussion.

Article update (March 25, 2020)

The data published by Johns Hopkins University also includes the number of recovered people (recovered) and the number of deaths (deaths), so we use those as well. From their relationship with the number of infected people, let us consider the medical systems of each country and how they are coping.

Execute the following to load the data, including the numbers of recovered people and deaths.

path = '/content/COVID-19/csse_covid_19_data/csse_covid_19_time_series/'

df_c = pd.read_csv(path + 'time_series_covid19_confirmed_global.csv')  # Number of infected people
df_r = pd.read_csv(path + 'time_series_covid19_recovered_global.csv')  # Number of recovered people
df_d = pd.read_csv(path + 'time_series_covid19_deaths_global.csv')  # Number of deaths

Since the data format is the same as the number of infected people, the same processing will be performed.
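Because the three files share one layout, the common processing can be sketched as a single function. This is an illustration under assumptions: the function name `load_totals_by_country` is hypothetical, and the CSV text below is made up and stands in for a real file. Whether to take daily differences afterwards depends on what is being plotted.

```python
import io
import pandas as pd

def load_totals_by_country(csv_source):
    """Read a CSSE-style wide CSV and return cumulative totals: rows are dates, columns are countries."""
    raw = pd.read_csv(csv_source)
    totals = (
        raw.drop(columns=['Province/State', 'Lat', 'Long'])
           .groupby('Country/Region')
           .sum()
           .T
    )
    totals.index = pd.to_datetime(totals.index, format='%m/%d/%y')
    return totals

# Made-up CSV text in the same layout, used here in place of a real file.
sample = io.StringIO(
    'Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20\n'
    ',Japan,36.0,138.0,2,2,4\n'
    'Hubei,China,30.9,112.2,444,444,549\n'
    'Beijing,China,40.1,116.4,14,22,36\n'
)
totals = load_totals_by_country(sample)
daily = totals.diff(1)  # day-over-day change, if daily values are wanted
print(totals)
```

The same function can be called once each for the confirmed, recovered, and deaths files, since `pd.read_csv` accepts a file path as well as a file-like object.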

Since infected people recover after some number of days, the number of days until recovery was estimated from the trends in the number of infected people and the number of recovered people.

import datetime

td_d = datetime.timedelta(days=16)  # The recovery period is assumed to be 16 days.
df2_r.index = df2_r.index - td_d  # Shift the dates of the recovered series 16 days earlier.
df2_r
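The article does not spell out how the 16-day figure was derived. As one possible heuristic, offered only as a sketch, the recovered curve can be shifted by each candidate number of days and the shift with the smallest average gap to the confirmed curve chosen. The function names `shift_earlier` and `best_lag` and the data are hypothetical.

```python
import pandas as pd

def shift_earlier(series, days):
    """Return a copy of `series` whose dates are moved `days` days earlier."""
    shifted = series.copy()
    shifted.index = shifted.index - pd.Timedelta(days=days)
    return shifted

def best_lag(confirmed, recovered, max_days=30):
    """Return the shift (in days) that minimizes the mean absolute gap between the two curves."""
    gaps = {}
    for days in range(max_days + 1):
        # Subtraction aligns on dates; days without overlap become NaN and are skipped by mean().
        gap = (confirmed - shift_earlier(recovered, days)).abs().mean()
        if pd.notna(gap):
            gaps[days] = gap
    return min(gaps, key=gaps.get)

# Made-up curves: 'recovered' is 'confirmed' delayed by exactly 16 days.
dates = pd.date_range('2020-01-22', periods=60)
confirmed = pd.Series([d ** 2 for d in range(60)], index=dates)
recovered = confirmed.shift(16).fillna(0)
print(best_lag(confirmed, recovered))  # 16
```

With real data the curves never match exactly, partly because some patients die rather than recover, so the estimate should be read as a rough indication only.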

Now that we have the data, we can graph the trends in the number of infected people, the number of recovered people, the number of deaths, and the number of recovered people shifted by the assumed recovery period. All of these data are combined into a single table.

# Visualize the infection trend of each country.
import matplotlib.ticker as ticker

# Get the column number
id_x = df.columns.get_loc('Singapore')  # Singapore is used as a representative example. Change this to the country of interest to browse its situation.

#Graphing
fig, ax1  = plt.subplots(figsize=(8, 8))


ax1.plot(df.index, df[str(df.columns[id_x -2*len(country)])], color = 'red',label = "Confirmed Cases")
ax1.plot(df.index, df[str(df.columns[id_x - len(country)])], color = 'green',label = "Recovered")
ax1.plot(df.index, df[str(df.columns[id_x ])], color = 'yellow',label = "Deaths")
ax1.plot(df2_r.index, df2_r[str(df2_r.columns[id_x -2*len(country)])], color = 'blue',label = "Shifted Recovered")

ax1.set_xlabel("Date")
ax1.set_ylabel("Confirmed Cases - Recovered")
plt.rcParams["font.size"] = 10
ax1.xaxis.set_major_locator(ticker.MultipleLocator(30.00))
ax1.xaxis.set_major_formatter(mdates.DateFormatter('%b'))
plt.title(str(df.columns[id_x]),fontweight="bold")

ax1.legend(loc='upper left', borderaxespad=1, fontsize = 10)
plt.show()
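The listing above picks each series by column position. As an alternative sketch, the three tables could be combined with labeled column groups so that a country's series can be selected by name. The tables below are made up, and this is not necessarily how the author's combined table was built.

```python
import pandas as pd

dates = pd.date_range('2020-03-01', periods=3)
# Made-up country-level tables (illustrative values only).
confirmed = pd.DataFrame({'Singapore': [100, 110, 130], 'Japan': [200, 230, 260]}, index=dates)
recovered = pd.DataFrame({'Singapore': [70, 75, 80], 'Japan': [40, 45, 50]}, index=dates)
deaths = pd.DataFrame({'Singapore': [0, 0, 0], 'Japan': [5, 6, 6]}, index=dates)

# Label each block of columns so that a series is addressed as (kind, country).
combined = pd.concat(
    [confirmed, recovered, deaths],
    axis=1,
    keys=['Confirmed', 'Recovered', 'Deaths'],
)

# All three series for one country, selected by name rather than by position.
singapore = combined.xs('Singapore', axis=1, level=1)
print(singapore)
```

Selecting by name avoids offsets such as `id_x - 2*len(country)`, which silently break if the column order changes.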

Let's take a look at Singapore, where the initial countermeasures appeared to work well judging from the trend in newly infected people.

Screenshot 2020-03-25 23.57.59.png

Here, the number of recovered people shifted 16 days earlier (blue) almost overlaps with the number of infected people (red), so a recovery period of 16 days appears to be about right. Meanwhile, the number of deaths (yellow) has stayed close to 0 even since mid-March, when the number of infected people began to increase rapidly, so the medical environment appears to be stable at present. However, I am concerned about whether the many people infected in this rapid increase will be able to recover.

Next, let's look at the case of Japan.

Screenshot 2020-03-26 0.03.11.png

Until mid-February, the number of infected people and the shifted number of recovered people were almost the same, meaning that many infected people were recovering during this period. Since then, however, the shifted number of recovered people has diverged increasingly from the number of infected people, and the number of deaths has also tended to increase slightly. It would be fine if infected people simply needed more than 16 days to recover, but if this divergence is caused by the medical environment, there is great cause for concern. Earlier, the Governor of Tokyo held a press conference and stated that Tokyo is currently in a "critical phase." That may have been a judgment aimed at preventing further divergence.

Next, let's take a look at Italy, where the collapse of medical care is reported in the news.

Screenshot 2020-03-26 0.08.00.png

It seems that until the end of February, Italy had the capacity to treat its infected people to recovery with its existing medical resources. However, compared to the previous two countries, the rate of increase in the number of infected people has been high since the beginning of March, and the number of deaths is also rising quickly. This alone does not tell us the current situation, but the trend suggests that the medical environment is under severe strain.

Next, let's take a look at San Marino, a small mountainous country surrounded by Italy.

Screenshot 2020-03-26 0.13.50.png

Since it lies within Italy, the number of infected people has increased sharply since the beginning of March, just as in Italy. In addition, the number of deaths exceeds the number of recovered people, which suggests that an adequate medical system is not in place locally.

Finally, the results of the same analysis on the data of every country in the world are shown. This time the recovery period was fixed at 16 days, but because medical environments and systems differ between countries, it may be better to set a value suited to each. Also, in the United States the number of infected people has increased sharply since the middle of March, and it is difficult to draw conclusions from this analysis because there is still little information.

Going forward, I would like to examine whether the measures in each country are working from two perspectives: the number of newly infected people, and the relationship between the numbers of recovered people and deaths.

2020-03-25_COVID-19_timeseriesA.png

This code is also uploaded to Github, so please download it if you like.

Reference article

[Which Country Has Flattened the Curve for the Coronavirus? (Reported March 19, 2020)](https://www.nytimes.com/interactive/2020/03/19/world/coronavirus-flatten-the-curve-countries.html?algo=top_conversion&fellback=false&imp_id=269168688&imp_id=822987366&action=click&module=Most%20Popular&pgtype=Homepage)
2019-nCoV Global Cases (by Johns Hopkins CSSE) Visualization (Dashboard)
2019 Novel Coronavirus COVID-19 (2019-nCoV) Data Repository by Johns Hopkins CSSE
Q&A about the new coronavirus (for the general public) @ Ministry of Health, Labor and Welfare
