[PYTHON] [Introduction to logarithmic graph] Predict the end time of each country from the logarithmic graph of infection number data ♬

This time, we will predict the timing of infection peaks in each country by logarithmic plotting.

What i did

・ A little theory ・ Forecast of each country ・ Code explanation ・ Additional theory ・ Forecast of each country II

・ A little theory

Last time, the following equation was derived.

I =  I_0\exp(\gamma (\frac{\beta}{\gamma} \frac{S}{N} - 1)t)\\

If you take the logarithm of both sides,

\ln I =  \ln I_0+\gamma (\frac{\beta}{\gamma} \frac{S}{N} - 1)t\\

Is obtained. That is, if you draw a graph with $ \ ln I $ on the vertical axis and time $ t $ on the horizontal axis, the slope will be

\gamma (\frac{\beta}{\gamma} \frac{S}{N} - 1)\\

You can draw a straight line. Here, this slope depends on $ S $ and is the number of uninfected persons that decreases over time. And this slope can be rewritten as follows by using the effective reproduction number $ R $ introduced last time.

\gamma (R - 1)\\
R = R_0 \frac{S}{N}\\
R_0 = \frac{\beta}{\gamma}\\

Therefore, this slope becomes 0 at R = 1, that is, the number of reproductions = 0, and the number of infected persons does not increase any more, so that the infection peaks. This time, based on this theory, I would like to plot the data of each country in $ \ log I \ vs \ t $ and see the time of the peak infection in each country.

Also, from the slope of the rising edge, $ S \ fallingdotseq N $, and $ \ gamma (R_0 -1) $ can be obtained. Fitting the graph obtained with this as the initial value, it seems that $ \ gamma $, $ \ beta $, $ R_0 $ can be obtained.

・ Forecast of each country

・ Situation of Korea, South, Diamond Princess, Italy

It may seem distrustful that it contains Italy, but the results are as follows: If you look at these three countries side by side, you can see that they are quite different. That is, if the x and y axes are translated appropriately, they are likely to overlap. In other words, you can see that there are areas with the same slope. In each graph, you can see ** intervals that increase 10 times in almost 10 days **. Yes, DP is likely to saturate the infection curve first, then South Korea, and then Italy. The graph below plots the slope. In other words, when it reaches 0, it means that it is saturated. So, you can see the above from this graph. When I talk about Italy, I've been told a lot, but I think it will reach its peak in about 10 days.

・ Situation in European countries

The same graph is displayed for Spain, Germany, Switzerland, United Kingdom, and the Netherlands other than Italy. logmulti_United Kingdom_.png When I write it again, the UK is a little behind, but the others draw similar curves. The vertical axis differs depending on each situation (population, etc.), but the slope changes in almost the same way. And, looking at the slope diagram, it seems that it will be saturated all together within about 2 weeks **.

・ A little disappointing country

The three countries of Japan, Iran, and Bahrain also have very similar curves in a sense. Since the situation of rising is different, the y-axis is completely different, so I think the situation of the medical system is different. However, the inclination after standing up is quite similar, and despite the fact that it has been almost saturated in the last few days, Iran and Japan have increased again. Bahrain also has a negative slope, but it has returned to positive again. The gradients of these three countries are ** 10 times / 30 days, which is a gentle gradient that seems to be saturated at any time after it stabilizes. These countries are still infected and need to be contained with tension throughout society.

・ Rumored American situation

I tried to plot Sweden and US in the same way.

As you can see at a glance, the rise is similar. However, Sweden has a smaller slope from the middle. However, looking at the slope graph below, it's almost parallel to the x-axis these days, and it doesn't look like it's heading towards zero. In other words, it doesn't seem to saturate in the near future. However, in the United States, the data from the 58th to the 65th are available, and it seems that the slope is rapidly approaching 0. If you extend this, you can see that ** there is a possibility that the infection peaks in about 10 days **. It can be seen that the maximum gradient of both countries is about 25 times / 20 days, which is larger than the previous gradient.

・ Code explanation

Lib to be used and data reading are the same as last time. The death count data is also processed, but it is deleted because it is not used below.

import numpy as np
from scipy.integrate import odeint
from scipy.optimize import minimize
import matplotlib.pyplot as plt
import pandas as pd

#Read CSV data with pandas.
data = pd.read_csv('COVID-19/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv')
data_r = pd.read_csv('COVID-19/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_recovered_global.csv')

Below, the data processing function returns the processed infection number data and gradient data.

def data_city(city):
    #Process the data
    t_cases = 0
    t_recover = 0
    t_deaths = 0
    for i in range(0, len(data_r), 1):
        if (data_r.iloc[i][1] == city): #for country/region
        #if (data_r.iloc[i][0] == city):  #for province:/state  
            print(str(data_r.iloc[i][0]) + " of " + data_r.iloc[i][1])
            for day in range(4, len(data.columns), 1):            
                confirmed_r[day - 4] += data_r.iloc[i][day]
            t_recover += data_r.iloc[i][day]      
    for i in range(0, len(data), 1):
        if (data.iloc[i][1] == city): #for country/region
        #if (data.iloc[i][0] == city):  #for province:/state  
            print(str(data.iloc[i][0]) + " of " + data.iloc[i][1])
            for day in range(4, len(data.columns), 1):
                confirmed[day - 4] += data.iloc[i][day] -  confirmed_r[day - 4]
                diff_confirmed[day - 4] += confirmed[day-4] /  confirmed_r[day - 4]

    tl_confirmed = 0
    dlog_confirmed = [0] * (len(data.columns) - 4)
    dlog_confirmed[0]=np.log(confirmed[0])
    dlog_confirmed[1]=np.log(confirmed[1])-np.log(confirmed[0])
    for i in range(2, len(confirmed)-2, 1):
        if confirmed[i] > 0:
            dlog_confirmed[i]=(np.log(confirmed[i+1])-np.log(confirmed[i-1]))/2
        else:
            continue
    tl_confirmed = confirmed[len(confirmed)-1] + confirmed_r[len(confirmed)-1] + confirmed_d[len(confirmed)-1]
    t_cases = tl_confirmed
            
    return confirmed, dlog_confirmed

The following is used to plot multiple countries on the same diagram. Enter the country you want to draw in city_list. Use the above data processing function to get the infection count data and its semi-logarithmic slope. Draw this. Therefore fig, (ax1, ax2) =. .. .. The definition of is defined outside the for statement. Finally, I will display them all together. I don't dare to define the color, but it is automatically selected.

city_list={ "Spain","Germany","Switzerland","United Kingdom", "Netherlands" }
fig, (ax1,ax2) = plt.subplots(2,1,figsize=(1.6180 * 4, 4*2))
for city in city_list:
    confirmed = [0] * (len(data.columns) - 4)
    confirmed_r = [0] * (len(data_r.columns) - 4)
    confirmed_d = [0] * (len(data_d.columns) - 4)
    diff_confirmed = [0] * (len(data.columns) - 4)
    days_from_22_Jan_20 = np.arange(0, len(data.columns) - 4, 1)
    days_from_22_Jan_20_ = np.arange(0, len(data.columns) - 4, 1)
    recovered_rate = [0] * (len(data_r.columns) - 4)
    deaths_rate = [0] * (len(data_d.columns) - 4)
    
    confirmed, dlog_confirmed = data_city(city)
    #matplotlib drawing
    lns1=ax1.semilogy(days_from_22_Jan_20, confirmed, ".-", label = str(city))
    lns4=ax2.plot(days_from_22_Jan_20_, dlog_confirmed, ".-",label = str(city))

ax1.set_xlabel("days from 22, Jan, 2020")
ax1.set_ylabel("casas ")
ax2.set_ylabel("slopes ")
ax1.legend(loc=0)
ax2.legend(loc=0)
ax1.grid()
ax2.grid()

plt.pause(1)
plt.savefig('./fig/logmulti_{}_.png'.format(city)) 
plt.close()

・ Additional theory

In fact, it can be imagined that the transition of the cure number curve will be similar to the infection number curve, although there is a delay in recovery. So, I tried to talk about that story below. First, substituting $ I $ into the third equation of the SIR model gives the following.

Since the R of the reproduction number and the R of the recovery number are incurred, the recovery number is made lowercase here.

\frac{dr}{dt} =  \gamma I_0\exp(\gamma (R - 1)t) \\

Therefore, if you take the logarithm of both sides

\ln (\frac{dr}{dt}) =  \ln \gamma + \ln I_0 + \gamma (R - 1)t \\

Will be. That is, the time derivative (slope) of the recovery rate can be expected to have the same curve as the above equation, and y-intercepts differ by $ \ ln \ gamma $. However, this seems to be difficult to handle due to the large error, so I will change it a little.

That is, when R is integrated as a constant (although not analytically correct), the following equation is obtained.

r =  \frac{I_0}{(R-1)}\exp(\gamma (R - 1)t) + r_0\\

Therefore, if you take the logarithm of both sides

\ln r =  \ln I_0 -\ln (R-1) + \gamma (R - 1)t +\ln r_0

Can be written. Therefore, it can be expected that a curve that is almost parallel and has a different y-intercept of $ -ln (R-1) + \ ln r_0 $ can be obtained when compared with the above formula of $ \ ln I $.

・ Forecast of each country II

・ Situation of Wuhan

A beautiful curve was obtained as shown below. As expected, the infection and cure curves start in parallel and intersect after the peak infection. Along with that, the difference between the two curves ($ \ log (A / B) = \ log A- \ log B $) passes straight through $ 10 ^ 0 $.

・ Situation in Italy

Both curves are parallel, and I concluded above that the number of infections is likely to peak in about 10 days, but after that, it seems that they do not intersect easily. After this, it is considered that they will approach and intersect, but the healing number data may be suspicious.

・ Spain on behalf of Europe

Both curves are steadily approaching, and after the peak number of infections, they intersect and the end is visible.

・ Situation in Japan

In Japan, the two curves are gradually approaching, and the difference is heading toward 0, but the difference has widened due to the recent increase in the number of infections. So, as with the above, the conclusion is that the spread of infection is continuing and needs to be suppressed.

Summary

・ Logarithmic plots of infection count data in each country were used to classify and predict the peak timing of each infection count. ・ The end time was predicted by plotting the number of infections and the number of cures at the same time.

・ I want to track the changing situation every day ・ Since the theory of the additional part is suspicious, I will organize it a little more and connect it to the parameter determination.