[PYTHON] [Introduction to SIR model] Predict the end time of each country with COVID-19 data fitting ♬

In the previous SEIR model, among the obtained parameters, values such as $ lp \ fallingdotseq 1e ^ {-23} $ were obtained. In such a case, there is a possibility that the digit loss or the differential equation is not working in the first place, so be careful. So, this time I will change this part and try to fit with the simplest model.

What i did

・ A little theory ・ Code explanation ・ The end of Japan ・ End of each country ・ A little discussion

・ A little theory

The SIR model is the following simple time evolution equation.

{\begin{align}
\frac{dS}{dt} &= -\beta \frac{SI}{N} \\
\frac{dI}{dt} &=  \beta \frac{SI}{N} -\gamma I \\
\frac{dR}{dt} &=  \gamma I \\
\end{align} 
}

The first equation is that the decrease in the number of uninfected persons is proportional to the number of infected persons and the number of uninfected persons, and is inversely proportional to the total number of target persons. Decrease. As β / N, it can be considered as the infection rate per capita. The second equation is the equation for increasing the number of infected people. The increase is the difference between the influx from uninfected individuals and the healers. The third equation is an equation that a certain percentage of infected persons are cured, and this coefficient γ is a cure rate called a removal rate. 【reference】 ・ [Is this infection spreading or converging: the physical meaning and determination of the reproduction number R](https://rad-it21.com/%E3%82%B5%E3%82%A4%E3%82%A8] % E3% 83% B3% E3% 82% B9 / kado-shinichiro_20200327 /)

Here, the second equation can be transformed as follows.

\frac{dI}{dt} =  \gamma (\frac{\beta}{\gamma} \frac{S}{N} - 1)I\\

That is, if $ S / N $ is considered to be constant, the following solution can be formally obtained.

I =  I_0\exp(\gamma (\frac{\beta}{\gamma} \frac{S}{N} - 1)t)\\

At the onset of infection, it is $ S_0 \ fallingdotseq N $, in which case the following solution can be obtained.

{\begin{align}
I &=  I_0\exp(\gamma (R_0 - 1)t)\\
here\\ 
R_0 &= \frac{\beta}{\gamma}
\end{align}
} 

Here, in the formula of $ I $, the number of infected persons decreases exponentially when $ R_0 <1 $, and conversely increases exponentially when $ R_0> 1 $. Therefore, this ** $ R_0 $ is called the basic reproduction number ** and serves as a guide for determining whether or not infection transmission occurs. In addition, in the above reference,

R = R_0\frac{S}{N}

Is defined as ** effective reproduction number **. This $ R $ becomes smaller as $ \ frac {S} {N} $ becomes smaller as the infection progresses. That is, as can be seen from the above formal solution, when $ R_0> 1 $, $ R> 1 $ is still present even at the initial stage of expansion, and the infection spreads, but eventually $ \ frac {S} {N} $ is small. Therefore, it is an index showing that the end starts when $ R <1 $ is realized. On the other hand, if the left side of $ \ frac {dI} {dt} = 0 $ = 0, the infection peaks, and the number of uninfected persons at this time is $ S_p $,

{\begin{align}
R_0 &= \frac{\beta}{\gamma}\\
& = \frac{N}{S_p} \\
\end{align}
}

Is established, at this time

{\begin{align}
R &= R_0\frac{S_p}{N}\\
&= 1
\end{align}
}

It turns out that holds true. That is, ** the effective reproduction number R = 1 at the peak of infection, and the transmission of infection will end thereafter. ** ** This time, I will look at the end prediction by plotting this ** effective reproduction number ** as well.

・ Code explanation

The entire code is placed below. ・ Collive_particles / fitting_SIR_COVID2.py The Lib used is as follows same as last time.

import numpy as np
from scipy.integrate import odeint
from scipy.optimize import minimize
import matplotlib.pyplot as plt
import pandas as pd

Data in advance from COVID-19 site Is downloading.

#Read CSV data with pandas.
data = pd.read_csv('COVID-19/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv')
data_r = pd.read_csv('COVID-19/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_recovered_global.csv')
data_d = pd.read_csv('COVID-19/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv')
    
confirmed = [0] * (len(data.columns) - 4)
confirmed_r = [0] * (len(data_r.columns) - 4)
confirmed_d = [0] * (len(data_d.columns) - 4)
days_from_22_Jan_20 = np.arange(0, len(data.columns) - 4, 1)

The record order of the data was correctly arranged last time, but this time the order of the countries and regions of the three data is not the same, so obtain each as follows, and calculate the number of existing infections for the infection number data In order to do so, I try to acquire it after acquiring the healing number data. The if statement is used interchangeably between the country and the region. In addition, the total number of cures and total number of deaths required for mortality and cure rate are not calculated this time, so they are commented out.

#City
city = "Japan"
#Process the data
t_cases = 0
for i in range(0, len(data_r), 1):
    if (data_r.iloc[i][1] == city): #for country/region
    #if (data_r.iloc[i][0] == city):  #for province:/state  
        print(str(data_r.iloc[i][0]) + " of " + data_r.iloc[i][1])
        for day in range(4, len(data.columns), 1):            
            confirmed_r[day - 4] += data_r.iloc[i][day]
            #t_recover += data_r.iloc[i][day]
for i in range(0, len(data), 1):
    if (data.iloc[i][1] == city): #for country/region
    #if (data.iloc[i][0] == city):  #for province:/state  
        print(str(data.iloc[i][0]) + " of " + data.iloc[i][1])
        for day in range(4, len(data.columns), 1):
            confirmed[day - 4] += data.iloc[i][day] -  confirmed_r[day - 4]            
for i in range(0, len(data_d), 1):
    if (data_d.iloc[i][1] == city): #for country/region
    #if (data_d.iloc[i][0] == city):  #for province:/state  
        print(str(data_d.iloc[i][0]) + " of " + data_d.iloc[i][1])
        for day in range(4, len(data.columns), 1):
            confirmed_d[day - 4] += data_d.iloc[i][day] #fro drawings
            #t_deaths += data_d.iloc[i][day]

This time we will use the SIR model. The evaluation function also returns the number of uninfected S, the number of infections I, and the number of cures R.

#define differencial equation of sir model
def sir_eq(v,t,beta,gamma):
    a = -beta*v[0]*v[1]
    b = beta*v[0]*v[1] - gamma*v[1]
    c = gamma*v[1]
    return [a,b,c]
def estimate_i(ini_state,beta,gamma):
    v=odeint(sir_eq,ini_state,t,args=(beta,gamma))
    est=v[0:int(t_max/dt):int(1/dt)] 
    return est[:,0],est[:,1],est[:,2] #v0-S,v1-I,v2-R

The objective function is the normal variance. Here, the number of infections I and the number of cures R are linearly summed so that they can be fitted at the same time.

#define logscale likelihood function
def y(params):
    est_i_0,est_i_1,est_i_2=estimate_i(ini_state,params[0],params[1])
    return np.sum(f1*(est_i_1-obs_i_1)*(est_i_1-obs_i_1)+f2*(est_i_2-obs_i_2)*(est_i_2-obs_i_2))

Enter the total number of infections in the country (or region) selected above in N0. In addition, f1 and f2 determine which curve the fitting should be performed on. N, S0, I0, R0 are the initial values of each, but these are also constants that should be adjusted according to the fitting condition. The initial values of beta and gamma are similar in most countries, but it seems better to re-enter the values once obtained and calculate recursively.

#solve seir model
N0 = 1517
f1,f2 = 1,1 #data & data_r fitting factor
N,S0,I0,R0=int(N0*1.5),int(N0*1.5),int(10),int(0)
ini_state=[S0,I0,R0]
beta,gamma = 1.6e-6, 1e-3 #The initial value does not converge unless it is close to some extent

t_max=len(days_from_22_Jan_20) #Fitting within a certain range of data
dt=0.01
t=np.arange(0,t_max,dt)

I will try to display it below, including checking the initial value.

plt.plot(t,odeint(sir_eq,ini_state,t,args=(beta,gamma)))
plt.legend(['Susceptible','Infected','Recovered'])

obs_i_1 = confirmed
obs_i_2 = confirmed_r

plt.plot(obs_i_1,"o", color="red",label = "data_1")
plt.plot(obs_i_2,"o", color="green",label = "data_2")
plt.legend()
plt.pause(1)
plt.close()

Make a fitting. I used minimize @ scipy this time as well. Calculate the basic reproduction number R0. Since the variable name overlaps with the initial healing number above, it is in lowercase.

#optimize logscale likelihood function
mnmz=minimize(y,[beta,gamma],method="nelder-mead")
print(mnmz)
#R0
#N_total = S_0+I_0+R_0
#R0 = N_total*beta_const *(1/gamma_const)
beta_const,gamma = mnmz.x[0],mnmz.x[1] #Infection rate, removal rate (curing rate)
print(beta_const,gamma)
r0 = N*beta_const*(1/gamma)
print(r0)

Graph it below. Ax3 and ax4 sub-axises are added to plot the effective reproduction number.

#plot reult with observed data
fig, (ax1,ax2) = plt.subplots(2,1,figsize=(1.6180 * 4, 4*2))
ax3 = ax1.twinx()
ax4 = ax2.twinx()

lns1=ax1.plot(confirmed,"o", color="red",label = "data_I")
lns2=ax1.plot(confirmed_r,"o", color="green",label = "data_R")
est_i_0,est_i_1,est_i_2=estimate_i(ini_state,mnmz.x[0],mnmz.x[1])
lns3=ax1.plot(est_i_1, label = "estimation_I")
lns0=ax1.plot(est_i_2, label = "estimation_R")
lns_1=ax3.plot((S0-est_i_1-est_i_2)*r0/N,".", color="black", label = "effective_R0")
ax3.set_ylim(0,r0+1)

lns_ax1 = lns1+lns2 + lns3 + lns0 +lns_1
labs_ax1 = [l.get_label() for l in lns_ax1]
ax1.legend(lns_ax1, labs_ax1, loc=0)
ax1.set_ylim(0,N0)
ax1.set_title('SIR_{} f1_{:,d} f2_{:,d};N={:.0f} S0={:.0f} I0={:.0f} R0={:.0f} R={:.2f}'.format(city,f1,f2,N,S0,I0,R0,(S0-est_i_1[-1]-est_i_2[-1])*r0/N))
ax1.set_ylabel("Susceptible, Infected, recovered ")
ax3.set_ylabel("effective_R0")

The future forecast is graphed below.

t_max=200 #len(days_from_22_Jan_20)
t=np.arange(0,t_max,dt)

lns4=ax2.plot(confirmed,"o", color="red",label = "data")
lns5=ax2.plot(confirmed_r,"o", color="green",label = "data_r")
est_i_0,est_i_1,est_i_2=estimate_i(ini_state,mnmz.x[0],mnmz.x[1])
lns6=ax2.plot(est_i_0, label = "estimation_S")
lns7=ax2.plot(est_i_1, label = "estimation_I")
lns8=ax2.plot(est_i_2, label = "estimation_R")
lns_6=ax4.plot((S0-est_i_1-est_i_2)*r0/N,".", color="black", label = "effective_R0")

lns_ax2 = lns4+lns5 + lns6 + lns7+ lns8 +lns_6
labs_ax2 = [l.get_label() for l in lns_ax2]
ax2.legend(lns_ax2, labs_ax2, loc=0)
ax4.set_ylim(0,r0+1)
ax2.set_ylim(0,S0)
ax2.set_title('SIR_{};b={:.2e} g={:.2e} r0={:.2f}'.format(city,beta_const,gamma,r0))
ax2.set_ylabel("Susceptible, Infected, recovered ")
ax4.set_ylabel("effective_R0")

ax1.grid()
ax2.grid()
plt.savefig('./fig/SIR_{}f1_{:,d}f2_{:,d};b_{:.2e}g_{:.2e}r0_{:.2f}N_{:.0f}S0_{:.0f}I0_{:.0f}R0_{:.0f}.png'.format(city,f1,f2,beta_const,gamma,r0,N,S0,I0,R0)) 
plt.show()
plt.close()

・ The end of Japan

** Data is data up to March 27, 2020 **. The data is as follows. fig_Japan_.png The results are as follows. SIR_Japanf1_1f2_1;b_4.53e-05g_1.58e-02r0_6.52N_2275S0_2275I0_10R0_0.png This result seems to be fitted for the time being, but today's press says that it infected 200 people yesterday, March 28, so it's out of this graph and doesn't match. .. I posted this as a bonus. The obtained quantities are as follows.

{\begin{align}
\beta &= 4.53e^{-5}\\
\gamma &= 1.58e^{-2}\\
N &= 2275\\
R0 &= 6.52\\
R(27.3.2020) &= 2.71\\
\end{align} 
}

・ End of each country

The situation in Hubei

The latest (27.3.2020) data is as follows. fig_Hubei_.png Wuhan data cannot be matched as follows when trying to fit both infection and cure data. However, as you can see in the graph, it can be said that the effective reproduction number R = 0.01, 0.08. SIR_Hubeif1_1f2_0;b_3.19e-06g_6.53e-02r0_5.20N_106462S0_106462I0_318R0_389.png

{\begin{align}
\beta &= 3.19e^{-6}\\
\gamma &= 6.53e^{-2}\\
N &= 106462\\
R0 &= 5.20\\
R(27.3.2020) &= 0.01\\
\end{align} 
}

The following are fittings of both at the same time, and the peak number of infections has shifted significantly. SIR_Hubeif1_1f2_1;b_2.34e-06g_5.33e-02r0_4.67N_106462S0_106462I0_318R0_389.png

{\begin{align}
\beta &= 2.34e^{-6}\\
\gamma &= 5.33e^{-2}\\
N &= 106462\\
R0 &= 4.67\\
R(27.3.2020) &= 0.08\\
\end{align} 
}

Status of Korea, South

The latest (27.3.2020) data is as follows. fig_Korea, South_.png

The Korean data is not as bad as Wuhan, but it is almost the same, and even if you try to fit both the infection number and cure number data, they cannot be matched as follows. However, as you can see in the graph, the effective reproduction number R = 0.13 is ending. SIR_Korea, Southf1_1f2_0;b_2.25e-05g_2.94e-02r0_8.68N_11365S0_11365I0_1R0_0.png

{\begin{align}
\beta &= 2.25e^{-5}\\
\gamma &= 2.94e^{-2}\\
N &= 11365\\
R0 &= 8.68\\
R(27.3.2020) &= 0.13\\
\end{align} 
}

If you try to fit both, the fitting of the infection number data will be loose and the infection peak will shift to the right. It may be possible to fit it if it fits more properly, but this time I will not pursue it any more. SIR_Korea, Southf1_1f2_1;b_2.11e-05g_2.14e-02r0_11.17N_11365S0_11365I0_1R0_0.png

{\begin{align}
\beta &= 2.11e^{-5}\\
\gamma &= 2.14e^{-2}\\
N &= 11365\\
R0 &= 11.17\\
R(27.3.2020) &= 0.18\\
\end{align} 
}

・ Situation of Iran

The next country that is likely to end is Iran. The latest (27.3.2020) data is as follows. However, when I came here and the cure rate exceeded 30%, it seemed to be stagnant. fig_Iran_.png Even so, the effective reproduction number is 1.81, and it will soon be 1, so it can be said that the peak number of infections is near. SIR_Iranf1_1f2_1;b_2.92e-06g_4.94e-02r0_3.70N_62478S0_62478I0_10R0_0.png

{\begin{align}
\beta &= 2.92e^{-6}\\
\gamma &= 4.94e^{-2}\\
N &= 62478\\
R0 &= 3.70\\
R(27.3.2020) &= 1.81\\
\end{align} 
}

・ Situation of Italy

The next country of concern is Italy. The latest (27.3.2020) data is as follows. fig_Italy_.png The data for this country is very solid and it feels good to fit both the infection and cure curves. The result is as follows, the effective reproduction number is 4.07, but it has dropped sharply and it seems that it will end in about 10 days. SIR_Italyf1_1f2_1;b_1.46e-06g_1.91e-02r0_10.99N_143448S0_143448I0_1R0_0.png

{\begin{align}
\beta &= 1.46e^{-6}\\
\gamma &= 1.91e^{-2}\\
N &= 143448\\
R0 &= 10.99\\
R(27.3.2020) &= 4.07\\
\end{align} 
}

Looking at this data, I get the impression that medical care has not collapsed and that management such as the effect of urban blockade is solid. (Comments have no scientific basis other than graphs. It is an individual impression.)

・ Situation of Spain

Next, I will show the situation in Spain, where deaths are increasing rapidly. When I write the conclusion, it seems that it is still in the early expansion period, and the fitting is indefinite.

{\begin{align}
\beta &= 1.49e^{-6}\\
\gamma &= 1.37e^{-2}\\
N &= 127542\\
R0 &= 13.85\\
R(27.3.2020) &= 7.81\\
\end{align} 
}

SIR_Spainf1_1f2_1;b_5.39e-07g_1.94e-02r0_9.86N_354285S0_354285I0_1R0_0.png

{\begin{align}
\beta &= 5.39e^{-7}\\
\gamma &= 1.94e^{-2}\\
N &= 354285\\
R0 &= 9.86\\
R(27.3.2020) &= 8.09\\
\end{align} 
}

Actually, the same is true for Germany, France, and Switzerland, but I tried to fit it, but due to the rapid increase, this time (parameters are undefined), so I will write it again.

・ A little discussion

I'm not sure if I can really accept it, so I'll write a little. The biggest drawback of this fitting is that it is fixed to N. It seems that the situation where the infection is transmitted naturally in a closed subject can be simulated properly. However, many countries in the world have not taken complete lockdowns, and at least up to this point there has been a movement of people, including (as a percentage) many infected people themselves. In Japan, the number of such outpatients is increasing, and there are also situations where clusters are formed from there. So, what I wrote here is a fitting of the transition so far, and I can not predict the future situation. In order for these predictions to be correct, it is assumed that the same situation will continue, and it is necessary to be able to control the proportion of outpatients extremely. In fact, it is possible to write a model that considers these effects, but since it is difficult to predict the number of outpatients that change randomly from moment to moment, it seems impossible to predict with this model. So, I think that fitting will be possible if the whole of Europe and the United States begins to be seen, and it will be possible to discuss the obtained parameters.

Summary

・ I tried fitting COVID-19 data with the SIR model. ・ It turned out that Wuhan is nearing its end, and South Korea has exceeded the peak of infection toward the end. ・ Iran and Italy can be expected to reach the peak of infection relatively quickly in about 10 days. ・ In Japan, the peak of infection is relatively late because it is saturated with sloppyness, and it can be expected that the peak of infection will come in 20 days depending on the outpatients. ・ If the current situation progresses, it is thought that the peak of infection will begin to appear in many countries in the next month.

・ I want to monitor daily and see changes in the situation ・ I would like to evaluate the obtained parameters.

bonus

Now, when I look at the data, it was reflected ** until March 28, 2020, so I tried fitting it. However, yesterday's data are outliers, and the peak infection period does not change much. fig_Japan_.png SIR_Japanf1_1f2_1;b_3.91e-05g_1.65e-02r0_6.22N_2617S0_2617I0_10R0_0.png The parameters obtained are as follows, and have not changed much compared to the above. However, ** the effective reproduction number R is small, but it is increasing, so be careful **.

{\begin{align}
\beta &= 3.91e^{-5}\\
\gamma &= 1.65e^{-2}\\
N &= 2617\\
R0 &= 6.22\\
R(28.3.2020) &= 2.79\\
\end{align} 
}

I would like to see the future trends.

Recommended Posts

[Introduction to SIR model] Predict the end time of each country with COVID-19 data fitting ♬
[Introduction to logarithmic graph] Predict the end time of each country from the logarithmic graph of infection number data ♬
[Introduction to matplotlib] Read the end time from COVID-19 data ♬
[Introduction to SIR model] Consider the fitting result of Diamond Princess ♬
[Introduction to SEIR model] Try fitting COVID-19 data ♬
I tried to predict the behavior of the new coronavirus with the SEIR model.
[Introduction to minimize] Data analysis with SEIR model ♬
[Introduction to Python] How to get the index of data with a for statement
Predict the number of people infected with COVID-19 with Prophet
I just wanted to extract the data of the desired date and time with Django
Try scraping the data of COVID-19 in Tokyo with Python
A network diagram was created with the data of COVID-19.
Implement the mathematical model "SIR model" of infectious diseases with OpenModelica
Day 71 I tried to predict how long this self-restraint will continue with the SIR model
I tried to predict the infection of new pneumonia using the SIR model: ☓ Wuhan edition ○ Hubei edition
[Introduction to Python] How to get data with the listdir function
Try to extract the features of the sensor data with CNN
I tried to predict the number of domestically infected people of the new corona with a mathematical model
How to extract features of time series data with PySpark Basics
Save the results of crawling with Scrapy to the Google Data Store
[Introduction to StyleGAN] I played with "The Life of a Man" ♬
From the introduction of JUMAN ++ to morphological analysis of Japanese with Python
I want to know the population of each country in the world.
Memorandum of introduction of EXODUS, a data model of the finite element method (FEM)
Try to image the elevation data of the Geographical Survey Institute with Python
[Introduction to Python] How to sort the contents of a list efficiently with list sort
Get the number of visits to each page with ReportingAPI + Cloud Functions
Try to react only the carbon at the end of the chain with SMARTS
[Introduction to Python] What is the method of repeating with the continue statement?
Predict time series data with neural network
The story of verifying the open data of COVID-19
Predict infectious disease epidemics with SIR model
[Introduction to Data Scientists] Basics of Python ♬
I made you to express the end of the IP address with L Chika
Specify the start and end positions of files to be included with qiitap
Introduction to Statistical Modeling for Data Analysis Expanding the range of applications of GLM
Make sure to align the pre-processing at the time of forecast model creation and forecast
[Verification] Does levelDB take time to register data when the amount of data increases? ??
An introduction to data analysis using Python-To increase the number of video views-
Make it easy to specify the time of AWS CloudWatch Events with CDK.
I tried to create a model with the sample of Amazon SageMaker Autopilot
Introduction to Deep Learning for the first time (Chainer) Japanese character recognition Chapter 4 [Improvement of recognition accuracy by expanding data]
Convert data with shape (number of data, 1) to (number of data,) with numpy.
I tried to save the data with discord
Introduction to Python with Atom (on the way)
Predict the second round of summer 2016 with scikit-learn
View details of time series data with Remotte
[Introduction to cx_Oracle] (5th) Handling of Japanese data
From the introduction of pyethapp to the execution of contract
I tried to predict the price of ETF
I tried to make something like a chatbot with the Seq2Seq model of TensorFlow
Predicting the goal time of a full marathon with machine learning-③: Visualizing data with Python-
Try to create a battle record table with matplotlib from the data of "Schedule-kun"
How to get started with Visual Studio Online ~ The end of the environment construction era ~
The story of trying to contribute to COVID-19 analysis with AWS free tier and failing
I tried to visualize the running data of the racing game (Assetto Corsa) with Plotly
How to count the number of occurrences of each element in the list in Python with weight
It's time to seriously think about the definition and skill set of data scientists