[PYTHON] I tried to predict the number of domestically infected people of the new corona with a mathematical model

Overview

I found a paper by The Lancet [^ 1] that predicts the number of infected people with a mathematical model, so I implemented a mathematical model to predict the number of infected people in Japan and compared it with other infectious diseases.

Summary

--The number of infected people was predicted by the SEIR model. --The basic reproduction number of the new corona is about 2.9, which is a little higher than influenza. ――If infection control is not effective, the peak number of infected people in Japan is around GW. ――The number of domestic deaths in the next year will be over 10,000 (when the case fatality rate is 0.01%). --The immigration restrictions for foreigners are meaningless.

table of contents

  1. [Purpose of this time](# 1-Purpose of this time)
  2. [Mathematical model of infectious diseases](# 2-Mathematical model of infectious diseases)
  3. [Parameter estimation](# 3-Parameter estimation)
  4. [Implemented and illustrated in python](# 4-Implemented and illustrated in python)
  5. [Supplement](# 5-Supplement)

1. Purpose of this time

  1. Predict the number of people infected with the new coronavirus (COVID-19, hereinafter referred to as the new corona) in Japan using a mathematical model.

  2. Compare the basic reproduction number of the new corona with other infectious diseases.

  3. Implementation of scientific calculations using Scipy.

2. Mathematical model of infectious diseases

"Formulas describe the process of how an infectious disease spreads, how long an infected person develops and becomes severe." [^ 2]

In the mathematical model of infectious diseases, the population is divided according to the stage of infection, and the increase or decrease of each group is expressed by a differential equation. I will explain the simple SEIR model and the extended version used this time.

2.1. SEIR model

-$ S : Number of people without immunity (Susceptible) - E : Number of people in the post-exposure period (Exposed) - I : Number of people during the infectious period (Infectious) - R : Number of people who recovered from infection and gained immunity (Recovered) - \ beta $: Infection rate, $ \ gamma $: Recovery (quarantine) rate, $ \ sigma $: Probability of getting infectious, $ N $: Total population

Then, the following simultaneous differential equations are obtained.

\begin{align}
\frac{dS(t)}{dt} &= -\beta \frac{I(t)}{N}S(t) \\

\frac{dE(t)}{dt} &= \beta \frac{I(t)}{N}S(t) - \sigma E(t) \\

\frac{dI(t)}{dt} &= \sigma E(t) - \gamma I(t) \\

\frac{dR(t)}{dt} &= \gamma I(t) \\

N &= S(t) + E(t) + I(t) + R(t)
\end{align}

However, it ignores the birth rate and mortality rate, and says that it will never be infected again because it will recover and acquire immunity after the onset.

If you give an appropriate initial value and solve it, you can predict the number of infected people in the future. (I will put the figure [^ 3] for reference)

SEIR_InsetChart_default.png

For detailed explanation and implementation, refer to qiita's article [^ 4].

Here, $ R_0 = \ beta / \ gamma $ is called the basic reproduction number, and "(all individuals are initially susceptible). It means "the number of secondary infections produced per infected person (in the state of having)". [^ 5] ** The point is that the larger $ R_0 $, the greater the infectivity. ** **

You can also use $ R_0 $ to calculate how widespread vaccination should be to prevent an infectious disease epidemic.

In addition to SEIR, various models can be created by subdividing the entire population by place of residence and age, and by reflecting the characteristics of infectious diseases in differential equations. For example, there are SIR that does not consider E, SEIS that does not consider immunity acquisition, and MSEIR that considers passive immunity [^ 6].

2.2. Extended SEIRD model

Since the above SEIR is too simple, this time we will extend it to a model that considers the inflow and outflow of the population and the mortality rate with foreign countries by referring to the paper [^ 1].

-$ R_0 : Basic reproduction number ( R_0 = \ beta / \ gamma ) - T_E : Average post-exposure period (days) ( \ gamma ^ {-1} = T_I ) - T_I : Average infectious period (days) ( \ sigma ^ {-1} = T_E ) - N_ {J, I} : Number of travelers from Japan to overseas (people / day) - N_ {I, J} : Number of travelers from overseas to Japan (people / day) - f : Case fatality rate - D (t) $: Number of deaths (people)

Then, the following simultaneous differential equations are also obtained.

\begin{align}
\frac{dS(t)}{dt} &= - \frac{R_0}{D_I} \frac{I(t)}{N(t)} S(t) - N_{J,I} + 
 \left(1 - \frac{E(t) + I(t)}{N(t)} \right)N_{I,J} \\

\frac{dE(t)}{dt} &= \frac{R_0}{D_I} \frac{I(t)}{N(t)} S(t) - \frac{E(t)}{T_E} + \frac{E(t) + I(t)}{N(t)} N_{I,J}\\

\frac{dI(t)}{dt} &= \frac{E(t)}{T_E} - \frac{I(t)}{T_I}\\

\frac{dR(t)}{dt} &= (1-f) \frac{I(t)}{T_I}\\

\frac{dD(t)}{dt} &= f \frac{I(t)}{T_I} \\

N(t) &= S(t) + E(t) + I(t) + R(t)
\end{align}

However, regarding $ N_ {J, I} $ and $ N_ {I, J} $, it is assumed that all those who go abroad from Japan belong to $ S $, and those who come to Japan from overseas belong to $ S, E $. I will.

As in the paper [^ 1], parameters other than $ R_0 $ are substituted with appropriate values. $ T_I and T_E $ are set to $ T_E = 6.0 and T_I = 2.4 $ with reference to SARS Corona. $ N_ {J, I}, N_ {I, J} $ is set to $ N_ {J, I} = 40000, N_ {I, J} = 70000 $ with reference to the number of travelers in February 2019.

3. Parameter estimation

The unknown parameter of the model is $ R_0 $. There are various estimation methods such as least squares method, maximum likelihood estimation, MAP estimation, and Bayesian estimation [^ 7], but this time, the likelihood is calculated as a non-stationary Poisson process [^ 8], and $ R_0 $ is calculated by maximum likelihood estimation. ..

3.1. Maximum likelihood estimation

Maximum likelihood estimation that everyone loves. First, express the probability (likelihood) of the observed data as a function of $ R_0 $, and find $ R_0 $ that maximizes it. When the number of infected people is modeled as a non-stationary Poisson process with reference to the paper [^ 1] and the likelihood is calculated,

L(R_0) = \prod_{d=D_s}^{D_e} \frac{e^{-\lambda_d}\lambda_d^{x_d}}{x_d!}

However, $ D_s $: the first day of observation, $ D_s $: the last day of observation, $ x_d $: the number of infected people at $ d $,

\lambda (t, R_0) = E(t) + I(t) \\
\lambda _d (R_0) = \int_{d-1}^{d} \lambda (u, R_0) du

will do. $ \ lambda $ represents the average number of infected people when they follow a Poisson distribution.

Excluding the terms irrelevant to $ R_0 $ from the log-likelihood, $ R_0 $ can be estimated as follows.

\newcommand{\argmin}{\mathop{\rm arg~min}\limits}

\argmin_{R_0 > 0} - \log L(R_0) = \argmin_{R_0 > 0} \sum_{d=D_s}^{D_e} (\lambda_d(R_0) - x_d \log \lambda_d(R_0))

4. Implemented and illustrated in python

Implementation of SEIR class
import scipy
from scipy.integrate import odeint
from scipy.optimize import minimize
from sklearn.metrics import mean_absolute_error, mean_squared_error


class SIR(object):

    def __init__(self, r_0, t_i, dt, init_S, init_I, init_R):
        self.r_0 = r_0
        self.t_i = t_i
        self.dt = dt
        self.init_state_list = [init_S, init_I, init_R]

    def _get_param(self, params, key, default, t=None):
        """Corresponds to time-varying parameters
        """
        if isinstance(params, dict):
            if key in list(params.keys()):
                param = params[key]
                if isinstance(param, list):
                    return param[np.clip(int(t / self.dt), 0, len(param)-1)]
                elif isinstance(param, np.ndarray):
                    return param
                else:
                    return default
            else:
                return default
        else:
            return default

    def _ode(self, state_list, t=None, params=None):
        """Define simultaneous differential equations
        """
        r_0 = self._get_param(params, 'r_0', self.r_0, t=t)
        t_i = self._get_param(params, 't_i', self.t_i, t=t)
        S, I, R = state_list
        N = S + I + R
        dstate_dt = list()
        dstate_dt[0] = - (r_0 / t_i) * (I / N) * S
        dstate_dt[1] = (r_0 / t_i) * (I / N) * S - I / t_i
        dstate_dt[2] = I / t_i
        return dstate_dt

    def solve_ode(self, len_days=365, params=None):
        """Solve differential equations
        """
        t = np.linspace(0, len_days, int(len_days / self.dt), endpoint=False)
        args = (params,) if params else ()
        return odeint(self._ode, self.init_state_list, t, args=args)


class CustomizedSEIRD(SIR):

    def __init__(self, r_0=None, t_e=6.0, t_i=2.4, n_i_j=70000, n_j_i=40000, f=0.0001, dt=1,
            init_S=126800000, init_E=0, init_I=1, init_R=0, init_D=0):
        self.r_0 = r_0
        self.t_e = t_e
        self.t_i = t_i
        self.n_i_j = n_i_j
        self.n_j_i = n_j_i
        self.f = f
        self.dt = dt
        self.init_state_list = [init_S, init_E, init_I, init_R, init_D]

    def _ode(self, state_list, t=None, params=None):
        """Define simultaneous differential equations
        """
        r_0 = self._get_param(params, 'r_0', self.r_0, t=t)
        t_e = self._get_param(params, 't_e', self.t_e, t=t)
        t_i = self._get_param(params, 't_i', self.t_i, t=t)
        n_i_j = self._get_param(params, 'n_i_j', self.n_i_j, t=t)
        n_j_i = self._get_param(params, 'n_j_i', self.n_j_i, t=t)
        f = self._get_param(params, 'f', self.f)
        S, E, I, R, D = state_list
        N = S + E + I + R
        dstate_dt = list()
        dstate_dt.append(- (r_0 / t_i) * (I / N) * S - n_j_i + (1 - ((E + I) / N)) * n_i_j)
        dstate_dt.append((r_0 / t_i) * (I / N) * S - E / t_e + ((E + I) / N) * n_i_j)
        dstate_dt.append(E / t_e - I / t_i)
        dstate_dt.append((1 - f) * I / t_i)
        dstate_dt.append(f * I / t_i)
        return dstate_dt

    def _calc_neg_log_likelihood_r0(self, r_0, X):
        """Log likelihood(R_Only the part related to 0)To calculate
        """
        solution = self.solve_ode(len_days=len(X), params=dict(r_0=r_0))
        lambda_arr = solution[int(1/self.dt)-1::int(1/self.dt), 2] # I
        return - np.sum(- lambda_arr + X * np.log(lambda_arr))
    
    def _calc_error(self, r_0, X, metric=mean_absolute_error):
        """Calculate mean absolute error and mean square error
        """
        solution = self.solve_ode(len_days=len(X), params=dict(r_0=r_0))
        e_arr = solution[int(1/self.dt)-1::int(1/self.dt), 2]  # I
        return metric(X, e_arr)

    def exec_point_estimation(self, init_r_0, X, project='mle'):
        """Point estimate parameters
        """
        if project == 'mle':
            result = minimize(self._calc_neg_log_likelihood_r0, init_r_0, args=(X,), method='Nelder-Mead')
        elif project == 'lad':
            result = minimize(self._calc_error, init_r_0, args=(X, mean_absolute_error), method='Nelder-Mead')
        elif project == 'ls':
            result = minimize(self._calc_error, init_r_0, args=(X, mean_squared_error), method='Nelder-Mead')
        else:
            print(f'Invalid project: {project}')
            return None
        if self.r_0 is None:
            self.r_0 = result.x[0]
        return result

    def exec_map(self):
        """MAP estimation
        """
        pass

    def exec_mcmc(self):
        """MCMC method
        """
        pass

if __name__ == '__main__':
    # 1/16 to 3/Number of infected people up to 8
    X = np.array([
        1, 1, 1, 1, 1, 1, 1, 1, 2, 3,  # 1/25
        4, 4, 7, 8, 14, 17, 20, 20, 20, 23,  # 2/4
        35, 45, 86, 90, 96, 161, 163, 203, 251, 259,  # 2/14
        338, 414, 520, 616, 705, 728, 743, 743, 838, 851, # 2/24
        862, 891, 919, 938, 947, 961, 980, 999, 1035, 1056, # 3/5
        1113, 1157, 1190 # 3/8
    ])
    model = CustomizedSEIRD()
    result = model.exec_point_estimation(init_r_0=1, X=X, project='mle')
    print(result.x[0])
2.8806152343750036

The "number of infected people" is difficult to handle, and only those who visit the hospital are actually counted, and no one knows the true $ x_d $ considering the diagnostic criteria and the sensitivity specificity of the test. Hmm. Here, we estimated based on the number of infected people [^ 9] including cruise ships.

Prediction of the number of infected people in Japan by the extended SEIR model (I is the number of infected people)

SEIRD2.png

It is a graph when $ R_0 $ estimated by the blue line does not change in the future. If this goes on, the number of infected people will peak just after the GW is over. It is a graph when the red line and the blue line change $ R_0 $ after 3/9. Since $ R_0 $ is "the number of secondary infections produced per infected person", it will be smaller due to infection control measures and movement restrictions. If $ R_0 $ can be made small enough, it can be contained like a green line.

The figure below is a figure released by the Ministry of Health and Welfare [^ 9], but the shape of the graph of the number of infected people matches exactly.

厚生省コロナ感染者数.png

Also, for example, if you change $ N_ {i, j} $, you can calculate whether the number of infected people will change due to immigration restrictions of foreigners. If you calculate with the same model, even if $ N_ {i, j} = 0 $ after 3/9, if $ R_0 $ is the same, the number of infected people will not change. This means that immigration restrictions are meaningless (if this model is correct).

Next, let's compare the basic reproduction number and mortality rate with other infectious diseases [^ 10]. (The case fatality rate of the new corona is approximately 0.1% to 5% of the case fatality rate of each country.)

comparison.png

It seems that the basic reproduction number and case fatality rate of the new corona are not significantly higher than those of other infectious diseases. (For the risk of infectious diseases, it is necessary to consider the severity of symptoms and the presence or absence of vaccines, and I think that low basic reproduction number and case fatality rate do not mean that it is safe, but the details are Ask someone who is resistant to infectious diseases)

5. Supplement

Ministry of Health, Labor and Welfare

Japan Society for Infectious Diseases

WHO

CDC

Cool site where you can check the number of infected people (Johns Hopkins CSSE)

** I am not an infectious disease specialist, a public health major, or a statistician. The actual infectious disease is not as simple as this model, so this prediction is probably wrong. We are not responsible for any damages caused by the contents of this article. ** **

[^ 2]: Stop infectious diseases with a mathematical model

[^ 4]: Infectious Disease Mathematical Model Beginning: Overview of SEIR Model by Python and Introduction to Parameter Estimation

[^ 5]: Prediction of infectious disease epidemics: Quantitative issues in infectious disease mathematical models

[^ 7]: Maximum likelihood estimation, MAP estimation, Bayesian estimation learned in Fujii 4th Dan

[^ 8]: SIR model and transient Poisson process

[^ 9]: About new coronavirus infection

Recommended Posts

I tried to predict the number of domestically infected people of the new corona with a mathematical model
I tried to predict the behavior of the new coronavirus with the SEIR model.
I tried to predict the number of people infected with coronavirus in Japan by the method of the latest paper in China
I tried to predict the number of people infected with coronavirus in consideration of the effect of refraining from going out
Predict the number of people infected with COVID-19 with Prophet
Create a BOT that displays the number of infected people in the new corona
I tried to create a model with the sample of Amazon SageMaker Autopilot
I tried to visualize the characteristics of new coronavirus infected person information with wordcloud
I tried to predict the infection of new pneumonia using the SIR model: ☓ Wuhan edition ○ Hubei edition
I tried to streamline the standard role of new employees with Python
I tried to predict the price of ETF
I tried to get and analyze the statistical data of the new corona with Python: Data of Johns Hopkins University
I tried to automatically send the literature of the new coronavirus to LINE with Python
python beginners tried to predict the number of criminals
I wanted to know the number of lines in multiple files, so I tried to get it with a command
I tried to summarize the new coronavirus infected people in Ichikawa City, Chiba Prefecture
I tried to find the entropy of the image with python
I tried to find the average of the sequence with TensorFlow
I made a function to check the model of DCGAN
Let's visualize the number of people infected with coronavirus with matplotlib
I tried to divide with a deep learning language model
I tried to tabulate the number of deaths per capita of COVID-19 (new coronavirus) by country
Matching app I tried to take statistics of strong people & tried to create a machine learning model
I tried to get the number of days of the month holidays (Saturdays, Sundays, and holidays) with python
Day 71 I tried to predict how long this self-restraint will continue with the SIR model
I wrote a doctest in "I tried to simulate the probability of a bingo game with Python"
I tried to predict the sales of game software with VARISTA by referring to the article of Codexa
I tried to automate the watering of the planter with Raspberry Pi
[Introduction to StyleGAN] I played with "The Life of a Man" ♬
I tried to create a list of prime numbers with python
I created a Discord bot on Docker that reports the number of corona infected people in Tokyo at a specified time.
I tried to expand the size of the logical volume with LVM
I tried to improve the efficiency of daily work with Python
I tried to make a mechanism of exclusive control with Go
I tried to unlock the entrance 2 lock sesame with a single push of the AWS IoT button
Stock price plummeted with "new corona"? I tried to get the Nikkei Stock Average by web scraping
I tried to get the authentication code of Qiita API with Python.
(Python) I tried to analyze 1 million hands ~ I tried to estimate the number of AA ~
I tried to discriminate a 6-digit number with a number discrimination application made with python
I tried to analyze the negativeness of Nono Morikubo. [Compare with Posipa]
A beginner of machine learning tried to predict Arima Kinen with python
I tried to visualize the text of the novel "Weathering with You" with WordCloud
I tried to visualize the model with the low-code machine learning library "PyCaret"
I tried to get the movie information of TMDb API with Python
I tried to display the altitude value of DTM in a graph
I tried the common story of using Deep Learning to predict the Nikkei 225
I tried to verify the result of A / B test by chi-square test
I tried to make a thumbnail image of the best avoidance flag-chan! With RGB values ​​[Histogram] [Visualization]
I want to solve the problem of memory leak when outputting a large number of images with Matplotlib
Create a bot that posts the number of people positive for the new coronavirus in Tokyo to Slack
I tried to predict next year with AI
I tried to save the data with discord
I tried to correct the keystone of the image
I tried to predict Titanic survival with PyCaret
I tried to vectorize the lyrics of Hinatazaka46!
I tried to predict the deterioration of the lithium ion battery using the Qore SDK
I tried to make a simple mail sending application with tkinter of Python
I failed to install django with pip, so a reminder of the solution
I tried to easily visualize the tweets of JAWS DAYS 2017 with Python + ELK
I tried to predict the presence or absence of snow by machine learning.
The story of making soracom_exporter (I tried to monitor SORACOM Air with Prometheus)