[PYTHON] (First post) A story about numerical calculation of influenza and new pneumonia coronavirus with Tensorflow

Introduction

Abstract

I tried to analyze using a mathematical model.

background

This time I tried to calculate the infectious disease model numerically, but please understand that I am not a professional in mathematical models such as infectious disease models. Since I majored in physics at university and graduate school, I have only touched on computer science to some extent. The background to this post was that I was curious if I could analyze something quantitatively about Coronavirus, which is a hot topic right now. There are SARS and MERS in the past cases of infectious diseases, but I thought it would be worthwhile to investigate what kind of mathematical model is available for predicting the convergence time of infectious diseases.

Infectious disease model

SIR model

There is a SIR model as a model equation that describes the epidemic process of infectious diseases. There are three variables: Suscceptible (suspected infected), Infected (infected), and Recovered (immunity carrier). This model was published in a 1927 paper (“A Contribution to the Mathematical Theory of Epidemics”). In particular, it is often used in fields dealing with familiar phenomena such as social physics.

S+I \rightarrow 2I \\
I \rightarrow R \\

The elementary process is the above two processes. The top is the infection process and the bottom is the recovery process. Think of it as a chemical reaction formula. When actually incorporating it into a differential equation, the same calculation logic as the reaction kinetics in chemistry should be used.

Formula

The SIR model is formulated as follows.

\frac{dS(t)}{dt} = -\beta S(t)I(t) \\
\frac{dI(t)}{dt} = \beta S(t)I(t)-\gamma I(t)\\
\frac{dR(t)}{dt} = \gamma I(t) \\

For $ t $ at a given time, $ S (t) $ is the number of suspected infections, $ I (t) $ is the number of infected people, and $ R (t) $ is the number of immune carriers.

Here, $ \ beta $ represents the infection rate, and $ \ gamma $ represents the recovery rate and quarantine rate. Also

 -\beta I(t) \\

Equivalent to infectivity.

Especially why it can be described like the above differential equation http://www2.meijo-u.ac.jp/~tnagata/education/react/2019/react_03_slides.pdf I think you should refer to. The logic is the same as when formulating the reaction rate equation for high school chemistry.

Here, I solve the above simultaneous differential equations, and when I saw this, I remembered the Lorenz equations I learned in college. The Lorenz equation is a nonlinear ordinary differential equation (commonly known as ODE) that is talked about in the area of chaos in physics. I think I'm quite a maniac, so I won't go into details, but I'll just write the formula.

\frac{dx}{dt} = -px+py \\
\frac{dy}{dt} = -xz+rx-y\\
\frac{dz}{dt} = xy-bz \\

I used to do this numerical simulation for a university assignment, so I thought that I should do the same simulation as at that time. At that time, I remember solving it with Runge-Kutta, but this time I will try using TensorFlow for studying.

What is TensorFlow?

TensorFlow is an open source software library for Machine Learning developed by Google. In a nutshell, it's a Python library for high-speed numerical analysis. I will not do Machine Learning this time, but I will use it for numerical calculation. I don't think you can hear the tensor, but I understand that it is a dimension extension of the matrix. Since it is out of the scope of this post, I will post it somewhere. The feature of TensorFlow is that the source code can be written easily. On the other hand, most of the calculation process is like a black box, so the calculation process is not clear.

Actual numerical simulation (1)

Source code

Parameters etc. will be explained in the result part.

sir_model.py


import numpy as np
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()
#It may not work depending on the version of TensorFlow(Probably due to compatibility)
#import tensorflow as tf
import matplotlib.pyplot as plt
import time
from scipy.integrate import solve_ivp
%matplotlib inline

def main(): 
    x=tf.placeholder(dtype=tf.float64,shape=(3),name='x')

    class SIR(object):
        def __init__(self, beta=0.00218, gamma=0.452, **kwargs):
            self.beta = float(beta)
            self.gamma = float(gamma)
        
        def __call__(self, t, x):
            dx_dt = -self.beta * x[0]*x[1]
            dy_dt = self.beta*x[0] *x[1] - self.gamma*x[1]
            dz_dt = self.gamma*x[1]

            dX_dt = tf.stack([dx_dt, dy_dt, dz_dt])
            return dX_dt
    
    f_sir = SIR()
    fx =f_sir(None, x)  # ODE

    x0 = np.array([762, 1, 0], dtype=np.float64)   # initial value
    sess = tf.Session()
    sess.run(tf.initializers.global_variables())

    print('###value of f(x)###')
    print(sess.run(fx, feed_dict={x:x0})) 
    dt=0.1
    tstart=0.0
    tend=30.0
    ts=np.arange(tstart, tend+dt, dt)  # 0.Output 30 days in one step

    def f_sir_tf(t,xt):
        return sess.run(fx, feed_dict={x: xt})

    start_time =time.time() # measurement of time
    sol_sir = solve_ivp(fun=f_sir_tf,
                       t_span=[tstart, tend], y0=x0, t_eval=ts)
    integration_time_tf = time.time() - start_time
    t_lo = sol_sir['t']   #Get the time of each step
    x_lo = sol_sir['y']  #X of each step(t)Get the value of
    
    #Calculate the processing time
    print("processing time (tf) %.5f" % integration_time_tf)

    fig=plt.figure(figsize=(14,8))
    ax=fig.add_subplot(111)

    ax.plot(t_lo,x_lo[0],'*',color='blue',alpha=0.2,label="Susceptible",markersize=10)
    ax.plot(t_lo,x_lo[1],'^',color='red',alpha=0.2,label="Infected",markersize=10)
    ax.plot(t_lo,x_lo[2],'.',color='green',alpha=0.2,label="Recovered",markersize=10)
    ax.grid()
    ax.legend()
    ax.set_xlabel('day')
    ax.set_ylabel('number')

if __name__ == "__main__":
    main()

simulation result

image.png

For this simulation, I referred to p14 on the following slide. http://www.actuaries.jp/lib/meeting/reikai20-7-siryo.pdf This is a case of influenza in a boarding school investigated in 1978. Of the total of 763 people, $ S = 762 $ people and $ I = 1 $ people are the initial conditions. For infection rate $ \ beta $ and recovery rate $ \ gamma $ $\beta=0.00218,\gamma=0.452$ We are simulating as. To elaborate on the parameters, the infection rate $ \ beta $ is the rate of infection per day and the recovery rate $ \ gamma $ is equivalent to how long it takes to become uninfected.

By the way, the graph of the relationship between $ S $ and $ I $ at this time is as follows. It converges to the origin with the image going from the lower right to the left.

image.png

Interpretation

The green recovered number $ R $ is not very important here. The most important is the number of people infected with red $ I $. In this case, from the initial state of $ I = 1 $, $ S $ in contact with the infected person gradually increased to $ I $, peaked one week later, and converged to 0 after about 14 to 15 days. You can see how it is going. You can see that it actually fits nicely like the white circle plot on reference slide p14.

Actual numerical simulation (2)

Now that we have calculated the numerical values for influenza, the next one is "Coronavirus". I did not explain about influenza, but I will explain from "basic reproduction number $ R_0 $" which is an important parameter when treating infectious diseases as a subject.

Basic reproduction number

The definition is "the expected number of secondary infections produced by one infected person during the entire infection period." For example, if $ R_0 = 2 , one person will infect two people on average. I won't go into details, but it is generally known that the epidemic will end if $ R_0 <1 $$. You can get it by calculating with $ right side = 0 $ in the second formula of the SIR model.

The following URL is quoted for this $ R_0 $. https://wired.jp/2020/01/26/scientists-predict-wuhans-outbreak-will-get-much-worse/

A major uncertain factor is the infectivity of 2019-nCoV. In fact, how infectious is it? In Reed's model, the number of people that one infected person can infect (basic reproduction of the virus) is estimated to be 3.6-4.0. By comparison, SARS (Severe Acute Respiratory Syndrome) is 2-5, and measles, which is the most infectious to humans, is 12-18.

Let's calculate this coronavirus with the basic reproduction number $ R_0 = 3.4 $. Although the calculation is omitted, the infection rate $ \ beta $ can be calculated by the following formula.

\beta=\frac{\gamma R_0}{S(0)}

Now consider the initial conditions. Wuhan, where the coronavirus is widespread, is a big city in China with 11.08 million people. $(S,I,R)=(11079999,1,0)$ I will calculate as.

Then you will get $ \ beta = 2.19 \ times10 ^ {-8} $. Here, I set $ \ gamma = 0.071428 $ (1/14 is equivalent to 2 weeks). $ S (0) $ is 11,079,999.

result

The source code is basically the same as for influenza. Only the parameters have been changed.

image.png

Interpretation

The initial conditions assume the time when the first infected person is found. Realistically, you can think of it as December 1, 2019. It's just the beginning of February, so if you think that about two months have passed, it's 60 days. It's really scary when I think about it. .. .. This time, I used a fairly large number of R = 3.4, so I assume the worst case. In fact, according to the WHO (World Health Organization), it is $ R_0 $ = 2.2 ~ 2.8, so it should converge sooner. The key point is when to find the first infected person. Assuming that this is the peak of infected people (around 100 days in the graph), it is expected that it will take about 40 days for the number of infected people to converge to 0 from here. As many of you may have noticed, this model does not consider the dead. Therefore, the S + I + R system is preserved. If you want to think of a more realistic model, you can use the SEIR model.

Summary

This post

I posted to Qiita for the first time. This time, we numerically calculated the SIR model, which is the most basic of the "infection models". The parameters actually used were learned from past cases, but more advanced mathematics is required to perform parameter estimation in detail. If I have time, I will write such an article in the future. As an individual, I am interested in Machine Learning and competitive programming, so I am thinking of making a repository that includes memorandums there.

Outlook

Reference document

Recommended Posts

(First post) A story about numerical calculation of influenza and new pneumonia coronavirus with Tensorflow
Build a numerical calculation environment with pyenv and miniconda3
A story about predicting prefectures from the names of cities, wards, towns and villages with Jubatus
I replaced the numerical calculation of Python with Rust and compared the speed
A story about automating online mahjong (Mahjong Soul) with OpenCV and machine learning
The story of making a sound camera with Touch Designer and ReSpeaker
Create a new Python numerical calculation project
A story about calculating the speed of a small ball falling while receiving air resistance with Python and Sympy
A story about machine learning with Kyasuket
A story about Python pop and append
A story about getting the Atom field (XML telegram) of the Japan Meteorological Agency with Raspberry Pi and tweeting it
A story about Go's global variables and scope
A story about implementing a login screen with django
A story about modifying Python and adding functions
A story about an error when loading a TensorFlow model created with Google Colab locally