[PYTHON] Quantify the degree of self-restraint required to contain the new coronavirus

Introduction

The new coronavirus infection (COVID-19) is rampant all over the world, and urban blockades, so-called lockdowns, are being carried out in various regions such as China, Europe, the United States, and Asia. In Japan as well, as of April 4, there are 3360 infected people in Japan and 891 people in Tokyo, and the governor is requesting refraining from going out on weekends and avoiding specific food service businesses. In addition, it is spreading not only in central Tokyo and Osaka, but throughout the country, and it can be said that the situation is as close as possible to an emergency situation where an overshoot occurs. In order to avoid such an emergency, in Government Expert Meeting etc., 3 dense spaces (closed and dense) ・ We are actively calling to avoid (close). Then, how much should we avoid such three dense spaces? In this article, I would like to quantify ** the degree of avoidance of three dense spaces ** necessary to prevent the spread of infection ** as a numerical value.

3 dense space and basic reproduction number

The basic reproduction number is

Is defined as. In other words, it is the strength with which an infected person can infect other people with the new coronavirus. Also, if its strength is strong, it can infect many people at once, resulting in clusters. According to the National Cluster Map published by the Ministry of Health, Labor and Welfare, the following are listed as the elements that make up the cluster.

What they have in common is probably the three dense spaces. In addition, the cluster countermeasure group estimates show the analysis results of basic reproduction numbers by environment.

Q14.jpg

This figure suggests that the frequency distribution of basic reproduction numbers differs depending on whether the source of infection is in a poorly ventilated environment or in other environments. In other words, ** 3 the probability distribution of the basic reproduction number changes greatly depending on whether or not it is a dense space **. Therefore, I would like to define the parameters that represent the differences in the environment as follows.

Then, the code to approximate the basic reproduction number by the above parameter er is shown in Previous article. The above figure is approximated by combining conditional branching and uniform distribution.

"""
 er: good environment ratio (0.-1.)
"""
def COVID19R0(er):
    if np.random.rand() < er:
        # good environment
        if np.random.rand() < 0.8:
            R0 = 0
        else:
            R0 = np.random.randint(1,4)*1.0
    else:
        # bad environment
        R0 = np.random.randint(0,12)*1.0
    return R0

Then, what value of the environmental parameter er can reduce the basic reproduction number to less than 1 (in the direction of converging infection)? Below, I would like to consider using statistical tools.

Central limit theorem

What is the Central Limit Theorem?

  1. Observations $ X_1, X_2, \ cdots, X_n $ are random variables that are independent of each other and all follow the same distribution.
  2. For all i, $ E [X_i] = \ mu, V [X_i] = \ sigma $.

When, for the sample mean $ \ bar {X} _n $ of the observed values,

\frac{\bar{X}_n - \mu}{\left( \frac{\sigma}{\sqrt{n}} \right)} \sim N(0,1)\  ({\rm as}\ n \rightarrow \infty) \\
\bar{X}_n = \frac{1}{n} \sum_{i=1}^n X_i

Was to hold. What is important here is that there are no assumptions about the probability distribution of the population other than the existence of true values for the mean and variance. Therefore, this theorem can be applied to the probability distribution of the basic reproduction number.

Confidence interval estimation of population mean

So how do you estimate what the population mean of basic reproduction numbers will be for each environmental parameter er? According to statistics textbooks, an estimate of the population mean (confidence interval of $ 1- \ alpha $) when the population variance is unknown is given below.

\left[ \bar{X}_n - \frac{U}{\sqrt{n}} t_{\alpha / 2} (n-1),  \bar{X}_n + \frac{U}{\sqrt{n}} t_{\alpha / 2} (n-1) \right]\\
U^2 = \frac{1}{n-1} \sum_{i=1}^n (X_i - \bar{X}_n)^2

Here, $ U $ is the unbiased sample variance, and $ t_ {\ alpha} (n-1) $ is the $ \ alpha / 2 $ point of the t distribution with $ n-1 $ degrees of freedom (the tail area of the probability distribution is $). The value of the random variable that is \ alpha / 2 $). Reading the textbook, this confidence interval seems to be derived from the combination of estimates based on the central limit theorem, with the population mean following a normal distribution and the population variance following a $ χ ^ 2 $ distribution.

Try to calculate with Python

Now, let's use Python to calculate the population mean and confidence interval of the basic reproduction number R0 depending on the environment parameter er.

First, import the library.

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

This is the function for generating the basic reproduction number.

"""
 er: good environment ratio (0.-1.)
"""
def COVID19R0(er):
    if np.random.rand() < er:
        # good environment
        if np.random.rand() < 0.8:
            R0 = 0
        else:
            R0 = np.random.randint(1,4)*1.0
    else:
        # bad environment
        R0 = np.random.randint(0,12)*1.0
    return R0

A function that calculates the sample mean and unbiased variance by sampling N pieces according to the given probability distribution $ f $.

def EstFromNsample(f, N):
    #N samplings according to probability distribution f
    sample = [f() for i in range(N)]
    #Specimen average
    Eave = np.average(sample)
    #Unbiased sample variance
    Esig2 = 1./(np.float(N)-1.) * np.sum([pow(s - Eave, 2) for s in sample]) 
    return Eave, Esig2

This function calculates the estimated values of population mean and population variance by engraving the environment parameter er from 0 to 1 with a mesh and sampling each of them.

def makeERsamples(keys):
    ediv = keys['ediv']
    N = keys['N']
    #
    l = []
    #
    for e in np.linspace(0, 1, ediv):
        f = lambda : COVID19R0(e) #Sampling function in a specific environment parameter
        Eave, Esig2 = EstFromNsample(f, N)
        l.append([e, Eave, Esig2])
    return np.array(l)

A function that calculates the confidence interval. Note that alpha has the opposite meaning of $ \ alpha $ above. According to scipy.stats, "Endpoints of the range that contains alpha percent of the distribution" , T_dist.interval is a function that returns the endpoints of the interval containing the alpha percent of the t distribution.

def calcBand01(ave, sigma, N, alpha):
    t_dist = stats.t(loc=0,scale=1,df=N-1)
    b, u = t_dist.interval(alpha=alpha)
    b = ave + b * sigma / np.sqrt(N)
    u = ave + u * sigma / np.sqrt(N)
    return b, u

A function that displays the population mean estimate and its confidence interval.

def showBandEstimation(sample, keys):
    #
    alpha = keys['alpha']
    N = keys['N']
    #
    x = sample[:,0]
    y = sample[:,1]
    s2 = sample[:,2]
    s = np.sqrt(s2)
    #
    yb, yu = calcBand01(y, s, N, alpha)
    #
    fig, ax = plt.subplots(1,1,figsize=(5,4), dpi=200)
    ax.plot([0,1],[1,1], 'r--')
    ax.plot(x, y, label='ave')
    ax.plot(x, yu, '.', label='upper')
    ax.plot(x, yb, '.', label='lower')
    ax.fill_between(x, yu, yb, facecolor='r',alpha=0.2)
    ax.set_xlabel('er')
    ax.set_ylabel('R0')
    ax.set_xticks(np.linspace(0,1,11))
    ax.grid(b=True, which='both', axis='x')
    ax.legend()
    plt.show()

Finally, the part that performs sampling and displays the estimated population mean and confidence intervals.

keys = {'N':100, 'ediv':100, 'alpha':0.95 }

sample = makeERsamples(keys)
fig = showBandEstimation(sample, keys)

Calculation result

Now let's take a look at the calculation results.

For sampling number (N = 100)

R0ave_BandEstimation_N100.png

For sampling number (N = 1000)

R0ave_BandEstimation_N1000.png

When the number of samplings (N = 10000)

R0ave_BandEstimation_N10000.png

There is considerable variability at $ N = 100 $, but at $ N = 10000 $ it looks like it's almost converged. Therefore, it seems that a fairly reliable estimate can be made with about 10,000 samples. The important point is that this estimation is 1. 2. The sample mean converges to a normal distribution by the central limit theorem. It means that the confidence interval is estimated by the t distribution, which is a two-step approach. So it's important to determine how much $ N $ is enough to make this estimate.

Consideration

From the above, the following tendency can be derived from the simulation regarding the estimation of the population mean of the basic reproduction number R0 for the environmental parameter er (how to avoid 3 dense spaces).

Furthermore ...

Reference link

I referred to the following page.

Recommended Posts

Quantify the degree of self-restraint required to contain the new coronavirus
Plot the spread of the new coronavirus
Estimate the peak infectivity of the new coronavirus
Did the number of store closures increase due to the impact of the new coronavirus?
zoom I tried to quantify the degree of excitement of the story at the meeting
I tried to predict the behavior of the new coronavirus with the SEIR model.
Folding @ Home on Linux Mint to contribute to the analysis of the new coronavirus
Factfulness of the new coronavirus seen in Splunk
GUI simulation of the new coronavirus (SEIR model)
I tried to automatically send the literature of the new coronavirus to LINE with Python
Let's test the medical collapse hypothesis of the new coronavirus
The theory that the key to controlling infection with the new coronavirus is hyperdispersion of susceptibility
I tried to visualize the characteristics of new coronavirus infected person information with wordcloud
[Horse Racing] I tried to quantify the strength of racehorses
Use hash to lighten collision detection of about 1000 balls in Python (related to the new coronavirus)
I tried to tabulate the number of deaths per capita of COVID-19 (new coronavirus) by country
Analyzing the age-specific severity of coronavirus
Supplement to the explanation of vscode
Create a bot that posts the number of people positive for the new coronavirus in Tokyo to Slack
Let's calculate the transition of the basic reproduction number of the new coronavirus by prefecture
The story of trying to reconnect the client
Script to change the description of fasta
10 methods to improve the accuracy of BERT
How to check the version of Django
The story of adding MeCab to ubuntu 16.04
(Now) I analyzed the new coronavirus (COVID-19)
The story of pep8 changing to pycodestyle
Convert PDF of product list containing effective surfactants for new coronavirus to CSV
Scraping the number of downloads and positive registrations of the new coronavirus contact confirmation app
The epidemic forecast of the new coronavirus was released on the Web at explosive speed
I tried to display the infection condition of coronavirus on the heat map of seaborn