The new coronavirus infection (COVID-19) is rampant all over the world, and urban blockades, so-called lockdowns, are being carried out in various regions such as China, Europe, the United States, and Asia. In Japan as well, as of April 4, there are 3360 infected people in Japan and 891 people in Tokyo, and the governor is requesting refraining from going out on weekends and avoiding specific food service businesses. In addition, it is spreading not only in central Tokyo and Osaka, but throughout the country, and it can be said that the situation is as close as possible to an emergency situation where an overshoot occurs. In order to avoid such an emergency, in Government Expert Meeting etc., 3 dense spaces (closed and dense) ・ We are actively calling to avoid (close). Then, how much should we avoid such three dense spaces? In this article, I would like to quantify ** the degree of avoidance of three dense spaces ** necessary to prevent the spread of infection ** as a numerical value.
The basic reproduction number is
Is defined as. In other words, it is the strength with which an infected person can infect other people with the new coronavirus. Also, if its strength is strong, it can infect many people at once, resulting in clusters. According to the National Cluster Map published by the Ministry of Health, Labor and Welfare, the following are listed as the elements that make up the cluster.
What they have in common is probably the three dense spaces. In addition, the cluster countermeasure group estimates show the analysis results of basic reproduction numbers by environment.
This figure suggests that the frequency distribution of basic reproduction numbers differs depending on whether the source of infection is in a poorly ventilated environment or in other environments. In other words, ** 3 the probability distribution of the basic reproduction number changes greatly depending on whether or not it is a dense space **. Therefore, I would like to define the parameters that represent the differences in the environment as follows.
Then, the code to approximate the basic reproduction number by the above parameter er is shown in Previous article. The above figure is approximated by combining conditional branching and uniform distribution.
"""
er: good environment ratio (0.-1.)
"""
def COVID19R0(er):
if np.random.rand() < er:
# good environment
if np.random.rand() < 0.8:
R0 = 0
else:
R0 = np.random.randint(1,4)*1.0
else:
# bad environment
R0 = np.random.randint(0,12)*1.0
return R0
Then, what value of the environmental parameter er can reduce the basic reproduction number to less than 1 (in the direction of converging infection)? Below, I would like to consider using statistical tools.
What is the Central Limit Theorem?
When, for the sample mean $ \ bar {X} _n $ of the observed values,
\frac{\bar{X}_n - \mu}{\left( \frac{\sigma}{\sqrt{n}} \right)} \sim N(0,1)\ ({\rm as}\ n \rightarrow \infty) \\
\bar{X}_n = \frac{1}{n} \sum_{i=1}^n X_i
Was to hold. What is important here is that there are no assumptions about the probability distribution of the population other than the existence of true values for the mean and variance. Therefore, this theorem can be applied to the probability distribution of the basic reproduction number.
So how do you estimate what the population mean of basic reproduction numbers will be for each environmental parameter er? According to statistics textbooks, an estimate of the population mean (confidence interval of $ 1- \ alpha $) when the population variance is unknown is given below.
\left[ \bar{X}_n - \frac{U}{\sqrt{n}} t_{\alpha / 2} (n-1), \bar{X}_n + \frac{U}{\sqrt{n}} t_{\alpha / 2} (n-1) \right]\\
U^2 = \frac{1}{n-1} \sum_{i=1}^n (X_i - \bar{X}_n)^2
Here, $ U $ is the unbiased sample variance, and $ t_ {\ alpha} (n-1) $ is the $ \ alpha / 2 $ point of the t distribution with $ n-1 $ degrees of freedom (the tail area of the probability distribution is $). The value of the random variable that is \ alpha / 2 $). Reading the textbook, this confidence interval seems to be derived from the combination of estimates based on the central limit theorem, with the population mean following a normal distribution and the population variance following a $ χ ^ 2 $ distribution.
Now, let's use Python to calculate the population mean and confidence interval of the basic reproduction number R0 depending on the environment parameter er.
First, import the library.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
This is the function for generating the basic reproduction number.
"""
er: good environment ratio (0.-1.)
"""
def COVID19R0(er):
if np.random.rand() < er:
# good environment
if np.random.rand() < 0.8:
R0 = 0
else:
R0 = np.random.randint(1,4)*1.0
else:
# bad environment
R0 = np.random.randint(0,12)*1.0
return R0
A function that calculates the sample mean and unbiased variance by sampling N pieces according to the given probability distribution $ f $.
def EstFromNsample(f, N):
#N samplings according to probability distribution f
sample = [f() for i in range(N)]
#Specimen average
Eave = np.average(sample)
#Unbiased sample variance
Esig2 = 1./(np.float(N)-1.) * np.sum([pow(s - Eave, 2) for s in sample])
return Eave, Esig2
This function calculates the estimated values of population mean and population variance by engraving the environment parameter er from 0 to 1 with a mesh and sampling each of them.
def makeERsamples(keys):
ediv = keys['ediv']
N = keys['N']
#
l = []
#
for e in np.linspace(0, 1, ediv):
f = lambda : COVID19R0(e) #Sampling function in a specific environment parameter
Eave, Esig2 = EstFromNsample(f, N)
l.append([e, Eave, Esig2])
return np.array(l)
A function that calculates the confidence interval. Note that alpha has the opposite meaning of $ \ alpha $ above. According to scipy.stats, "Endpoints of the range that contains alpha percent of the distribution" , T_dist.interval is a function that returns the endpoints of the interval containing the alpha percent of the t distribution.
def calcBand01(ave, sigma, N, alpha):
t_dist = stats.t(loc=0,scale=1,df=N-1)
b, u = t_dist.interval(alpha=alpha)
b = ave + b * sigma / np.sqrt(N)
u = ave + u * sigma / np.sqrt(N)
return b, u
A function that displays the population mean estimate and its confidence interval.
def showBandEstimation(sample, keys):
#
alpha = keys['alpha']
N = keys['N']
#
x = sample[:,0]
y = sample[:,1]
s2 = sample[:,2]
s = np.sqrt(s2)
#
yb, yu = calcBand01(y, s, N, alpha)
#
fig, ax = plt.subplots(1,1,figsize=(5,4), dpi=200)
ax.plot([0,1],[1,1], 'r--')
ax.plot(x, y, label='ave')
ax.plot(x, yu, '.', label='upper')
ax.plot(x, yb, '.', label='lower')
ax.fill_between(x, yu, yb, facecolor='r',alpha=0.2)
ax.set_xlabel('er')
ax.set_ylabel('R0')
ax.set_xticks(np.linspace(0,1,11))
ax.grid(b=True, which='both', axis='x')
ax.legend()
plt.show()
Finally, the part that performs sampling and displays the estimated population mean and confidence intervals.
keys = {'N':100, 'ediv':100, 'alpha':0.95 }
sample = makeERsamples(keys)
fig = showBandEstimation(sample, keys)
Now let's take a look at the calculation results.
There is considerable variability at $ N = 100 $, but at $ N = 10000 $ it looks like it's almost converged. Therefore, it seems that a fairly reliable estimate can be made with about 10,000 samples. The important point is that this estimation is 1. 2. The sample mean converges to a normal distribution by the central limit theorem. It means that the confidence interval is estimated by the t distribution, which is a two-step approach. So it's important to determine how much $ N $ is enough to make this estimate.
From the above, the following tendency can be derived from the simulation regarding the estimation of the population mean of the basic reproduction number R0 for the environmental parameter er (how to avoid 3 dense spaces).
I referred to the following page.