American statistician Nate Silver, famous for his US presidential election forecasts, published an article titled "Coronavirus Case Counts Are Meaningless" on the site FiveThirtyEight.
The claim of the article is that **"unless you know quite a lot about how testing is being done, the reported number of COVID-19 cases is not a useful indicator"** [^1]. Intuitively I agree, but viral infection is an exponential phenomenon, so I also see and hear claims that even if the number of tests is restricted (possibly **arbitrarily**), the reproduction rate (explained later) should still be observable. From a completely different angle, there are also claims that medical care will collapse unless testing is restricted. Nate's article includes a simple simulation that readers can try out as an [Excel spreadsheet](https://fivethirtyeight.com/wp-content/uploads/2020/04/covidia_1.05_updated.xlsx), so that you can see for yourself how the reported number of infections and the true number of infections diverge depending on the testing policy (who gets tested and how many tests are run) and on infection-control measures (social distancing and lockdown). You can also observe the difference between the true reproduction rate and the apparent reproduction rate seen in the reports.
The spreadsheet model [^2] is handy for playing with the parameters and seeing how the results change, but each cell in a spreadsheet is identified by a combination of column (a letter) and row (a number), such as $\text{BH32}$, and the relationships between cells are written as BASIC-like formulas, so it is not well suited to understanding what is being computed and how.
[^2]: Nate Silver does not call the spreadsheet a model; it is not meant to predict anything.
So, to understand what it is doing, I converted the spreadsheet model into a Python program (by hand). I then gave it parameters roughly matching the current situation in Japan (as of April 22, 2020) and ran it. The result: although the number of reported cases has only just passed 10,000, the true number of cases comes out to more than **7.5 million**, which would mean that about 6% of the total population is infected. With numbers like that, you have to assume that **if you go outside, the person next to you may well be infected**.
The Jupyter notebook is available on GitHub.
If you treat infection as a continuous phenomenon, you end up with differential equations, which are hard for an amateur to handle, so here it is treated as a very coarse discrete phenomenon. That is, the chain of infection is viewed as a chain from one generation [^3] to the next, and the calculation is simplified further by fixing the interval between generations to a constant (for example, 5 days).
The most important parameter is the reproduction rate ($R$), which specifies how many people in generation $n+1$ each infected person in generation $n$ passes the disease on to. It depends on how infectious the disease is, but also on how people in society behave. The reproduction rate on a cruise ship would have been very high, while the reproduction rate in Wuhan would have been low once the city was locked down and all outings were banned. If $R$ is greater than 1 the infection spreads; if it is less than 1 the disease eventually dies out.
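In the simplest picture, ignoring the depletion of susceptible people (which the `cluster` parameter below adjusts for), the number of new infections grows geometrically from one generation to the next, and the doubling time follows from $R$ and the generation interval. This is essentially the formula the code below uses for `actual_doubling_time` (with $R$ estimated from the growth of the cumulative count):

```math
I_{n+1} = R \, I_n, \qquad I_n = I_0 R^n, \qquad T_{\text{double}} = \text{serial} \times \frac{\ln 2}{\ln R}
```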
[^3]: This generation has nothing to do with baby boomers or millennials, of course: the first infected people are generation 0, the people they infect are generation 1, and so on.
Parameters related to the spread of the disease are represented by the `Disease_spread_params` class.

- `R_uncontrolled`: reproduction rate when no measures are taken
- `R_intermediate`: intermediate reproduction rate once people have started being careful (social distancing, hand washing, wearing masks)
- `R_lockdown`: reproduction rate once a lockdown is enforced
- `cluster`: parameter that adjusts the reproduction rate as the proportion of infected people in the population grows (0 = no adjustment, 0.5 = slight, 1 = moderate, 2 = significant)
- `begin_intermediate`: generation number at which the intermediate stage begins, after the no-measures period ends
- `begin_lockdown`: generation number at which the lockdown begins
- `mild`: proportion of infections that are mild or moderate
- `asymptomatic`: proportion of infections that are asymptomatic (the severe proportion is `1 - mild - asymptomatic`)

Parameters about the population are represented by the `Population_params` class.

- `population`: total population of the country
- `faux_severe`: proportion of people with severe COVID-19-like symptoms from causes other than the coronavirus
- `faux_mild`: proportion of people with mild COVID-19-like symptoms from causes other than the coronavirus
- `desire_severe`: proportion of severely ill people who seek a PCR test
- `desire_mild`: proportion of mildly ill people who seek a PCR test
- `desire_asymptomatic`: proportion of asymptomatic people who seek a PCR test

Parameters related to PCR testing are represented by the `Testing_params` class.

- `ramp_period`: generation number at which the number of tests starts to be increased
- `test_growth_rate`: growth rate of the number of tests once it starts increasing
- `test_max`: maximum number of tests per generation
- `rationed_test_ratio`: proportion of tests reserved so that the priority order severe > mild > asymptomatic is respected
- `false_negative`: false-negative rate of the PCR test
- `false_positive`: false-positive rate of the PCR test
- `delay`: delay from testing to getting the result (in generations)

Plausible values taken from various papers are used as the defaults for these parameters.
```python
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import time, re
import datetime

iround = lambda x: int((x * 2 + 1) // 2)

class Disease_spread_params:
    def __init__(self,
                 R_uncontrolled = 2.7,
                 R_intermediate = 1.4,
                 R_lockdown = 0.7,
                 cluster = 0.5,
                 begin_intermediate = 11,
                 begin_lockdown = 15,
                 mild = 0.6,
                 asymptomatic = 0.3,
                 zero_date = '2020-01-01',
                 serial = 5):
        self.R_uncontrolled = R_uncontrolled
        self.R_intermediate = R_intermediate
        self.R_lockdown = R_lockdown
        self.cluster = cluster
        self.begin_intermediate = begin_intermediate
        self.begin_lockdown = begin_lockdown
        self.severe = 1 - mild - asymptomatic
        self.mild = mild
        self.asymptomatic = asymptomatic
        self.zero_date = zero_date
        self.serial = serial

class Population_params:
    def __init__(self,
                 population = 10_000_000,
                 initial_infection = 1,
                 faux_severe = 0.001,
                 faux_mild = 0.025,
                 desire_severe = 1.0,
                 desire_mild = 0.5,
                 desire_asymptomatic = 0.02):
        self.population = population
        self.initial_infection = initial_infection
        self.faux_severe = faux_severe
        self.faux_mild = faux_mild
        self.desire_severe = desire_severe
        self.desire_mild = desire_mild
        self.desire_asymptomatic = desire_asymptomatic

class Testing_params:
    def __init__(self,
                 initial_tests = 1_000,
                 ramp_period = 3,
                 test_growth_rate = 0.5,
                 test_max = 10_000_000,
                 rationed_test_ratio = 0.75,
                 false_negative = 0.20,
                 false_positive = 0.002,
                 delay = 2):
        self.initial_tests = initial_tests
        self.ramp_period = ramp_period
        self.test_growth_rate = test_growth_rate
        self.test_max = test_max
        self.rationed_test_ratio = rationed_test_ratio
        self.false_negative = false_negative
        self.false_positive = false_positive
        self.delay = delay
```
Here, instead of Python's built-in `round()`, I use my own function `iround()` to round a real number to an integer. This is for compatibility with Excel's `ROUND` function [^4].

[^4]: Python's `round()` rounds halves to the nearest even number (banker's rounding), while Excel's `ROUND` rounds halves away from zero.
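A quick check of the difference (using `iround()` as defined above):

```python
print(round(2.5), round(3.5))    # -> 2 4 : Python rounds halves to the nearest even integer
print(iround(2.5), iround(3.5))  # -> 3 4 : iround rounds halves up, like Excel's ROUND(x, 0)
                                 #          for the non-negative quantities used in this model
```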
I worked out how the spreadsheet computes things by following its cell-by-cell dependencies, and reimplemented it. To be honest, the code is messy (and that is not false modesty).
The `Simulator` class is instantiated with the parameter objects above; calling its `run()` method runs the simulation and returns the result as a `pandas.DataFrame`.
```python
class Simulator:
    def __init__(self,
                 disease_spread_params,
                 population_params,
                 testing_params):
        self.disease_spread_params = disease_spread_params
        self.population_params = population_params
        self.testing_params = testing_params
        self.columns = [ 'date', 'actual.R', 'doubling time in days' ]
        self.date_regex = re.compile(r'(\d+)-(\d+)-(\d+)')

    def decode_date(self, date):
        match = self.date_regex.fullmatch(date)
        if match:
            y, m, d = match.group(1, 2, 3)
            timestamp = time.mktime((int(y), int(m), int(d), 0, 0, 0, -1, -1, -1))
            return timestamp
        return None

    def encode_date(self, timestamp):
        t = time.localtime(timestamp)
        return '{0:04d}-{1:02d}-{2:02d}'.format(t[0], t[1], t[2])

    def run(self, zero_day = '2020-01-01', generations = 40):
        self.generations = generations
        self.df = pd.DataFrame(index = range(0, self.generations))
        t0 = self.decode_date(zero_day)
        self.df['date'] = [ self.encode_date(t0 + d * self.disease_spread_params.serial * 24 * 60 * 60) for d in range(0, self.generations) ]
        self.set_target_R()
        self.compute_actual_infection()
        self.compute_tests()
        return self.df

    def set_target_R(self):
        # Target reproduction rate per generation: uncontrolled, then intermediate, then lockdown
        begin_lockdown = self.disease_spread_params.begin_lockdown
        begin_intermediate = self.disease_spread_params.begin_intermediate
        self.df['target_R'] = np.NaN
        for i in range(0, self.generations):
            if begin_lockdown is not None and i >= begin_lockdown:
                self.df.at[i, 'target_R'] = self.disease_spread_params.R_lockdown
            elif begin_intermediate is not None and i >= begin_intermediate:
                self.df.at[i, 'target_R'] = self.disease_spread_params.R_intermediate
            else:
                self.df.at[i, 'target_R'] = self.disease_spread_params.R_uncontrolled

    def compute_actual_infection(self):
        population = self.population_params.population
        initial_infection = self.population_params.initial_infection
        cluster = self.disease_spread_params.cluster
        serial = self.disease_spread_params.serial
        df = self.df
        df['susceptible'] = np.NaN
        df.at[0, 'susceptible'] = population - initial_infection
        df['new_infection'] = np.NaN
        df.at[0, 'new_infection'] = initial_infection
        df['cumulative_infection'] = np.NaN
        df.at[0, 'cumulative_infection'] = initial_infection
        df['actual_R'] = np.NaN
        df['actual_doubling_time'] = np.NaN
        for i in range(1, self.generations):
            # New infections: each of the previous generation's infected gets a chance to infect
            # susceptibles; the (susceptible/population)**cluster factor slows spread as the
            # infected share of the population grows.
            df.at[i, 'new_infection'] = iround(df.at[i-1, 'susceptible']*(1-(1-((df.at[i-1, 'target_R']*(df.at[i-1, 'susceptible']/population)**cluster)/population))**df.at[i-1, 'new_infection']))
            df.at[i, 'cumulative_infection'] = df.at[i-1, 'cumulative_infection'] + df.at[i, 'new_infection']
            df.at[i, 'susceptible'] = population - df.at[i, 'cumulative_infection']
            df.at[i-1, 'actual_R'] = df.at[i, 'new_infection'] / df.at[i-1, 'new_infection']
            if df.at[i-1, 'cumulative_infection'] != 0:
                df.at[i-1, 'actual_doubling_time'] = np.inf if df.at[i, 'new_infection'] == 0 else serial * np.log(2) / np.log(df.at[i, 'cumulative_infection']/df.at[i-1, 'cumulative_infection'])

    def compute_tests(self):
        population = self.population_params.population
        ramp_period = self.testing_params.ramp_period
        tests_max = self.testing_params.test_max
        test_growth_rate = self.testing_params.test_growth_rate
        rationed_test_ratio = self.testing_params.rationed_test_ratio
        mild = self.disease_spread_params.mild
        asymptomatic = self.disease_spread_params.asymptomatic
        faux_severe = self.population_params.faux_severe
        faux_mild = self.population_params.faux_mild
        desire_severe = self.population_params.desire_severe
        desire_mild = self.population_params.desire_mild
        desire_asymptomatic = self.population_params.desire_asymptomatic
        false_negative = self.testing_params.false_negative
        false_positive = self.testing_params.false_positive
        delay = self.testing_params.delay
        serial = self.disease_spread_params.serial
        cumulative_tests_conducted = 0
        cumulative_detected_cases = 0
        df = self.df
        df['tests_available'] = 0
        df['new_detected_cases'] = 0
        df['cumulative_detected_cases'] = 0
        for i in range(0, self.generations):
            # Testing capacity: none in generation 0, then initial_tests, then growth up to test_max
            if i == 0:
                df.at[i, 'tests_available'] = 0
            elif i == 1:
                df.at[i, 'tests_available'] = self.testing_params.initial_tests
            elif i < ramp_period:
                df.at[i, 'tests_available'] = df.at[i-1, 'tests_available']
            else:
                df.at[i, 'tests_available'] = iround(min(tests_max, df.at[i-1, 'tests_available'] * (1 + test_growth_rate)))
            tests_available = df.at[i, 'tests_available']
            rationed_tests = iround(tests_available * rationed_test_ratio)
            on_demand_tests = tests_available - rationed_tests
            new_infection_severe = iround(df.at[i, 'new_infection'] * (1 - mild - asymptomatic))
            new_infection_mild = iround(df.at[i, 'new_infection'] * mild)
            new_infection_asymptomatic = df.at[i, 'new_infection'] - new_infection_severe - new_infection_mild
            population_severe = iround((population - df.at[i, 'new_infection']) * faux_severe) + new_infection_severe
            population_mild = iround((population - df.at[i, 'new_infection']) * faux_mild) + new_infection_mild
            population_asymptomatic = population - population_severe - population_mild
            desiring_tests_severe = iround(population_severe * desire_severe * (1 - cumulative_tests_conducted/population))
            desiring_tests_mild = iround(population_mild * desire_mild * (1 - cumulative_tests_conducted/population))
            desiring_tests_asymptomatic = iround(population_asymptomatic * desire_asymptomatic * (1 - cumulative_tests_conducted/population))
            # Rationed tests go to severe cases first, then mild, then asymptomatic
            alloc_rationed_tests_severe = min(rationed_tests, desiring_tests_severe)
            alloc_rationed_tests_mild = min(desiring_tests_mild, rationed_tests-alloc_rationed_tests_severe)
            alloc_rationed_tests_asymptomatic = min(desiring_tests_asymptomatic, rationed_tests-alloc_rationed_tests_severe-alloc_rationed_tests_mild)
            unfilled_test_demand_severe = desiring_tests_severe - alloc_rationed_tests_severe
            unfilled_test_demand_mild = desiring_tests_mild - alloc_rationed_tests_mild
            unfilled_test_demand_asymptomatic = desiring_tests_asymptomatic - alloc_rationed_tests_asymptomatic
            unfilled_test_demand = unfilled_test_demand_severe + unfilled_test_demand_mild + unfilled_test_demand_asymptomatic
            # The remaining (on-demand) tests are distributed in proportion to the unfilled demand
            alloc_on_demand_tests_severe = 0 if unfilled_test_demand == 0 else iround(on_demand_tests * unfilled_test_demand_severe / unfilled_test_demand)
            alloc_on_demand_tests_mild = 0 if unfilled_test_demand == 0 else iround(on_demand_tests * unfilled_test_demand_mild / unfilled_test_demand)
            alloc_on_demand_tests_asymptomatic = 0 if unfilled_test_demand == 0 else iround(on_demand_tests * unfilled_test_demand_asymptomatic / unfilled_test_demand)
            tests_conducted_severe = alloc_rationed_tests_severe + alloc_on_demand_tests_severe
            tests_conducted_mild = alloc_rationed_tests_mild + alloc_on_demand_tests_mild
            tests_conducted_asymptomatic = alloc_rationed_tests_asymptomatic + alloc_on_demand_tests_asymptomatic
            df.at[i, 'tests_conducted_severe'] = tests_conducted_severe
            df.at[i, 'tests_conducted_mild'] = tests_conducted_mild
            df.at[i, 'tests_conducted_asymptomatic'] = tests_conducted_asymptomatic
            tests_conducted = tests_conducted_severe + tests_conducted_mild + tests_conducted_asymptomatic
            cumulative_tests_conducted += tests_conducted
            # Positive results = true positives (1 - false_negative) + false positives among the uninfected
            positive_tests_severe = iround(tests_conducted_severe * new_infection_severe / population_severe * (1 - false_negative)) + \
                                    iround(tests_conducted_severe * (1 - new_infection_severe / population_severe) * false_positive)
            positive_tests_mild = iround(tests_conducted_mild * new_infection_mild / population_mild * (1 - false_negative)) + \
                                  iround(tests_conducted_mild * (1 - new_infection_mild / population_mild) * false_positive)
            positive_tests_asymptomatic = iround(tests_conducted_asymptomatic * new_infection_asymptomatic / population_asymptomatic * (1 - false_negative)) + \
                                          iround(tests_conducted_asymptomatic * (1 - new_infection_asymptomatic / population_asymptomatic) * false_positive)
            # Positive results are reported `delay` generations later
            if i+delay < self.generations:
                df.at[i+delay, 'new_detected_cases'] = positive_tests_severe + positive_tests_mild + positive_tests_asymptomatic
            cumulative_detected_cases += df.at[i, 'new_detected_cases']
            df.at[i, 'cumulative_detected_cases'] = cumulative_detected_cases
            if i > 0 and df.at[i-1, 'new_detected_cases'] > 0:
                df.at[i-1, 'observed_R'] = df.at[i, 'new_detected_cases'] / df.at[i-1, 'new_detected_cases']
                df.at[i-1, 'observed_doubling_time'] = np.inf if df.at[i, 'new_detected_cases'] == 0 else serial * np.log(2) / np.log(df.at[i, 'cumulative_detected_cases']/df.at[i-1, 'cumulative_detected_cases'])
```
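A minimal sketch of how the simulator can be driven (this usage code is mine, not from the original spreadsheet). The default parameter values correspond to the Covidia scenario described next (population 10 million, $R$ of 2.7/1.4/0.7, an initial capacity of 1,000 tests), so running with the defaults reproduces that setting.

```python
# Sketch: run the model with the default parameters and plot the true vs. reported
# new infections per generation from the returned DataFrame.
sim = Simulator(Disease_spread_params(), Population_params(), Testing_params())
df = sim.run(zero_day = '2020-01-01', generations = 40)

ax = df.plot(x = 'date', y = ['new_infection', 'new_detected_cases'],
             logy = True, figsize = (10, 6))
ax.set_ylabel('people per generation')
plt.show()
```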
Nate Silver's article features a simulation of a fictional country called Covidia. The graph below visualizes the results. Covidia has a population of 10 million, and the epidemic begins with the arrival of the first infected person on January 1, 2020. The reproduction rate is assumed to be $2.7$ initially, $1.4$ in the intermediate stage, and $0.7$ under lockdown.
The blue line shows the true number of new infections, and the yellow line shows the number of new infections reported through testing. In this scenario the testing capacity starts small, at 1,000 tests per generation, but because the number of tests is allowed to grow without limit, the reported count eventually follows the true number of infections, albeit with a time lag.
If you look closely, however, you can see that in the phase where infections are growing the reported count underestimates the true count by a factor of 20 or more, and once infections start to decline it overestimates them.
Japan has been criticized for keeping the number of PCR tests low. The Ministry of Health, Labour and Welfare reports the counts daily [^5], but the number of tests is said to still not be keeping up with demand. So I tried what happens when the number of PCR tests is kept low for Japan's population of 126 million. I also plotted the number of confirmed positive cases actually reported so far. Whereas the earlier graph showed the number of new infections, this one shows the cumulative number of infections [^6].
[^5]: For example, the latest report shows that 7,826 new PCR tests were performed since the previous day's report.
[^6]: The actual data is published cumulatively, and I found that form more familiar.
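For reference, here is a sketch of how such a run can be set up. The exact values below (the low testing cap in particular) are my own illustrative choices, not necessarily the settings used for the figure; the "expanded testing" scenario later in the article differs only in the testing parameters (an initial capacity of 10,000 tests that keeps growing).

```python
# Illustrative Japan-like run: 126 million people, testing capped at a low level.
# These specific values are assumptions for illustration only.
japan = Simulator(Disease_spread_params(),
                  Population_params(population = 126_000_000),
                  Testing_params(initial_tests = 1_000, test_max = 50_000))
df_japan = japan.run(zero_day = '2020-01-01', generations = 25)

df_japan.plot(x = 'date', y = ['cumulative_infection', 'cumulative_detected_cases'],
              figsize = (10, 6))
plt.show()
```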
![fig2.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/223481/193b7bce-f37f-4b9e-727b-223c95b838f5.png)
Here, the green line is the actually reported number of cases in Japan [^7].
[^7]: I am using the data published by Johns Hopkins University on GitHub.
I do not intend to predict anything with this simulation, but first I would like you to see the sheer size of the gap between the true values in the simulation result (blue line) and the reported values (green line). The following can be observed.
But what if testing capacity had been ramped up early on and without limit? The following graph shows the result when the initial testing capacity is set to 10,000 and then increased gradually but without bound.
Even in this case the true number of infections still cannot be captured, but we would observe roughly 10 times as many reported cases as we do now. Moreover, although the rise in reported cases (yellow) lags a little, its slope is close to that of the true infection curve, so the difference between the two can be regarded mostly as a horizontal shift (a time delay), which means the correct reproduction rate $R$ can be observed.
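In the DataFrame this comparison can be made directly by plotting the true and the apparent reproduction rates side by side (a sketch, using the `df` from the Covidia run above; the column names are those produced by `Simulator`):

```python
# True reproduction rate vs. the one apparent from reported cases
ax = df.plot(x = 'date', y = ['actual_R', 'observed_R'], figsize = (10, 4))
ax.axhline(1.0, color = 'gray', linestyle = ':')   # R = 1 separates spread from containment
plt.show()
```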
It is probably true that it would make no difference in terms of treatment, since there are neither hospitals nor quarantine facilities to take in the infected people even if we knew who they were. But I think people's everyday behavior might change a little if they learned that the true number of infections is 10 times, and in some cases 100 times, what is being reported.
I wonder whether the infectious disease experts appearing on TV are building a much more detailed version of this kind of model to predict the future.
I don't know.