Introduction This is an introductory article about the Python package CovsirPhy that allows you to easily download and analyze COVID-19 data (such as the number of PCR positives).
** This time, I will explain the scenario analysis (parameter comparison). ** **
The English version of the document is Covsir Phy: COVID-19 analysis with phase-dependent SIRs, [Kaggle: COVID-19 data with SIR model]( Please refer to https://www.kaggle.com/lisphilar/covid-19-data-with-sir-model).
CovsirPhy can be installed by the following method! Please use Python 3.7 or above, or Google Colaboratory.
--Stable version: pip install covsirphy --upgrade
--Development version: pip install" git + https://github.com/lisphilar/covid19-sir.git#egg=covsirphy "
import covsirphy as cs
cs.__version__
# '2.8.3'
Execution environment | |
---|---|
OS | Windows Subsystem for Linux / Ubuntu |
Python | version 3.8.5 |
Click here for the code [^ 2] to download the actual data from COVID-19 Data Hub [^ 1]:
data_loader = cs.DataLoader("input")
jhu_data = data_loader.jhu()
population_data = data_loader.population()
In the table and graph of this article, the data of Japan as of October 3, 2020 (national total value announced by the Ministry of Health, Labor and Welfare) is used by the following code [^ 3]. COVID-19 Data Hub [^ 1] also contains Japanese data, but it seems to be different from the national total value announced by the Ministry of Health, Labor and Welfare.
[^ 3]: [CovsirPhy] COVID-19 Python package for data analysis: S-R trend analysis
# (Optional)Acquisition of data from the Ministry of Health, Labor and Welfare
#If you get an error"pip install requests aiohttp"
japan_data = data_loader.japan()
jhu_data.replace(japan_data)
print(japan_data.citation)
Confirmation of actual data:
#Instantiation of analysis class
snl = cs.Scenario(jhu_data, population_data, country="Japan")
#Graph display of actual data
snl.records(filename=None)
Division into Phase by S-R trend analysis [^ 3] and period adjustment [^ 4]:
[^ 4]: [CovsirPhy] COVID-19 Python package for data analysis: Phase setting optimization
# S-R trend analysis
snl.trend(filename=None)
# Separate 0th phase
snl.separate("01Apr2020")
#List display
snl.summary()
Type | Start | End | Population | |
---|---|---|---|---|
0th | Past | 06Feb2020 | 31Mar2020 | 126529100 |
1st | Past | 01Apr2020 | 21Apr2020 | 126529100 |
2nd | Past | 22Apr2020 | 06Jul2020 | 126529100 |
3rd | Past | 07Jul2020 | 23Jul2020 | 126529100 |
4th | Past | 24Jul2020 | 01Aug2020 | 126529100 |
5th | Past | 02Aug2020 | 14Aug2020 | 126529100 |
6th | Past | 15Aug2020 | 28Aug2020 | 126529100 |
7th | Past | 29Aug2020 | 08Sep2020 | 126529100 |
8th | Past | 09Sep2020 | 19Sep2020 | 126529100 |
9th | Past | 20Sep2020 | 03Oct2020 | 126529100 |
Parameter estimation with SIR-F model[^5]:
[^ 5]: [CovsirPhy] COVID-19 Python package for data analysis: Parameter estimation
# Parameter estimation
snl.estimate(cs.SIRF)
#List display (part)
est_cols = ["Start", "End", "Rt", *cs.SIRF.PARAMETERS, "RMSLE"]
snl.summary(columns=est_cols)
Start | End | Rt | theta | kappa | rho | sigma | RMSLE | |
---|---|---|---|---|---|---|---|---|
0th | 06Feb2020 | 31Mar2020 | 3.6 | 0.0197967 | 0.000781433 | 0.110404 | 0.0292893 | 0.693445 |
1st | 01Apr2020 | 21Apr2020 | 13.53 | 0.00119307 | 0.000954275 | 0.119908 | 0.00789786 | 0.17892 |
2nd | 22Apr2020 | 06Jul2020 | 0.37 | 0.090172 | 0.000858737 | 0.0264027 | 0.0636349 | 0.883253 |
3rd | 07Jul2020 | 23Jul2020 | 1.93 | 0.000466376 | 8.04707e-05 | 0.133382 | 0.0689537 | 0.0318977 |
4th | 24Jul2020 | 01Aug2020 | 1.78 | 7.63782e-05 | 0.000231109 | 0.135991 | 0.0760643 | 0.0193947 |
5th | 02Aug2020 | 14Aug2020 | 1.4 | 0.00100709 | 0.000215838 | 0.100929 | 0.0716643 | 0.0797834 |
6th | 15Aug2020 | 28Aug2020 | 0.82 | 0.000562179 | 0.000899694 | 0.0783473 | 0.0943161 | 0.021244 |
7th | 29Aug2020 | 08Sep2020 | 0.7 | 0.00168729 | 0.00119958 | 0.0616905 | 0.0863032 | 0.0182731 |
8th | 09Sep2020 | 19Sep2020 | 0.78 | 0.00293238 | 0.00132534 | 0.0793277 | 0.099966 | 0.0282502 |
9th | 20Sep2020 | 03Oct2020 | 0.87 | 0.000463924 | 0.000984493 | 0.0793406 | 0.090479 | 0.0274187 |
In Covsirphy's Scenario
class, multiple types of Phase [^ 3] settings can be registered. "Phase [^ 3] settings" is called "scenario".
For example, "Main scenario" and "Medicine scenario" can be set in parallel and scenario analysis can be performed as shown in the following table. The dates are listed on the assumption that the actual data from 2020/2/6 to 2020/10/3 are available.
scenario name | 0th-9th | 10th | 11th |
---|---|---|---|
Main | 2/6 - 10/3 | 10/4 - 12/31 | 2021/1/1 - 2020/4/10 |
Medicine | 2/6 - 10/3 | 10/4 - 12/31 | 2021/1/1 - 2020/4/10 |
Estimate the parameters of the 0th-9th phase for each scenario, and simulate the number of patients assuming that the parameters of the 10th phase are the same and that $ \ sigma $ doubles between the 10th and 11th for the Medicine scenario. .. At this time, the doubling effect of $ \ sigma $ can be estimated by comparing the changes in the number of patients between 2021 / 1/1-2021 / 4/10.
It is unlikely that it will double at once due to the creation of new drugs, but please refer to it as a demonstration.
To examine the effect of parameters on the number of patients, create scenarios with different parameters, simulate the number of patients, and compare them. First, how to create / initialize / delete the scenario itself is as follows.
You can use the Scenario.clear (name =" new scenario name ", template =" source scenario name ")
method to copy a configured scenario and create a new one. When creating a "New" scenario with the Main scenario (default) as the copy source
#Create New
snl.clear(name="New", template="Main")
#List display
cols = ["Start", "End", "Rt", *cs.SIRF.PARAMETERS]
snl.summary(columns=cols)
(1st-8th phase omitted on paper)
Start | End | Rt | theta | kappa | rho | sigma | |
---|---|---|---|---|---|---|---|
('Main', '0th') | 06Feb2020 | 31Mar2020 | 3.85 | 0.0185683 | 0.000768779 | 0.112377 | 0.027892 |
('Main', '9th') | 20Sep2020 | 03Oct2020 | 0.87 | 0.000463924 | 0.000984493 | 0.0793406 | 0.090479 |
('New', '0th') | 06Feb2020 | 31Mar2020 | 3.85 | 0.0185683 | 0.000768779 | 0.112377 | 0.027892 |
('New', '9th') | 20Sep2020 | 03Oct2020 | 0.87 | 0.000463924 | 0.000984493 | 0.0793406 | 0.090479 |
Use Scenario.delete (name =" scenario name ")
to delete. Delete the "New" scenario and return only the "Main" scenario to the registered state.
#Delete scenario
snl.delete(name="New")
#List display
snl.summary(columns=cols)
(1st-8th phase omitted on paper)
Start | End | Rt | theta | kappa | rho | sigma | |
---|---|---|---|---|---|---|---|
0th | 06Feb2020 | 31Mar2020 | 3.85 | 0.0185683 | 0.000768779 | 0.112377 | 0.027892 |
9th | 20Sep2020 | 03Oct2020 | 0.87 | 0.000463924 | 0.000984493 | 0.0793406 | 0.090479 |
From here, I will explain how to add a phase. You can add a new phase by specifying the last date, such as Scenario.add (end_date =" 31Dec2020 ")
.
# Main:Added up to 31Dec2020 as 10th phase
snl.add(end_date="31Dec2020", name="Main")
#List display
snl.summary(columns=cols, name="Main")
(1st-8th phase omitted on paper)
Start | End | Rt | theta | kappa | rho | sigma | |
---|---|---|---|---|---|---|---|
0th | 06Feb2020 | 31Mar2020 | 3.61 | 0.0190222 | 0.00099167 | 0.110766 | 0.0290761 |
9th | 20Sep2020 | 03Oct2020 | 0.87 | 0.000463924 | 0.000984493 | 0.0793406 | 0.090479 |
10th | 04Oct2020 | 31Dec2020 | 0.87 | 0.000463924 | 0.000984493 | 0.0793406 | 0.090479 |
Since no parameters are set, the added 10th phase parameters have the same values as the 9th phase.
You can add a new phase by specifying the number of days, such as Scenario.add (days = 100)
.
# main: 01Jan2021 (The day after the last day of the 10th)Added 100 days as 11th
snl.add(days=100, name="Main").summary(columns=cols)
#List display
snl.summary(columns=cols, name="Main")
(1st-8th phase omitted on paper)
Start | End | Rt | theta | kappa | rho | sigma | |
---|---|---|---|---|---|---|---|
0th | 06Feb2020 | 31Mar2020 | 3.61 | 0.0190222 | 0.00099167 | 0.110766 | 0.0290761 |
9th | 20Sep2020 | 03Oct2020 | 0.87 | 0.000463924 | 0.000984493 | 0.0793406 | 0.090479 |
10th | 04Oct2020 | 31Dec2020 | 0.87 | 0.000463924 | 0.000984493 | 0.0793406 | 0.090479 |
11th | 01Jan2021 | 10Apr2021 | 0.87 | 0.000463924 | 0.000984493 | 0.0793406 | 0.090479 |
Since no parameters have been set, the added 11th phase parameters have the same values as the 10th phase.
You can add a new phase by specifying a parameter value, such as Scenario.add (sigma = 0.18)
.
In this article, in order to estimate the doubling effect of $ \ sigma $, the estimated value of $ \ sigma $ in the Main scenario 11 phase (last phase) is obtained by the Scenario.get ()
method.
sigma_last = snl.get("sigma", phase="last", name="Main")
sigma_med = sigma_last * 2
print(round(sigma_last, 3), round(sigma_med, 3))
# -> 0.09 0.181
Create a Medicine scenario with the Main scenario as the copy source. At this time, future phases (10th, 11th phase) will be deleted. Then add the 10th phase (same parameters as Main) and 11th phase ($ \ sigma $ is doubled).
snl.clear(name="Medicine", template="Main")
snl.add(end_date="31Dec2020", name="Medicine")
# Medicine: 01Jan2021 (The day after the last day of the 10th)Added 100 days as 11th
snl.add(sigma=sigma_med, days=100, name="Medicine")
#List display
snl.summary(columns=cols, name="Medicine")
(1st-8th phase omitted on paper)
Start | End | Rt | theta | kappa | rho | sigma | |
---|---|---|---|---|---|---|---|
0th | 06Feb2020 | 31Mar2020 | 3.85 | 0.0185683 | 0.000768779 | 0.112377 | 0.027892 |
9th | 20Sep2020 | 03Oct2020 | 0.87 | 0.000463924 | 0.000984493 | 0.0793406 | 0.090479 |
10th | 04Oct2020 | 31Dec2020 | 0.87 | 0.000463924 | 0.000984493 | 0.0793406 | 0.090479 |
11th | 01Jan2021 | 10Apr2021 | 0.44 | 0.000463924 | 0.000984493 | 0.0793406 | 0.180958 |
Since $ sigma $ has doubled, the effective reproduction number $ Rt $ has been halved.
As an aside, you can delete a phase with Scenario.delete (phase = [" Phase name "])
. Here, add and delete the 12th phase.
#Add Phase
snl.add(days=30, name="Medicine")
#Delete Phase
snl.delete(phases=["last"], name="Medicine")
Check the graph to see if the parameter transitions are set as planned. Use Scenario.history (parameter name)
. Use the scenario name as the series name.
Whether $ \ sigma $ has doubled in the 11th phase of the Medicine scenario (2021/1/1 --2021/4/1):
snl.history("sigma", filename=None)
Whether the effective reproduction number $ Rt $ is halved to the 11th phase (2021/1/1 --2021/4/1) of the Medicine scenario:
snl.history("Rt", filename=None)
You can also get the simulation result of the number of patients in Scenario.history (variable name)
.
snl.history("Infected", filename=None)
The number of infected people is rapidly decreasing due to the doubling of $ \ sigma $. It is unlikely that it will double at once due to the creation of new drugs, but by changing the parameter values to realistic values and performing a simulation, it is possible to estimate the number of infected people and the end time in the future.
If you want to get the value in dataframe format, use the Scenario.track ()
method.
snl.track().tail()
Scenario | Date | Confirmed | Fatal | Infected | Recovered | Population | Rt | theta | kappa | rho | sigma | 1/alpha2 [day] | 1/gamma [day] | alpha1 [-] | 1/beta [day] | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
425 | Medicine | 2021-04-06 | 108463 | 1884 | 0 | 106579 | 126529100 | 0.44 | 0.000463924 | 0.000984493 | 0.0793406 | 0.180958 | 1015 | 5 | 0 | 12 |
426 | Medicine | 2021-04-07 | 108463 | 1884 | 0 | 106579 | 126529100 | 0.44 | 0.000463924 | 0.000984493 | 0.0793406 | 0.180958 | 1015 | 5 | 0 | 12 |
427 | Medicine | 2021-04-08 | 108463 | 1884 | 0 | 106579 | 126529100 | 0.44 | 0.000463924 | 0.000984493 | 0.0793406 | 0.180958 | 1015 | 5 | 0 | 12 |
428 | Medicine | 2021-04-09 | 108463 | 1884 | 0 | 106579 | 126529100 | 0.44 | 0.000463924 | 0.000984493 | 0.0793406 | 0.180958 | 1015 | 5 | 0 | 12 |
429 | Medicine | 2021-04-10 | 108463 | 1884 | 0 | 106579 | 126529100 | 0.44 | 0.000463924 | 0.000984493 | 0.0793406 | 0.180958 | 1015 | 5 | 0 | 12 |
Compare the characteristic values to understand the characteristics of the scenario. However, since it is meaningless to compare the average number of infected people, the characteristic values are set as follows.
--Maximum number of infected people and their date --Number of infected people on the day after the last day --Number of deaths on the day after the last day -(Considering addition) Integral value of the number of infected people
Click here for code and output results:
snl.describe()
max(Infected) | argmax(Infected) | Infected on 11Apr2021 | Fatal on 11Apr2021 | 11th_Rt | |
---|---|---|---|---|---|
Main | 15154 | 21Apr2020 | 512 | 1970 | 0.87 |
Medicine | 15154 | 21Apr2020 | 0 | 1884 | 0.44 |
The effective reproduction number $ Rt $ is output only for Phases whose values differ between scenarios.
If the current parameters continue until 2021/4/10, the predicted number of infected people will be around 512 even as of 4/11 ... On the other hand, if $ \ sigma $ doubles, infected people The predicted number is 0.
Thank you for browsing this time as well! We would appreciate it if you could give us your opinions, such as adding characteristic values.
Next time, I will explain how to evaluate the effectiveness of vaccines.
Recommended Posts