[CovsirPhy] COVID-19 Python Package for Data Analysis: Scenario Analysis (Parameter Comparison)

Introduction This is an introductory article about the Python package CovsirPhy that allows you to easily download and analyze COVID-19 data (such as the number of PCR positives).

** This time, I will explain the scenario analysis (parameter comparison). ** **

The English version of the document is Covsir Phy: COVID-19 analysis with phase-dependent SIRs, [Kaggle: COVID-19 data with SIR model]( Please refer to https://www.kaggle.com/lisphilar/covid-19-data-with-sir-model).

1. Execution environment

CovsirPhy can be installed by the following method! Please use Python 3.7 or above, or Google Colaboratory.

--Stable version: pip install covsirphy --upgrade --Development version: pip install" git + https://github.com/lisphilar/covid19-sir.git#egg=covsirphy "

import covsirphy as cs
cs.__version__
# '2.8.3'

	Execution environment
OS	Windows Subsystem for Linux / Ubuntu
Python	version 3.8.5

2. Preparation

Click here for the code [^ 2] to download the actual data from COVID-19 Data Hub [^ 1]:

data_loader = cs.DataLoader("input")
jhu_data = data_loader.jhu()
population_data = data_loader.population()

In the table and graph of this article, the data of Japan as of October 3, 2020 (national total value announced by the Ministry of Health, Labor and Welfare) is used by the following code [^ 3]. COVID-19 Data Hub [^ 1] also contains Japanese data, but it seems to be different from the national total value announced by the Ministry of Health, Labor and Welfare.

[^ 3]: [CovsirPhy] COVID-19 Python package for data analysis: S-R trend analysis

# (Optional)Acquisition of data from the Ministry of Health, Labor and Welfare
#If you get an error"pip install requests aiohttp"
japan_data = data_loader.japan()
jhu_data.replace(japan_data)
print(japan_data.citation)

Confirmation of actual data:

#Instantiation of analysis class
snl = cs.Scenario(jhu_data, population_data, country="Japan")
#Graph display of actual data
snl.records(filename=None)

Division into Phase by S-R trend analysis [^ 3] and period adjustment [^ 4]:

[^ 4]: [CovsirPhy] COVID-19 Python package for data analysis: Phase setting optimization

# S-R trend analysis
snl.trend(filename=None)
# Separate 0th phase
snl.separate("01Apr2020")
#List display
snl.summary()

	Type	Start	End	Population
0th	Past	06Feb2020	31Mar2020	126529100
1st	Past	01Apr2020	21Apr2020	126529100
2nd	Past	22Apr2020	06Jul2020	126529100
3rd	Past	07Jul2020	23Jul2020	126529100
4th	Past	24Jul2020	01Aug2020	126529100
5th	Past	02Aug2020	14Aug2020	126529100
6th	Past	15Aug2020	28Aug2020	126529100
7th	Past	29Aug2020	08Sep2020	126529100
8th	Past	09Sep2020	19Sep2020	126529100
9th	Past	20Sep2020	03Oct2020	126529100

Parameter estimation with SIR-F model[^5]:

[^ 5]: [CovsirPhy] COVID-19 Python package for data analysis: Parameter estimation

# Parameter estimation
snl.estimate(cs.SIRF)
#List display (part)
est_cols = ["Start", "End", "Rt", *cs.SIRF.PARAMETERS, "RMSLE"]
snl.summary(columns=est_cols)

	Start	End	Rt	theta	kappa	rho	sigma	RMSLE
0th	06Feb2020	31Mar2020	3.6	0.0197967	0.000781433	0.110404	0.0292893	0.693445
1st	01Apr2020	21Apr2020	13.53	0.00119307	0.000954275	0.119908	0.00789786	0.17892
2nd	22Apr2020	06Jul2020	0.37	0.090172	0.000858737	0.0264027	0.0636349	0.883253
3rd	07Jul2020	23Jul2020	1.93	0.000466376	8.04707e-05	0.133382	0.0689537	0.0318977
4th	24Jul2020	01Aug2020	1.78	7.63782e-05	0.000231109	0.135991	0.0760643	0.0193947
5th	02Aug2020	14Aug2020	1.4	0.00100709	0.000215838	0.100929	0.0716643	0.0797834
6th	15Aug2020	28Aug2020	0.82	0.000562179	0.000899694	0.0783473	0.0943161	0.021244
7th	29Aug2020	08Sep2020	0.7	0.00168729	0.00119958	0.0616905	0.0863032	0.0182731
8th	09Sep2020	19Sep2020	0.78	0.00293238	0.00132534	0.0793277	0.099966	0.0282502
9th	20Sep2020	03Oct2020	0.87	0.000463924	0.000984493	0.0793406	0.090479	0.0274187

3. What is a scenario?

In Covsirphy's Scenario class, multiple types of Phase [^ 3] settings can be registered. "Phase [^ 3] settings" is called "scenario".

For example, "Main scenario" and "Medicine scenario" can be set in parallel and scenario analysis can be performed as shown in the following table. The dates are listed on the assumption that the actual data from 2020/2/6 to 2020/10/3 are available.

scenario name	0th-9th	10th	11th
Main	2/6 - 10/3	10/4 - 12/31	2021/1/1 - 2020/4/10
Medicine	2/6 - 10/3	10/4 - 12/31	2021/1/1 - 2020/4/10

Estimate the parameters of the 0th-9th phase for each scenario, and simulate the number of patients assuming that the parameters of the 10th phase are the same and that $ \ sigma $ doubles between the 10th and 11th for the Medicine scenario. .. At this time, the doubling effect of $ \ sigma $ can be estimated by comparing the changes in the number of patients between 2021 / 1/1-2021 / 4/10.

It is unlikely that it will double at once due to the creation of new drugs, but please refer to it as a demonstration.

4. Creating and deleting scenarios

To examine the effect of parameters on the number of patients, create scenarios with different parameters, simulate the number of patients, and compare them. First, how to create / initialize / delete the scenario itself is as follows.

Create New

You can use the Scenario.clear (name =" new scenario name ", template =" source scenario name ") method to copy a configured scenario and create a new one. When creating a "New" scenario with the Main scenario (default) as the copy source

#Create New
snl.clear(name="New", template="Main")
#List display
cols = ["Start", "End", "Rt", *cs.SIRF.PARAMETERS]
snl.summary(columns=cols)

(1st-8th phase omitted on paper)

	Start	End	Rt	theta	kappa	rho	sigma
('Main', '0th')	06Feb2020	31Mar2020	3.85	0.0185683	0.000768779	0.112377	0.027892
('Main', '9th')	20Sep2020	03Oct2020	0.87	0.000463924	0.000984493	0.0793406	0.090479
('New', '0th')	06Feb2020	31Mar2020	3.85	0.0185683	0.000768779	0.112377	0.027892
('New', '9th')	20Sep2020	03Oct2020	0.87	0.000463924	0.000984493	0.0793406	0.090479

Delete

Use Scenario.delete (name =" scenario name ") to delete. Delete the "New" scenario and return only the "Main" scenario to the registered state.

#Delete scenario
snl.delete(name="New")
#List display
snl.summary(columns=cols)

(1st-8th phase omitted on paper)

	Start	End	Rt	theta	kappa	rho	sigma
0th	06Feb2020	31Mar2020	3.85	0.0185683	0.000768779	0.112377	0.027892
9th	20Sep2020	03Oct2020	0.87	0.000463924	0.000984493	0.0793406	0.090479

5. [Add] Specify the last date

From here, I will explain how to add a phase. You can add a new phase by specifying the last date, such as Scenario.add (end_date =" 31Dec2020 ").

# Main:Added up to 31Dec2020 as 10th phase
snl.add(end_date="31Dec2020", name="Main")
#List display
snl.summary(columns=cols, name="Main")

(1st-8th phase omitted on paper)

	Start	End	Rt	theta	kappa	rho	sigma
0th	06Feb2020	31Mar2020	3.61	0.0190222	0.00099167	0.110766	0.0290761
9th	20Sep2020	03Oct2020	0.87	0.000463924	0.000984493	0.0793406	0.090479
10th	04Oct2020	31Dec2020	0.87	0.000463924	0.000984493	0.0793406	0.090479

Since no parameters are set, the added 10th phase parameters have the same values as the 9th phase.

6. [Add] Specify the number of days

You can add a new phase by specifying the number of days, such as Scenario.add (days = 100).

# main: 01Jan2021 (The day after the last day of the 10th)Added 100 days as 11th
snl.add(days=100, name="Main").summary(columns=cols)
#List display
snl.summary(columns=cols, name="Main")

(1st-8th phase omitted on paper)

	Start	End	Rt	theta	kappa	rho	sigma
0th	06Feb2020	31Mar2020	3.61	0.0190222	0.00099167	0.110766	0.0290761
9th	20Sep2020	03Oct2020	0.87	0.000463924	0.000984493	0.0793406	0.090479
10th	04Oct2020	31Dec2020	0.87	0.000463924	0.000984493	0.0793406	0.090479
11th	01Jan2021	10Apr2021	0.87	0.000463924	0.000984493	0.0793406	0.090479

Since no parameters have been set, the added 11th phase parameters have the same values as the 10th phase.

7. Specify the Add parameter

You can add a new phase by specifying a parameter value, such as Scenario.add (sigma = 0.18).

Get parameter estimates

In this article, in order to estimate the doubling effect of $ \ sigma $, the estimated value of $ \ sigma $ in the Main scenario 11 phase (last phase) is obtained by the Scenario.get () method.

sigma_last = snl.get("sigma", phase="last", name="Main")
sigma_med = sigma_last * 2
print(round(sigma_last, 3), round(sigma_med, 3))
# -> 0.09 0.181

Medicine scenario settings

Create a Medicine scenario with the Main scenario as the copy source. At this time, future phases (10th, 11th phase) will be deleted. Then add the 10th phase (same parameters as Main) and 11th phase ($ \ sigma $ is doubled).

snl.clear(name="Medicine", template="Main")
snl.add(end_date="31Dec2020", name="Medicine")
# Medicine: 01Jan2021 (The day after the last day of the 10th)Added 100 days as 11th
snl.add(sigma=sigma_med, days=100, name="Medicine")
#List display
snl.summary(columns=cols, name="Medicine")

(1st-8th phase omitted on paper)

	Start	End	Rt	theta	kappa	rho	sigma
0th	06Feb2020	31Mar2020	3.85	0.0185683	0.000768779	0.112377	0.027892
9th	20Sep2020	03Oct2020	0.87	0.000463924	0.000984493	0.0793406	0.090479
10th	04Oct2020	31Dec2020	0.87	0.000463924	0.000984493	0.0793406	0.090479
11th	01Jan2021	10Apr2021	0.44	0.000463924	0.000984493	0.0793406	0.180958

Since $ sigma $ has doubled, the effective reproduction number $ Rt $ has been halved.

8. Delete phase

As an aside, you can delete a phase with Scenario.delete (phase = [" Phase name "]). Here, add and delete the 12th phase.

#Add Phase
snl.add(days=30, name="Medicine")
#Delete Phase
snl.delete(phases=["last"], name="Medicine")

9. Confirmation of parameter transition

Check the graph to see if the parameter transitions are set as planned. Use Scenario.history (parameter name). Use the scenario name as the series name.

Transition of Sigma

Whether $ \ sigma $ has doubled in the 11th phase of the Medicine scenario (2021/1/1 --2021/4/1):

snl.history("sigma", filename=None)

Transition of Rt

Whether the effective reproduction number $ Rt $ is halved to the 11th phase (2021/1/1 --2021/4/1) of the Medicine scenario:

snl.history("Rt", filename=None)

10. Simulation of the number of patients

You can also get the simulation result of the number of patients in Scenario.history (variable name).

graph display

snl.history("Infected", filename=None)

The number of infected people is rapidly decreasing due to the doubling of $ \ sigma $. It is unlikely that it will double at once due to the creation of new drugs, but by changing the parameter values to realistic values and performing a simulation, it is possible to estimate the number of infected people and the end time in the future.

Data frame format

If you want to get the value in dataframe format, use the Scenario.track () method.

snl.track().tail()

	Scenario	Date	Confirmed	Fatal	Recovered	Population	Rt	theta	kappa	rho	sigma	1/alpha2 [day]	1/gamma [day]	1/beta [day]
425	Medicine	2021-04-06	108463	1884	106579	126529100	0.44	0.000463924	0.000984493	0.0793406	0.180958	1015	5	12
426	Medicine	2021-04-07	108463	1884	106579	126529100	0.44	0.000463924	0.000984493	0.0793406	0.180958	1015	5	12
427	Medicine	2021-04-08	108463	1884	106579	126529100	0.44	0.000463924	0.000984493	0.0793406	0.180958	1015	5	12
428	Medicine	2021-04-09	108463	1884	106579	126529100	0.44	0.000463924	0.000984493	0.0793406	0.180958	1015	5	12
429	Medicine	2021-04-10	108463	1884	106579	126529100	0.44	0.000463924	0.000984493	0.0793406	0.180958	1015	5	12

11. Comparison of characteristic values

Compare the characteristic values to understand the characteristics of the scenario. However, since it is meaningless to compare the average number of infected people, the characteristic values are set as follows.

--Maximum number of infected people and their date --Number of infected people on the day after the last day --Number of deaths on the day after the last day -(Considering addition) Integral value of the number of infected people

Click here for code and output results:

snl.describe()

	max(Infected)	argmax(Infected)	Infected on 11Apr2021	Fatal on 11Apr2021	11th_Rt
Main	15154	21Apr2020	512	1970	0.87
Medicine	15154	21Apr2020	0	1884	0.44

The effective reproduction number $ Rt $ is output only for Phases whose values differ between scenarios.

If the current parameters continue until 2021/4/10, the predicted number of infected people will be around 512 even as of 4/11 ... On the other hand, if $ \ sigma $ doubles, infected people The predicted number is 0.

12. Postscript

Thank you for browsing this time as well! We would appreciate it if you could give us your opinions, such as adding characteristic values.

Next time, I will explain how to evaluate the effectiveness of vaccines.