[CovsirPhy] COVID-19 Python Package for Data Analysis: Scenario Analysis (Parameter Comparison)

Introduction This is an introductory article about the Python package CovsirPhy that allows you to easily download and analyze COVID-19 data (such as the number of PCR positives).

  1. SIR model
  2. SIR-F model
  3. Data loading
  4. S-R trend analysis
  5. Parameter estimation
  6. Optimize Phase Settings

** This time, I will explain the scenario analysis (parameter comparison). ** **

The English version of the document is Covsir Phy: COVID-19 analysis with phase-dependent SIRs, [Kaggle: COVID-19 data with SIR model]( Please refer to https://www.kaggle.com/lisphilar/covid-19-data-with-sir-model).

1. Execution environment

CovsirPhy can be installed by the following method! Please use Python 3.7 or above, or Google Colaboratory.

--Stable version: pip install covsirphy --upgrade --Development version: pip install" git + https://github.com/lisphilar/covid19-sir.git#egg=covsirphy "

import covsirphy as cs
cs.__version__
# '2.8.3'
Execution environment
OS Windows Subsystem for Linux / Ubuntu
Python version 3.8.5

2. Preparation

Click here for the code [^ 2] to download the actual data from COVID-19 Data Hub [^ 1]:

data_loader = cs.DataLoader("input")
jhu_data = data_loader.jhu()
population_data = data_loader.population()

In the table and graph of this article, the data of Japan as of October 3, 2020 (national total value announced by the Ministry of Health, Labor and Welfare) is used by the following code [^ 3]. COVID-19 Data Hub [^ 1] also contains Japanese data, but it seems to be different from the national total value announced by the Ministry of Health, Labor and Welfare.

[^ 3]: [CovsirPhy] COVID-19 Python package for data analysis: S-R trend analysis

# (Optional)Acquisition of data from the Ministry of Health, Labor and Welfare
#If you get an error"pip install requests aiohttp"
japan_data = data_loader.japan()
jhu_data.replace(japan_data)
print(japan_data.citation)

Confirmation of actual data:

#Instantiation of analysis class
snl = cs.Scenario(jhu_data, population_data, country="Japan")
#Graph display of actual data
snl.records(filename=None)

Division into Phase by S-R trend analysis [^ 3] and period adjustment [^ 4]:

[^ 4]: [CovsirPhy] COVID-19 Python package for data analysis: Phase setting optimization

# S-R trend analysis
snl.trend(filename=None)
# Separate 0th phase
snl.separate("01Apr2020")
#List display
snl.summary()
Type Start End Population
0th Past 06Feb2020 31Mar2020 126529100
1st Past 01Apr2020 21Apr2020 126529100
2nd Past 22Apr2020 06Jul2020 126529100
3rd Past 07Jul2020 23Jul2020 126529100
4th Past 24Jul2020 01Aug2020 126529100
5th Past 02Aug2020 14Aug2020 126529100
6th Past 15Aug2020 28Aug2020 126529100
7th Past 29Aug2020 08Sep2020 126529100
8th Past 09Sep2020 19Sep2020 126529100
9th Past 20Sep2020 03Oct2020 126529100

Parameter estimation with SIR-F model[^5]:

[^ 5]: [CovsirPhy] COVID-19 Python package for data analysis: Parameter estimation

# Parameter estimation
snl.estimate(cs.SIRF)
#List display (part)
est_cols = ["Start", "End", "Rt", *cs.SIRF.PARAMETERS, "RMSLE"]
snl.summary(columns=est_cols)
Start End Rt theta kappa rho sigma RMSLE
0th 06Feb2020 31Mar2020 3.6 0.0197967 0.000781433 0.110404 0.0292893 0.693445
1st 01Apr2020 21Apr2020 13.53 0.00119307 0.000954275 0.119908 0.00789786 0.17892
2nd 22Apr2020 06Jul2020 0.37 0.090172 0.000858737 0.0264027 0.0636349 0.883253
3rd 07Jul2020 23Jul2020 1.93 0.000466376 8.04707e-05 0.133382 0.0689537 0.0318977
4th 24Jul2020 01Aug2020 1.78 7.63782e-05 0.000231109 0.135991 0.0760643 0.0193947
5th 02Aug2020 14Aug2020 1.4 0.00100709 0.000215838 0.100929 0.0716643 0.0797834
6th 15Aug2020 28Aug2020 0.82 0.000562179 0.000899694 0.0783473 0.0943161 0.021244
7th 29Aug2020 08Sep2020 0.7 0.00168729 0.00119958 0.0616905 0.0863032 0.0182731
8th 09Sep2020 19Sep2020 0.78 0.00293238 0.00132534 0.0793277 0.099966 0.0282502
9th 20Sep2020 03Oct2020 0.87 0.000463924 0.000984493 0.0793406 0.090479 0.0274187

3. What is a scenario?

In Covsirphy's Scenario class, multiple types of Phase [^ 3] settings can be registered. "Phase [^ 3] settings" is called "scenario".

For example, "Main scenario" and "Medicine scenario" can be set in parallel and scenario analysis can be performed as shown in the following table. The dates are listed on the assumption that the actual data from 2020/2/6 to 2020/10/3 are available.

scenario name 0th-9th 10th 11th
Main 2/6 - 10/3 10/4 - 12/31 2021/1/1 - 2020/4/10
Medicine 2/6 - 10/3 10/4 - 12/31 2021/1/1 - 2020/4/10

Estimate the parameters of the 0th-9th phase for each scenario, and simulate the number of patients assuming that the parameters of the 10th phase are the same and that $ \ sigma $ doubles between the 10th and 11th for the Medicine scenario. .. At this time, the doubling effect of $ \ sigma $ can be estimated by comparing the changes in the number of patients between 2021 / 1/1-2021 / 4/10.

It is unlikely that it will double at once due to the creation of new drugs, but please refer to it as a demonstration.

4. Creating and deleting scenarios

To examine the effect of parameters on the number of patients, create scenarios with different parameters, simulate the number of patients, and compare them. First, how to create / initialize / delete the scenario itself is as follows.

Create New

You can use the Scenario.clear (name =" new scenario name ", template =" source scenario name ") method to copy a configured scenario and create a new one. When creating a "New" scenario with the Main scenario (default) as the copy source

#Create New
snl.clear(name="New", template="Main")
#List display
cols = ["Start", "End", "Rt", *cs.SIRF.PARAMETERS]
snl.summary(columns=cols)

(1st-8th phase omitted on paper)

Start End Rt theta kappa rho sigma
('Main', '0th') 06Feb2020 31Mar2020 3.85 0.0185683 0.000768779 0.112377 0.027892
('Main', '9th') 20Sep2020 03Oct2020 0.87 0.000463924 0.000984493 0.0793406 0.090479
('New', '0th') 06Feb2020 31Mar2020 3.85 0.0185683 0.000768779 0.112377 0.027892
('New', '9th') 20Sep2020 03Oct2020 0.87 0.000463924 0.000984493 0.0793406 0.090479

Delete

Use Scenario.delete (name =" scenario name ") to delete. Delete the "New" scenario and return only the "Main" scenario to the registered state.

#Delete scenario
snl.delete(name="New")
#List display
snl.summary(columns=cols)

(1st-8th phase omitted on paper)

Start End Rt theta kappa rho sigma
0th 06Feb2020 31Mar2020 3.85 0.0185683 0.000768779 0.112377 0.027892
9th 20Sep2020 03Oct2020 0.87 0.000463924 0.000984493 0.0793406 0.090479

5. [Add] Specify the last date

From here, I will explain how to add a phase. You can add a new phase by specifying the last date, such as Scenario.add (end_date =" 31Dec2020 ").

# Main:Added up to 31Dec2020 as 10th phase
snl.add(end_date="31Dec2020", name="Main")
#List display
snl.summary(columns=cols, name="Main")

(1st-8th phase omitted on paper)

Start End Rt theta kappa rho sigma
0th 06Feb2020 31Mar2020 3.61 0.0190222 0.00099167 0.110766 0.0290761
9th 20Sep2020 03Oct2020 0.87 0.000463924 0.000984493 0.0793406 0.090479
10th 04Oct2020 31Dec2020 0.87 0.000463924 0.000984493 0.0793406 0.090479

Since no parameters are set, the added 10th phase parameters have the same values as the 9th phase.

6. [Add] Specify the number of days

You can add a new phase by specifying the number of days, such as Scenario.add (days = 100).

# main: 01Jan2021 (The day after the last day of the 10th)Added 100 days as 11th
snl.add(days=100, name="Main").summary(columns=cols)
#List display
snl.summary(columns=cols, name="Main")

(1st-8th phase omitted on paper)

Start End Rt theta kappa rho sigma
0th 06Feb2020 31Mar2020 3.61 0.0190222 0.00099167 0.110766 0.0290761
9th 20Sep2020 03Oct2020 0.87 0.000463924 0.000984493 0.0793406 0.090479
10th 04Oct2020 31Dec2020 0.87 0.000463924 0.000984493 0.0793406 0.090479
11th 01Jan2021 10Apr2021 0.87 0.000463924 0.000984493 0.0793406 0.090479

Since no parameters have been set, the added 11th phase parameters have the same values as the 10th phase.

7. Specify the Add parameter

You can add a new phase by specifying a parameter value, such as Scenario.add (sigma = 0.18).

Get parameter estimates

In this article, in order to estimate the doubling effect of $ \ sigma $, the estimated value of $ \ sigma $ in the Main scenario 11 phase (last phase) is obtained by the Scenario.get () method.

sigma_last = snl.get("sigma", phase="last", name="Main")
sigma_med = sigma_last * 2
print(round(sigma_last, 3), round(sigma_med, 3))
# -> 0.09 0.181

Medicine scenario settings

Create a Medicine scenario with the Main scenario as the copy source. At this time, future phases (10th, 11th phase) will be deleted. Then add the 10th phase (same parameters as Main) and 11th phase ($ \ sigma $ is doubled).

snl.clear(name="Medicine", template="Main")
snl.add(end_date="31Dec2020", name="Medicine")
# Medicine: 01Jan2021 (The day after the last day of the 10th)Added 100 days as 11th
snl.add(sigma=sigma_med, days=100, name="Medicine")
#List display
snl.summary(columns=cols, name="Medicine")

(1st-8th phase omitted on paper)

Start End Rt theta kappa rho sigma
0th 06Feb2020 31Mar2020 3.85 0.0185683 0.000768779 0.112377 0.027892
9th 20Sep2020 03Oct2020 0.87 0.000463924 0.000984493 0.0793406 0.090479
10th 04Oct2020 31Dec2020 0.87 0.000463924 0.000984493 0.0793406 0.090479
11th 01Jan2021 10Apr2021 0.44 0.000463924 0.000984493 0.0793406 0.180958

Since $ sigma $ has doubled, the effective reproduction number $ Rt $ has been halved.

8. Delete phase

As an aside, you can delete a phase with Scenario.delete (phase = [" Phase name "]). Here, add and delete the 12th phase.

#Add Phase
snl.add(days=30, name="Medicine")
#Delete Phase
snl.delete(phases=["last"], name="Medicine")

9. Confirmation of parameter transition

Check the graph to see if the parameter transitions are set as planned. Use Scenario.history (parameter name). Use the scenario name as the series name.

Transition of Sigma

Whether $ \ sigma $ has doubled in the 11th phase of the Medicine scenario (2021/1/1 --2021/4/1):

snl.history("sigma", filename=None)

sigma.jpg

Transition of Rt

Whether the effective reproduction number $ Rt $ is halved to the 11th phase (2021/1/1 --2021/4/1) of the Medicine scenario:

snl.history("Rt", filename=None)

rt.jpg

10. Simulation of the number of patients

You can also get the simulation result of the number of patients in Scenario.history (variable name).

graph display

snl.history("Infected", filename=None)

infected.jpg

The number of infected people is rapidly decreasing due to the doubling of $ \ sigma $. It is unlikely that it will double at once due to the creation of new drugs, but by changing the parameter values to realistic values and performing a simulation, it is possible to estimate the number of infected people and the end time in the future.

Data frame format

If you want to get the value in dataframe format, use the Scenario.track () method.

snl.track().tail()
Scenario Date Confirmed Fatal Infected Recovered Population Rt theta kappa rho sigma 1/alpha2 [day] 1/gamma [day] alpha1 [-] 1/beta [day]
425 Medicine 2021-04-06 108463 1884 0 106579 126529100 0.44 0.000463924 0.000984493 0.0793406 0.180958 1015 5 0 12
426 Medicine 2021-04-07 108463 1884 0 106579 126529100 0.44 0.000463924 0.000984493 0.0793406 0.180958 1015 5 0 12
427 Medicine 2021-04-08 108463 1884 0 106579 126529100 0.44 0.000463924 0.000984493 0.0793406 0.180958 1015 5 0 12
428 Medicine 2021-04-09 108463 1884 0 106579 126529100 0.44 0.000463924 0.000984493 0.0793406 0.180958 1015 5 0 12
429 Medicine 2021-04-10 108463 1884 0 106579 126529100 0.44 0.000463924 0.000984493 0.0793406 0.180958 1015 5 0 12

11. Comparison of characteristic values

Compare the characteristic values to understand the characteristics of the scenario. However, since it is meaningless to compare the average number of infected people, the characteristic values are set as follows.

--Maximum number of infected people and their date --Number of infected people on the day after the last day --Number of deaths on the day after the last day -(Considering addition) Integral value of the number of infected people

Click here for code and output results:

snl.describe()
max(Infected) argmax(Infected) Infected on 11Apr2021 Fatal on 11Apr2021 11th_Rt
Main 15154 21Apr2020 512 1970 0.87
Medicine 15154 21Apr2020 0 1884 0.44

The effective reproduction number $ Rt $ is output only for Phases whose values differ between scenarios.

If the current parameters continue until 2021/4/10, the predicted number of infected people will be around 512 even as of 4/11 ... On the other hand, if $ \ sigma $ doubles, infected people The predicted number is 0.

12. Postscript

Thank you for browsing this time as well! We would appreciate it if you could give us your opinions, such as adding characteristic values.

Next time, I will explain how to evaluate the effectiveness of vaccines.

Recommended Posts

[CovsirPhy] COVID-19 Python Package for Data Analysis: Scenario Analysis (Parameter Comparison)
[CovsirPhy] COVID-19 Python Package for Data Analysis: Parameter estimation
[CovsirPhy] COVID-19 Python Package for Data Analysis: Data loading
[CovsirPhy] COVID-19 Python package for data analysis: SIR-F model
[CovsirPhy] COVID-19 Python package for data analysis: S-R trend analysis
[CovsirPhy] COVID-19 Python Package for Data Analysis: SIR model
Python for Data Analysis Chapter 4
Python for Data Analysis Chapter 2
Python for Data Analysis Chapter 3
Preprocessing template for data analysis (Python)
Python visualization tool for data analysis work
Data analysis python
Data analysis with python 2
Data analysis using Python 0
Data analysis overview python
Python data analysis template
Python package manager comparison
Data analysis with Python
Let's analyze Covid-19 (Corona) data using Python [For beginners]
[For beginners] How to study Python3 data analysis exam
My python data analysis container
[Python] Notes on data analysis
[Understand in the shortest time] Python basics for data analysis
Which should I study, R or Python, for data analysis?
Python data analysis learning notes
<Python> Build a dedicated server for Jupyter Notebook data analysis
Data analysis using python pandas
Tips for data analysis ・ Notes
Practice of data analysis by Python and pandas (Tokyo COVID-19 data edition)
Create a USB boot Ubuntu with a Python environment for data analysis
Python: Time Series Analysis: Preprocessing Time Series Data
Python course for data science_useful techniques
Data analysis for improving POG 3-Regression analysis-
Data formatting for Python / color plots
Data analysis starting with python (data visualization 1)
Data analysis starting with python (data visualization 2)
A summary of Python e-books that are useful for free-to-read data analysis
Python environment tool comparison chart for Rubyist
Detailed Python techniques required for data shaping (1)
[Python] First data analysis / machine learning (Kaggle)
Data analysis starting with python (data preprocessing-machine learning)
How to use "deque" for Python data
Detailed Python techniques required for data shaping (2)
I did Python data analysis training remotely
Python 3 Engineer Certified Data Analysis Exam Preparation
JupyterLab Basic Setting 2 (pip) for data analysis
JupyterLab Basic Setup for Data Analysis (pip)
Analysis for Data Scientists: Qiita Self-Article Summary 2020
Data analysis in Python Summary of sources to look at first for beginners