[PYTHON] Estimate the peak infectivity of the new coronavirus

Introduction

A state of emergency was declared for the convergence of the new coronavirus infection (COVID-19), and although it was not as much as lockdown in other countries, various industries were closed, events were canceled, remote work, various self-restraints, etc. , Is affecting the lives of many people. The condition for an infectious disease such as corona to converge is that the effective reproduction number (the total number of secondary infections produced by one infected person) is persistently less than 1. In Previous article, the incubation period and infection period are one of the methods to calculate the effective reproduction number from the data of the number of infected persons in chronological order. I showed a simple method to calculate separately. In this article, what happens when the infectivity, that is, the strength of infecting others after infection, is incorporated into the effect that changes with time, ** especially how many days after infection the peak of infectivity * I would like to estimate about *. In particular, may it be infected even during the incubation period in the media? Since the suspicion has been reported, I think that it will also be a verification of its authenticity.

Prerequisite knowledge

I would like to introduce three studies that are the premise of the examination of this article. The first is Survey by the Cluster Countermeasures Group of the Ministry of Health, Labor and Welfare. According to the cluster countermeasures group, the basic reproduction number (when it is basic, it does not consider environmental factors, and when it is effective, it seems that policies are also taken into consideration) is sampled as shown in the following graph. クラスター対策班のR0.jpg Thankfully, ** the average value is specified as 1.49 ** (important here). In other words, while most people do not infect (0 people), they can infect many people at once, such as 11 or 12. Therefore, R0 should be considered as a wide-tailed distribution. The second is [Research Report No.13](https://www.imperial.ac.uk/mrc-global-infectious-disease-analysis/covid-19/report-13-europe] from Imperial College London, England. -npi-impact /). The university has published a report on COVID-19 early on and seems to have a strong influence on British policy. What should be noted in this report is that the infectivity is expressed using the gamma distribution as shown in the figure below. ICL_Rep13_Fig7.jpg The average value is 6.5 [days], but the peak seems to be around the 4th to 5th days. The third is Report on the incubation period of COVID-19 of Johns Hopkins University in the United States. estimates-5-days-for-incubation-period.html). According to this, the median incubation period is 5.1 days **. This, combined with the graph from the second study, suggests that ** there may be a peak of infectivity before the onset date **. Based on the above three studies, we will examine the peak infectivity, especially using data on the number of infected people in Japan.

Computational model

To calculate the basic reproduction number considering infectivity, consider a model based on the SEIR model, which is a slightly improved version of Previous article. I would like. See the following figure. R0_calculation_with_bnomi.jpg In the previous article, it was stated that a secondary infection would occur from an infected person during the infection period (I) with the same infection intensity, but this time we will adopt the following assumptions.

  1. Infection can occur even during the incubation period (E).
  2. Secondary infection results from the number of infected persons, which is the product of the number of infected persons and the intensity of infection.
  3. Infection intensity is given by a function that depends on the number of days since the date of infection.
  4. The sum of the incubation period (lp) and the infection period (ip) is 13.

In addition, the function of infection intensity of 3 adopts the binomial distribution. This is because it is easier to handle because it requires only one parameter compared to the gamma distribution. Now, the formula for the basic reproduction number based on the binomial distribution is determined as follows.

R_0(y) = r_0 \cdot {}_N C_y \cdot \theta^y (1-\theta)^{N-y}

$ r_0 $ is a constant, $ \ theta $ is a binary distribution parameter, $ y $ is the number of days since the date of infection, and $ N $ is the sum of the incubation period and the infection period (= 13). Secondary infection from an infected person to an infected person is defined by convolution as shown in the following equation. $ P (t) $ is the number of confirmed positives at time $ t $.

P(t+N) = \sum_{y=0}^N R_0(N-y) P(t+y)

Transforming these gives you the formula to calculate $ r_0 $.

r_0 = \frac{P(t+N)}{\sum_{y=0}^N {}_N C_{N-y} \cdot \theta^{N-y} (1-\theta)^{y} P(t+y)}

It is a little unnatural that both the denominator and the numerator have $ P (t + N) $, but the calculation includes the effect of causing a secondary infection immediately on the day of infection. Now, according to this $ R_0 (y) $, I would like to determine the infectivity parameter $ \ theta $, that is, the time when the infectivity is maximized $ N \ theta $, but I will go with the following strategy.

  1. Determine $ \ theta $ appropriately.
  2. Find $ r_0 $ from the formula of $ R_0 (y) $ and the actual data.
  3. Find the mean of the distribution of $ r_0 $.
  4. Repeat steps 1 to 3 and estimate $ \ theta $, where $ E [r_0; \ theta] $ is closest to 1.49 days, as the true value.

In step 4, if the distribution of the cluster group is numerical data, a method such as calculating and optimizing the distance between the probability distributions could be considered, but since there is only data in the image ** I gave up…. I want you to publish it on csv. (Chilla | д ゚)

Data to be analyzed

Map of the number of new coronavirus infections by prefecture (provided by Jag Japan Co., Ltd.), we have used the publicly available csv data. It was. I think it is wonderful that private companies are volunteering to collect and publish such data.

Try to calculate with Python & result

This time, I would like to see the results at the same time while calculating.

1. Preparation

Set the library and font.

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import datetime
import locale
#
from scipy.special import comb

font = {'family' : 'IPAexGothic'}
plt.rc('font', **font)

It is a function that counts the number of infected people by prefecture from the data of Jag Japan.

def readCsvOfJapanPref(pref : None):
    #Download from the URL below
    # https://jag-japan.com/covid19map-readme/
    fcsv = u'COVID-19.csv'
    df = pd.read_csv(fcsv, header=0, encoding='utf8', parse_dates=[u'Fixed date'])
    #Fixed date,Extract only the prefectures where you receive a medical examination
    df1 = df.loc[:,[u'Fixed date',u'Consultation prefecture']]
    df1.columns = ['date','pref']
    #Extracted by consultation prefecture
    if pref is None or pref == u'In Japan':
        df1 = df1
    else:
        df1 = df1[df1.pref == pref]
    df1 = df1.loc[:,'date']
    #Count on fixed date
    df2 = pd.DataFrame( df1.value_counts() ).reset_index()
    df2.columns = ['date','P']
    df2 = df2.sort_values('date').reset_index(drop = True)
    #
    return df2

This function calculates the basic reproduction number in consideration of the infection intensity. We have put in zero percent measures.

def binomi(y, N, th):
    c = comb(N, y)
    p = c * pow(th, y) * pow(1. - th, N - y)
    return p

def calcR0(df, keys):
    #Binomial distribution
    N  = keys['N']
    th = keys['th']
    bcoef = [binomi(y, N, th) for y in range(N+1) ]
    #
    nrow = len(df)
    getP  = lambda s: df.loc[s, 'P' ] if (s >= 0 and s < nrow) else np.NaN
    for t in range(1, nrow):
        df.loc[t, 'Ppre' ] = sum([ getP(t + y) * bcoef[N - y] for y in range(0, N + 1)])
        df.loc[t, 'Pat'  ] = getP(t + N )
        if df.loc[t, 'Ppre'] > 0.1: #Zero percent measures
            df.loc[t, 'R0'  ] = df.loc[t, 'Pat'] / df.loc[t, 'Ppre']
        else:
            df.loc[t, 'R0'  ] = np.NaN
    return df

This function calculates the basic reproduction number for each prefecture.

def makeCalcFrame(days):
    t_1 = pd.Timestamp(2020,1,24) #Calculation start date
    td = pd.Timedelta('1 days')
    #
    npd = [[t_1 + td * i, 0, 0, 0 ] for i in range(0,days)]
    df1 = pd.DataFrame(npd)
    df1.columns = ['date', 'Ppre','Pat', 'R0']
    #
    return df1

def mergeCalcFrame(df1, df2):
    return pd.merge(df1, df2, on='date', how='left').fillna(0)

def R0inJapanPref(pref, keys):
    #
    df1 = makeCalcFrame(100)
    df2 = readCsvOfJapanPref(pref)
    df = mergeCalcFrame(df1, df2)
    #
    df = calcR0(df, keys)
    #
    return df

2. Relationship between the parameter θ of the binomial distribution and the average value of the basic reproduction number R0

Now, let's calculate the basic reproduction number for each parameter $ \ theta $ included in the infection intensity $ R_0 (y) $ by prefecture and see its distribution. A function for calculation. The candidates for the parameter $ \ theta $ will be calculated on the grid.

def calcDfMean(dflist):
    mlist = []
    for df, pref in dflist:
        R0list = df[(df.R0 >= 0)]['R0']
        m = np.mean(R0list)
        mlist.append(m)
    return np.mean(mlist)

def calcThToDfMean():
    preflist = [u'In Japan', u'Tokyo', u'Osaka', u'Aichi prefecture', u'Fukuoka Prefecture', u'Hokkaido']

    keys = {'N':13}
    dc = {}
    for y in np.linspace(0, 13, 14):
        keys['th'] = y / keys['N']
        dflist = [[R0inJapanPref(pref, keys), pref] for pref in preflist]
        dm = calcDfMean(dflist)
        dc[y] = dm
    #
    return dc

This is the part to calculate and display. It will take a few minutes.

dc = calcThToDfMean()

fig = plt.figure(figsize=(5,5))
ax1 = fig.add_subplot(1,1,1)
ax1.plot(list(dc.keys()), list(dc.values()))
ax1.set_xlabel('Nθ=E[y]')
ax1.set_ylabel('E[R0(y)]')
ax1.set_xticks(np.linspace(0,13,14))
ax1.set_xlim(0,)
ax1.set_ylim(0,)
ax1.grid()
ax1.set_title('The peak day vs the expected R0')
plt.show()

Let's see the calculation result. R0_COVID-19_BinEst_THvsER0.png The horizontal axis represents $ N \ theta $, that is, the mean value of the binomial distribution, and the vertical axis represents the mean value of the basic reproduction number $ r_0 $. In the graph of the ** cluster countermeasure group earlier, the average number of basic reproductions was 1.49. ** ** From this graph, if you look at the intersection position where the vertical axis is 1.49, you can see that $ N \ theta = 5 $. Therefore, let's set the parameter $ \ theta $ to an estimated value of $ 5 / N = 0.385 $. ** **

3. Infection intensity and basic reproduction number when using the estimated parameter θ

Next, let's look at the distribution of basic reproduction numbers when the estimated parameter $ \ theta = 0.385 $ is used. A function for calculating the distribution.

def makeR0frame(dflist):
    dc = {}
    for df, pref in dflist:
        R0list = df[(df.R0 >= 0)]['R0']
        dc[pref] = R0list
    return pd.DataFrame(dc)

def showR0bar(ax, df):
    color = dict(boxes="Gray",whiskers="DarkOrange",medians="Black",caps="Gray")
    df.plot.box(color=color, ax = ax)

def showBCoef(ax, keys):
    N = keys['N']
    th = keys['th']
    ylist = [y for y in range(N+1) ]
    bcoef = [binomi(y, N, th) for y in ylist ]
    ax.plot(ylist, bcoef)
    ax.set_xticks(np.linspace(0,N,N+1))
    ax.grid()
    ax.set_ylim(0,)
    ax.set_xlim(0,)

This is the part that calculates and displays the distribution.

preflist = [u'In Japan', u'Tokyo', u'Osaka', u'Aichi prefecture', u'Fukuoka Prefecture', u'Hokkaido']

keys = {'N':13, 'th':5./13. } 
dflist = [[R0inJapanPref(pref, keys), pref] for pref in preflist]

fig = plt.figure(figsize=(10,5))
plt.rcParams["font.size"] = 10
ax1 = fig.add_subplot(1,2,1)
ax2 = fig.add_subplot(1,2,2)
#
df1 = makeR0frame(dflist)
showR0bar(ax1, df1)
showBCoef(ax2, keys)
ax1.legend()
ax1.set_title("R0")
ax2.set_xlabel('Nθ=E[y]')
ax2.set_ylabel('P(y;θ)')
ax2.set_title('binomi(y;θ)')
    
plt.show()

Let's see the calculation result. R0_COVID-19_BinEst_R0prefs.png The left is a boxplot of the distribution of basic reproduction numbers by prefecture, and the right is the binomial distribution of infection intensity. Looking at the figure on the left, you can see that the distribution has a fairly long tail. In addition, the maximum outlier is around 17.5 in Hokkaido. As you can see by changing the parameters, it seems that there are few differences between prefectures with this parameter. Looking at the figure on the right, I feel that it is similar to the gamma distribution in the report of Imperial College London. The peak position is on the 5th day, so this is a little behind.

4. Match answers with a histogram of basic reproduction numbers

Finally, make a histogram of the basic reproduction number and match the answer with the histogram of the cluster countermeasure team. The code to calculate.

def showR0hist(ax, df, pref):
    R0list = df[df.R0 >= 0]['R0']
    R0list = R0list / R0list.sum() * 100
    R0list.hist( bins=30, label=pref, ax=ax)

fig = plt.figure(figsize=(5,5))
ax1 = fig.add_subplot(1,1,1)
for df, pref in dflist:
    showR0hist(ax1, df, pref)
ax1.legend()
#
plt.show()

Let's see the calculation result. R0_COVID-19_BinEst_R0allhist_比較用.png The left is R0 estimated from the data of the number of infected people, and the right is R0 aggregated by the survey of the cluster countermeasure team. The shape is pretty good. It is a shape that can be approximated by an exponential distribution. The average values are the same, so Yoshi! Let's say.

Consideration

From the above, the following trends can be derived from the calculation results regarding the estimation of the peak infectivity based on the data of infected persons in each prefecture in Japan.

Furthermore ...

Reference link

I referred to the following page.

  1. Calculate the transition of the basic reproduction number of the new coronavirus by prefecture
  2. New Corona Cluster Countermeasure Expert (2)
  3. “Definition of terms for new corona cluster countermeasures” reported by experts in new corona cluster countermeasures
  4. Report 13 - Estimating the number of infections and the impact of non-pharmaceutical interventions on COVID-19 in 11 European countries
  5. New Study on COVID-19 Estimates 5.1 Days for Incubation Period
  6. [Python] Accelerate combination (nCr) calculation
  7. Map of the number of people infected with new coronavirus by prefecture (provided by Jag Japan Co., Ltd.)

Recommended Posts

Estimate the peak infectivity of the new coronavirus
Plot the spread of the new coronavirus
Factfulness of the new coronavirus seen in Splunk
GUI simulation of the new coronavirus (SEIR model)
Let's test the medical collapse hypothesis of the new coronavirus
Quantify the degree of self-restraint required to contain the new coronavirus
Analyzing the age-specific severity of coronavirus
(Now) I analyzed the new coronavirus (COVID-19)
Did the number of store closures increase due to the impact of the new coronavirus?
The story of the student who developed the new coronavirus countermeasure site (Ishikawa version)
Let's calculate the transition of the basic reproduction number of the new coronavirus by prefecture
I tried to predict the behavior of the new coronavirus with the SEIR model.
Folding @ Home on Linux Mint to contribute to the analysis of the new coronavirus
I touched some of the new features of Python 3.8 ①
The epidemic forecast of the new coronavirus was released on the Web at explosive speed
I tried to automatically send the literature of the new coronavirus to LINE with Python
The theory that the key to controlling infection with the new coronavirus is hyperdispersion of susceptibility
I tried to visualize the characteristics of new coronavirus infected person information with wordcloud
Let's put out a ranking of the number of effective reproductions of the new coronavirus by prefecture
Try to estimate the number of likes on Twitter
Tasks at the start of a new python project
Roughly estimate the total memory usage of an object
The beginning of cif2cell
Use hash to lighten collision detection of about 1000 balls in Python (related to the new coronavirus)
The meaning of self
I tried to tabulate the number of deaths per capita of COVID-19 (new coronavirus) by country
the zen of Python
The story of sys.path.append ()
Let's simulate the effect of introducing a contact tracking app as a countermeasure against the new coronavirus
Significance of narrowing down the test target of PCR test for new coronavirus understood by Bayes' theorem
Revenge of the Types: Revenge of types
I analyzed tweets about the new coronavirus posted on Twitter
Let's visualize the number of people infected with coronavirus with matplotlib
Estimate the average score of examinees at Sapporo Medical University
Estimate the attitude of AR markers with Python + OpenCV + drone
Hypothesis of why the new coronavirus is not so common in urban areas such as Tokyo
I tried using PDF data of online medical care based on the spread of the new coronavirus infection