[PYTHON] Understanding the significance of narrowing down PCR test targets for the new coronavirus with Bayes' theorem

Introduction

The new coronavirus infection (COVID-19) has spread from China to the rest of the world; as of March 14, 2020, there were 145,637 infected people and 5,436 deaths. In Japan, however, the number of infected people is 734, which is only 9.1% of South Korea's 8,086 and 4.2% of Italy's 17,660. One theory put forward to explain this clear difference is that Japan is limiting the number of PCR tests.

This raises several questions that are causing controversy. How far is testing actually being narrowed down? What is the significance of narrowing it down? Aren't other countries over-testing and causing their medical systems to collapse? However, because there is little quantitative discussion, mostly emotional arguments circulate on TV and the Internet, which hinders correct understanding. In this article, I therefore use Bayes' theorem to **focus on the significance of narrowing down PCR tests**.

What is a PCR test?

First, let's take a quick look at what a PCR (Polymerase Chain Reaction) test is. Based on the National Institute of Infectious Diseases manual, the [Guide for PCR Testing of Takara Bio](http://www.takara-bio.co.jp/kensa/pdfs/book_1.pdf), and other sources, the outline of the test is as follows.

[Figure: PCR-detection.jpg]

In short, the double strand of DNA is separated, a specific site is amplified, and the process is repeated so that the region characteristic of the virus multiplies. The result appears to be judged positive if that region can be observed visually by electrophoresis. If only the part that is truly characteristic of the virus can be amplified sufficiently (2^N times), it seems that the virus can be detected with high accuracy (accuracy is discussed later). However, the detection accuracy may drop due to the following factors.

Test sensitivity / specificity

Sensitivity and specificity are words we often hear in connection with testing, so let's sort out the relevant terms. This Wiki is well organized.

Reports on the PCR test for COVID-19 (SARS-CoV-2) vary, but some put the sensitivity at about 70% and the specificity at 90% or more. However, the sensitivity may be affected by the sample collection method (swabbing the pharynx with a cotton swab) and the transport conditions, and the specificity may be affected by the PCR test process, so let's not treat these as definite values. In reality, it is physically impossible to check whether any virus is present anywhere in the entire human body, so there is probably no true value for sensitivity or specificity.

Incidentally, looking at the definitions of sensitivity and specificity above, I suspect many people wonder whether they refer to posterior probabilities or to joint probabilities. So let's define them again with mathematical formulas.

\begin{eqnarray}
\text{Sensitivity: } RC &=& P(Inspection=T|Affected=T) \\
\text{Specificity: } SP &=& P(Inspection=F|Affected=F) \\
\text{Precision (compliance rate): } PC &=& P(Affected=T|Inspection=T) \\
\text{Accuracy (correct answer rate): } AC &=& P(Inspection=T,Affected=T) + P(Inspection=F,Affected=F)
\end{eqnarray}

In addition, the frequently heard terms false positive and false negative are defined as follows.

\begin{eqnarray}
\text{False positive rate: } FP &=& P(Inspection=T|Affected=F) = 1 - SP \\
\text{False negative rate: } FN &=& P(Inspection=F|Affected=T) = 1 - RC
\end{eqnarray}
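As a small illustration of these definitions, the sketch below derives the four rates from a table of test outcomes. The counts are hypothetical numbers that I chose so that the sensitivity comes out at 70% and the specificity at 95%; they are not measured data, and the function name `rates_from_counts` is my own.

```python
def rates_from_counts(tp, fn, fp, tn):
    """Return sensitivity, specificity, false positive rate, false negative rate.

    tp, fn: infected people who tested positive / negative
    fp, tn: non-infected people who tested positive / negative
    """
    sensitivity = tp / (tp + fn)   # P(Inspection=T | Affected=T)
    specificity = tn / (tn + fp)   # P(Inspection=F | Affected=F)
    return {
        "RC": sensitivity,
        "SP": specificity,
        "FP": 1.0 - specificity,   # P(Inspection=T | Affected=F)
        "FN": 1.0 - sensitivity,   # P(Inspection=F | Affected=T)
    }


# Hypothetical counts: 100 infected and 1000 non-infected people
rates = rates_from_counts(tp=70, fn=30, fp=50, tn=950)
for name, value in rates.items():
    print(f"{name} = {value:.2f}")
```

Note that all four rates are conditioned on the true infection status, so none of them depends on how many infected people there are in the tested group.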

Bayes' theorem

[Bayes' theorem](https://ja.wikipedia.org/wiki/%E3%83%99%E3%82%A4%E3%82%BA%E3%81%AE%E5%AE%9A%E7%90%86) is a formula that expresses the relationship between prior and posterior probabilities. It also appears often in machine learning, in the context of Bayesian inference.

P(B|A)=\frac{P(A|B)P(B)}{P(A)}

Now, based on Bayes' theorem, we can see that **precision can be calculated from sensitivity and specificity**, together with the prior probability P(Affected=T). Here, let the precision of a positive test result be PC(T) and the precision of a negative test result be PC(F).

\begin{eqnarray}
PC(T) &=& P(Affected=T|Inspection=T) \\
&=&
 \frac{P(Inspection=T|Affected=T) P(Affected=T)}{P(Inspection=T)} \\
&=&
 \frac{P(Inspection=T|Affected=T) P(Affected=T)}
{ P(Inspection=T|Affected=T) P(Affected=T) + P(Inspection=T|Affected=F) P(Affected=F)} \\
&=&
 \frac{RC \times P(Affected=T)}
{ RC \times P(Affected=T) + (1 - SP) \times P(Affected=F)} \\
  \\
PC(F) &=& P(Affected=F|Inspection=F)\\
&=&
 \frac{P(Inspection=F|Affected=F) P(Affected=F)}{P(Inspection=F)} \\
&=&
 \frac{P(Inspection=F|Affected=F) P(Affected=F)}
{ P(Inspection=F|Affected=T) P(Affected=T) + P(Inspection=F|Affected=F) P(Affected=F)} \\
&=&
 \frac{SP \times P(Affected=F)}
{ (1-RC) \times P(Affected=T) + SP \times P(Affected=F)} \\
\end{eqnarray}
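One way to check this derivation is to count people in a notional population and compare the result with the closed-form expressions. The sketch below does that under assumptions of my own: a population of 100,000 and a prior of 1% are hypothetical, while the sensitivity of 0.7 and specificity of 0.95 are the values used in the Python code later in this article. The function names are also mine.

```python
def predictive_values_by_counting(population, prior, rc, sp):
    """Compute PC(T) and PC(F) by counting people in a notional population."""
    infected = population * prior
    healthy = population * (1.0 - prior)
    true_pos = infected * rc            # infected and test positive
    false_neg = infected * (1.0 - rc)   # infected but test negative
    false_pos = healthy * (1.0 - sp)    # not infected but test positive
    true_neg = healthy * sp             # not infected and test negative
    pct = true_pos / (true_pos + false_pos)
    pcf = true_neg / (true_neg + false_neg)
    return pct, pcf


def predictive_values_by_bayes(prior, rc, sp):
    """Compute PC(T) and PC(F) from the formulas obtained with Bayes' theorem."""
    pct = rc * prior / (rc * prior + (1.0 - sp) * (1.0 - prior))
    pcf = sp * (1.0 - prior) / ((1.0 - rc) * prior + sp * (1.0 - prior))
    return pct, pcf


counted = predictive_values_by_counting(100_000, prior=0.01, rc=0.7, sp=0.95)
bayes = predictive_values_by_bayes(prior=0.01, rc=0.7, sp=0.95)
print("by counting:", counted)
print("by formula: ", bayes)
```

In this example, 700 of the 1,000 infected people and 4,950 of the 99,000 non-infected people test positive, so only about 12% of positive results belong to infected people, even though the test itself is unchanged.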

Similarly, the accuracy (correct answer rate) can be written as follows.

\begin{eqnarray}
AC &=& P(Inspection=T,Affected=T) + P(Inspection=F,Affected=F) \\
&=&
 P(Inspection=T|Affected=T)P(Affected=T) + P(Inspection=F|Affected=F)P(Affected=F) \\
&=&
 RC \times P(Affected=T) + SP \times P(Affected=F)
\end{eqnarray}
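Because the accuracy is a weighted average of sensitivity and specificity, it is dominated by the specificity when the prior is small. The following sketch illustrates this with a degenerate "test" that always answers negative (RC = 0, SP = 1). The prior of 1% is a hypothetical value of mine; 0.7 and 0.95 are the sensitivity and specificity used in this article's Python code.

```python
def accuracy(prior, rc, sp):
    """AC = RC * P(Affected=T) + SP * P(Affected=F)."""
    return rc * prior + sp * (1.0 - prior)


prior = 0.01  # assumed prior probability of infection (hypothetical)

# Test with the sensitivity and specificity used in this article's simulation
ac_pcr = accuracy(prior, rc=0.7, sp=0.95)

# Degenerate "test" that always answers negative: RC = 0, SP = 1
ac_always_negative = accuracy(prior, rc=0.0, sp=1.0)

print(f"PCR-like test:   AC = {ac_pcr:.4f}")
print(f"always negative: AC = {ac_always_negative:.4f}")
```

With these inputs the always-negative test scores 0.99 against 0.9475 for the PCR-like test, which suggests that accuracy alone is a poor guide when the prior is low, and that PC(T) and PC(F) should be looked at as well.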

Try to calculate with Python

Now, let's use Python to calculate the test-positive precision PC(T), the test-negative precision PC(F), and the accuracy AC.

Prerequisites

As a prerequisite, use the following assumptions, which are the values set in `key` in the code below: sensitivity RC = 0.7 and specificity SP = 0.95.

As mentioned above, we do not know the true values of these, so it is a good idea to vary them and re-run the simulation.
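As a rough sketch of what such a variation could look like, the code below evaluates PC(T) on a small grid of sensitivity and specificity values at a fixed prior. The grid values and the prior of 1% are hypothetical choices of mine, not figures from any report, and `pct` simply restates the formula for PC(T) derived above.

```python
def pct(p, rc, sp):
    """Test-positive precision PC(T) for prior p, sensitivity rc, specificity sp."""
    return rc * p / (rc * p + (1.0 - sp) * (1.0 - p))


PRIOR = 0.01                                # hypothetical prior
SENSITIVITIES = (0.5, 0.7, 0.9)             # hypothetical values around 70%
SPECIFICITIES = (0.90, 0.95, 0.99, 0.999)   # hypothetical values of 90% or more

# PC(T) for every combination of sensitivity and specificity
table = {(rc, sp): pct(PRIOR, rc, sp)
         for rc in SENSITIVITIES for sp in SPECIFICITIES}

print("RC   " + "".join(f"  SP={sp:<6}" for sp in SPECIFICITIES))
for rc in SENSITIVITIES:
    print(f"{rc:<5}" + "".join(f"  {table[(rc, sp)]:<9.4f}" for sp in SPECIFICITIES))
```

On this grid, PC(T) changes far more when the specificity moves from 0.95 to 0.999 than when the sensitivity moves from 0.5 to 0.9, so the assumed specificity deserves particular care.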

Import the libraries.

import numpy as np
import matplotlib.pyplot as plt

Define functions to calculate the test-positive precision PC(T), the test-negative precision PC(F), and the accuracy AC. Each takes the prior probability P(Affected=T) as `p` and the parameters (sensitivity `rc` and specificity `sp`) as the dictionary `key`.

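def PCT(p, key):
    # Test-positive precision PC(T) = P(Affected=T | Inspection=T)
    rc = key['rc']           # sensitivity
    fp = 1. - key['sp']      # false positive rate = 1 - specificity
    return rc * p / ( rc * p + fp * (1. - p))

def PCF(p, key):
    # Test-negative precision PC(F) = P(Affected=F | Inspection=F)
    sp = key['sp']           # specificity
    fn = 1. - key['rc']      # false negative rate = 1 - sensitivity
    return sp * (1. - p) / ( fn * p + sp * (1. - p))

def AC(p, key):
    # Accuracy AC = RC * P(Affected=T) + SP * P(Affected=F)
    rc = key['rc']
    sp = key['sp']
    return rc*p + sp*(1. - p)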
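These formulas divide by P(Inspection=T) or P(Inspection=F), which can be zero in corner cases such as a prior of 0 combined with a specificity of 1. That does not happen with the parameters used in this article, but if you vary the parameters widely, a guarded variant may be convenient. The following is only a sketch. The names `pct_safe` and `pcf_safe` and the choice of returning NaN are my own.

```python
import math


def pct_safe(p, rc, sp):
    """PC(T) with a guard: returns NaN when no positive result can occur."""
    denom = rc * p + (1.0 - sp) * (1.0 - p)   # P(Inspection=T)
    return rc * p / denom if denom > 0.0 else math.nan


def pcf_safe(p, rc, sp):
    """PC(F) with a guard: returns NaN when no negative result can occur."""
    denom = (1.0 - rc) * p + sp * (1.0 - p)   # P(Inspection=F)
    return sp * (1.0 - p) / denom if denom > 0.0 else math.nan


print(pct_safe(0.0, rc=0.7, sp=1.0))    # undefined case: P(Inspection=T) = 0
print(pct_safe(1.0, rc=0.7, sp=0.95))   # everyone is infected
print(pcf_safe(1.0, rc=1.0, sp=0.95))   # undefined case: P(Inspection=F) = 0
```

Returning NaN rather than raising an exception keeps a parameter sweep running, and plotting libraries typically leave a gap at NaN points.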

This part evaluates the functions while varying the prior probability P(Affected=T). The mesh is made finer near 0.

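# Sensitivity (rc) and specificity (sp) assumed for the simulation
key = {'rc' : 0.7, 'sp' : 0.95 }
# Priors from 1 down to exp(-9.9), spaced geometrically so that points cluster near 0
pp = [ np.exp( - 0.1 * i) for i in range(0,100)]
pct = [ PCT( p, key) for p in pp]
pcf = [ PCF( p, key) for p in pp]
ac = [ AC( p, key) for p in pp]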

This is the part that displays the graph.

plt.rcParams["font.size"] = 12
fig, ax = plt.subplots(figsize=(10,5))
ax.plot(pp, pct)
ax.plot(pp, pcf)
ax.plot(pp, ac)
ax.legend(['precision (infected)','precision (non-infected)','accuracy'])
xw = 0.1; xn = int(1./xw)+1
ax.set_xticks(np.linspace(0,xw*(xn-1), xn))
yw = 0.1; yn = int(1./yw)+1
ax.set_yticks(np.linspace(0,yw*(yn-1), yn))
ax.grid(which='both')
ax.set_xlabel('positive ratio (prior probability)')
plt.show()

Simulation result

Let's take a look at the calculation result.

[Figure: PCR_PCAC_all.png (PC(T), PC(F), and AC plotted against the prior probability)]
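To make the curves easier to read, the sketch below prints the same three quantities at a few selected priors. The sensitivity of 0.7 and specificity of 0.95 are the values used in the simulation; the list of priors is a hypothetical selection of mine, and the lowercase function names are my own restatements of the formulas.

```python
def pct(p, rc, sp):
    """Test-positive precision PC(T)."""
    return rc * p / (rc * p + (1.0 - sp) * (1.0 - p))


def pcf(p, rc, sp):
    """Test-negative precision PC(F)."""
    return sp * (1.0 - p) / ((1.0 - rc) * p + sp * (1.0 - p))


def ac(p, rc, sp):
    """Accuracy AC."""
    return rc * p + sp * (1.0 - p)


RC, SP = 0.7, 0.95                  # values used in the simulation
PRIORS = [0.5, 0.1, 0.01, 0.001]    # hypothetical priors, chosen for illustration

rows = [(p, pct(p, RC, SP), pcf(p, RC, SP), ac(p, RC, SP)) for p in PRIORS]

print(f"{'prior':>7} {'PC(T)':>7} {'PC(F)':>7} {'AC':>7}")
for p, t, f, a in rows:
    print(f"{p:>7.3f} {t:>7.4f} {f:>7.4f} {a:>7.4f}")
```

In this table PC(T) falls steeply as the prior decreases while PC(F) approaches 1, which is the pattern visible at the left edge of the graph.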

The following trends can be read from this graph.

Further consideration

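Considering the purpose of testing, both the test-positive precision and the test-negative precision matter when deciding on quarantine. In addition, the following indicators are considered important. Note that they are conditioned on the test result, so they differ from the false positive rate and false negative rate defined earlier, which are conditioned on the true infection status. They are commonly called the false discovery rate and the false omission rate (the code below stores them in `fp` and `fn`).

\begin{eqnarray}
\text{False discovery rate: } FDR &=& P(Affected=F|Inspection=T) = 1 - P(Affected=T|Inspection=T) = 1 - PC(T) \\
\text{False omission rate: } FOR &=& P(Affected=T|Inspection=F) = 1 - P(Affected=F|Inspection=F) = 1 - PC(F)
\end{eqnarray}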

Let's calculate and display these values.

fp = [1. - p for p in pct]
fn = [1. - p for p in pcf]

plt.rcParams["font.size"] = 12
fig, ax = plt.subplots(figsize=(10,5))
ax.plot(pp, fp)
ax.plot(pp, fn)
ax.legend([ 'fake positive', 'fake negative' ])
xw = 0.1; xn = int(1./xw)+1
ax.set_xticks(np.linspace(0,xw*(xn-1), xn))
yw = 0.1; yn = int(1./yw)+1
ax.set_yticks(np.linspace(0,yw*(yn-1), yn))
ax.grid(which='both')
ax.set_xlabel('positive ratio (prior probability)')
plt.show()

Here is the result.

[Figure: PCR_FPFN_all.png (1 - PC(T) and 1 - PC(F) plotted against the prior probability)]
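A related question is how high the prior has to be before most positive results can be trusted. Setting PC(T) equal to a target value t and solving for the prior gives p = t(1 - SP) / (RC(1 - t) + t(1 - SP)), and since PC(T) increases with the prior, this is the smallest prior that reaches the target. The sketch below evaluates it. The function name `prior_for_target_pct` and the targets 0.5 and 0.9 are my own choices, and 0.7 and 0.95 are the simulation's sensitivity and specificity.

```python
def pct(p, rc, sp):
    """Test-positive precision PC(T)."""
    return rc * p / (rc * p + (1.0 - sp) * (1.0 - p))


def prior_for_target_pct(target, rc, sp):
    """Smallest prior P(Affected=T) at which PC(T) reaches target (0 < target < 1)."""
    fp = 1.0 - sp   # false positive rate
    return target * fp / (rc * (1.0 - target) + target * fp)


RC, SP = 0.7, 0.95   # values used in the simulation
for target in (0.5, 0.9):
    p_min = prior_for_target_pct(target, RC, SP)
    print(f"PC(T) >= {target:.1f} needs prior >= {p_min:.4f}")
```

Under these assumptions, at least half of the positive results are correct only when roughly 1 in 15 of the people tested is actually infected, which is one way to quantify what narrowing down the test targets buys.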

By construction, the proportion of positive results that are wrong is the complement of the test-positive precision PC(T), and the proportion of negative results that are wrong is the complement of the test-negative precision PC(F). The following trends can be read from this graph.

Consideration

From the above, the following trends can be derived from the simulation of the PCR test for COVID-19. Regarding the numerical values, note that the sensitivity and specificity used here are only estimates, not true values.

Furthermore ...

Reference link

I referred to the following pages.

[Pathogen Detection Manual 2019-nCoV Ver.2.8](https://www.niid.go.jp/niid/images/lab-manual/2019-nCoV20200304v2.pdf)
[Guide for PCR experiments](http://www.takara-bio.co.jp/kensa/pdfs/book_1.pdf)
F value
[Bayes' Theorem](https://ja.wikipedia.org/wiki/%E3%83%99%E3%82%A4%E3%82%BA%E3%81%AE%E5%AE%9A%E7%90%86)
