Introduction

The new coronavirus disease (COVID-19) has spread from China to the rest of the world. As of March 14, 2020, there were 145,637 confirmed infections and 5,436 deaths. In Japan, however, the number of confirmed infections is 734, which is only 9.1% of South Korea's 8,086 and 4.2% of Italy's 17,660. One theory attributes this clear difference to Japan limiting the number of PCR tests. This has stirred up controversy: How much is testing actually being restricted? What is the point of restricting it? Are other countries testing too much and overwhelming their medical systems? Because there is little quantitative discussion, however, largely emotional arguments spread through TV and the Internet and get in the way of a correct understanding. In this article, I therefore use Bayes' theorem to **focus on the significance of narrowing down the targets of PCR tests**.

What is a PCR test?

First, let's take a quick look at what a PCR (Polymerase Chain Reaction) test is. Based on the National Institute of Infectious Diseases manual, the [Guide for PCR Testing of Takara Bio](http://www.takara-bio.co.jp/kensa/pdfs/book_1.pdf), and similar sources, the test works roughly as follows. The double strand of DNA is separated, a specific site is amplified, and the cycle is repeated so that a region characteristic of the virus is multiplied; if that region can then be seen by electrophoresis, the result is positive. If only the region that is truly characteristic of the virus is amplified sufficiently (2^N times after N cycles), detection should be highly accurate (accuracy is discussed later). However, detection accuracy can apparently drop because of the following factors.

DNA is not sufficiently separated at the stage of heat denaturation.
Primers erroneously bind during the annealing stage.
At the stage of elongation, the function of DNA polymerase deteriorates.
By-products are produced.

Test sensitivity / specificity

Sensitivity and specificity are terms we often hear in connection with testing, so let's sort out the related terminology. This Wiki is well organized.

Sensitivity or recall: the proportion of affected people who test positive
Specificity: the proportion of non-affected people who test negative
Precision: the proportion of people who test positive who are actually affected
Accuracy: the proportion of all people tested who are either affected and positive, or not affected and negative
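
To make the four definitions concrete, here is a minimal sketch that computes them from a 2x2 table of counts. The counts are hypothetical numbers chosen for illustration, not real test data.

```python
# Hypothetical counts for 1,000 tested people (illustrative only, not real data).
tp = 70   # affected and test positive
fn = 30   # affected but test negative
fp = 45   # not affected but test positive
tn = 855  # not affected and test negative

sensitivity = tp / (tp + fn)                # positives among the affected
specificity = tn / (tn + fp)                # negatives among the non-affected
precision = tp / (tp + fp)                  # affected among the positives
accuracy = (tp + tn) / (tp + fn + fp + tn)  # correct results among everyone

print(f"sensitivity={sensitivity:.3f} specificity={specificity:.3f}")
print(f"precision={precision:.3f} accuracy={accuracy:.3f}")
```

Note that sensitivity and specificity are computed within a row of the table (conditioned on the true state), while precision is computed within a column (conditioned on the test result).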

There are various views on the PCR test for COVID-19 (SARS-CoV-2), but some reports put the sensitivity at about 70% and the specificity at 90% or more. However, the sensitivity may be affected by the sample collection method (wiping the pharynx with a cotton swab) and the transportation environment, and the specificity may be affected by the PCR process itself, so these should not be treated as fixed values. In practice, it is physically impossible to check whether there is any virus anywhere in the human body, so there are probably no true values for sensitivity and specificity.

Incidentally, reading the definitions of sensitivity and specificity above, I suspect many people mistake them for posterior probabilities or joint probabilities. So let's define them again with formulas.

\begin{eqnarray}
Sensitivity: RC&=& P(Inspection=T|Affected=T) \\
Specificity: SP&=& P(Inspection=F|Affected=F) \\
Precision: PC&=& P(Affected=T|Inspection=T) \\
Accuracy: AC&=& P(Inspection=T,Affected=T) + P(Inspection=F,Affected=F)\\
\end{eqnarray}

In addition, there are false positives and false negatives that are often heard, and these are defined as follows.

\begin{eqnarray}
False positive rate: FP&=& P(Inspection=T|Affected=F) = 1 - SP\\
False negative rate: FN&=& P(Inspection=F|Affected=T) = 1 - RC
\end{eqnarray}

Bayes' theorem

[Bayes' Theorem](https://ja.wikipedia.org/wiki/%E3%83%99%E3%82%A4%E3%82%BA%E3%81%AE%E5%AE%9A%E7%90%86) is a formula that relates prior and posterior probabilities. In machine learning, it often appears in Bayesian inference.

P(B|A)=\frac{P(A|B)P(B)}{P(A)}
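
As a quick numerical check of the formula, the sketch below builds a small joint distribution over two binary events and confirms that Bayes' theorem gives the same P(B|A) as the direct definition. The probabilities in `joint` are hypothetical values chosen only so that they sum to 1.

```python
# Hypothetical joint distribution P(A, B) over two binary events (sums to 1).
joint = {
    (True, True): 0.07,
    (True, False): 0.045,
    (False, True): 0.03,
    (False, False): 0.855,
}

# Marginals P(A=T) and P(B=T).
p_a = sum(v for (a, b), v in joint.items() if a)
p_b = sum(v for (a, b), v in joint.items() if b)

# Conditional probabilities from the definition P(X|Y) = P(X, Y) / P(Y).
p_a_given_b = joint[(True, True)] / p_b
p_b_given_a_direct = joint[(True, True)] / p_a

# The same quantity via Bayes' theorem: P(B|A) = P(A|B) P(B) / P(A).
p_b_given_a_bayes = p_a_given_b * p_b / p_a

print(p_b_given_a_direct, p_b_given_a_bayes)
```

Reading A as "test is positive" and B as "person is affected", P(A|B) plays the role of sensitivity and P(B|A) the role of precision.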

Now, based on Bayes' theorem, we can see that **precision can be calculated from sensitivity and specificity**, given the prior probability P(Affected=T). Here, let PC(T) denote the precision of a positive test and PC(F) the precision of a negative test.

\begin{eqnarray}
PC(T) &=& P(Affected=T|Inspection=T) \\
&=&
 \frac{P(Inspection=T|Affected=T) P(Affected=T)}{P(Inspection=T)} \\
&=&
 \frac{P(Inspection=T|Affected=T) P(Affected=T)}
{ P(Inspection=T|Affected=T) P(Affected=T) + P(Inspection=T|Affected=F) P(Affected=F)} \\
&=&
 \frac{RC \times P(Affected=T)}
{ RC \times P(Affected=T) + (1 - SP) \times P(Affected=F)} \\
  \\
PC(F) &=& P(Affected=F|Inspection=F)\\
&=&
 \frac{P(Inspection=F|Affected=F) P(Affected=F)}{P(Inspection=F)} \\
&=&
 \frac{P(Inspection=F|Affected=F) P(Affected=F)}
{ P(Inspection=F|Affected=T) P(Affected=T) + P(Inspection=F|Affected=F) P(Affected=F)} \\
&=&
 \frac{SP \times P(Affected=F)}
{ (1-RC) \times P(Affected=T) + SP \times P(Affected=F)} \\
\end{eqnarray}
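
As a worked example of these two formulas, the sketch below evaluates them with exact fractions. The values RC = 0.7 and SP = 0.95 are assumptions in line with the reports mentioned earlier, and the prior of 0.1 is purely illustrative.

```python
from fractions import Fraction

rc = Fraction(7, 10)    # assumed sensitivity RC
sp = Fraction(95, 100)  # assumed specificity SP
p = Fraction(1, 10)     # illustrative prior P(Affected=T)

# PC(T) = RC * P(T) / (RC * P(T) + (1 - SP) * P(F))
pc_t = rc * p / (rc * p + (1 - sp) * (1 - p))

# PC(F) = SP * P(F) / ((1 - RC) * P(T) + SP * P(F))
pc_f = sp * (1 - p) / ((1 - rc) * p + sp * (1 - p))

print(pc_t, float(pc_t))  # 14/23, about 0.609
print(pc_f, float(pc_f))  # 57/59, about 0.966
```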

The accuracy can be written in the same way:

\begin{eqnarray}
AC &=& P(Inspection=T,Affected=T) + P(Inspection=F,Affected=F) \\
&=&
 P(Inspection=T|Affected=T)P(Affected=T) + P(Inspection=F|Affected=F)P(Affected=F) \\
&=&
 RC \times P(Affected=T) + SP \times P(Affected=F)
\end{eqnarray}
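
Because the accuracy is a weighted average of RC and SP, it moves linearly from SP (when nobody is affected) to RC (when everybody is affected). A minimal sketch, where the default values 0.7 and 0.95 are assumptions for illustration:

```python
def accuracy(p, rc=0.7, sp=0.95):
    """AC = RC * P(Affected=T) + SP * P(Affected=F), with P(Affected=T) = p."""
    return rc * p + sp * (1.0 - p)

for p in (0.0, 0.1, 0.5, 1.0):
    print(f"P(Affected=T)={p:.1f} -> AC={accuracy(p):.3f}")
```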

Try to calculate with Python

Now, let's use Python to calculate the precision of a positive test PC(T), the precision of a negative test PC(F), and the accuracy AC.

Prerequisites

As a prerequisite, use the following assumptions:

Recall (= sensitivity) RC is 0.7
Specificity SP is 0.95
The prior probabilities P(Affected=T) and P(Affected=F) are parameters.

As mentioned above, we do not know the true values of these, so it may be worth varying them and simulating different cases.
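
One way to vary the assumptions is a small grid sweep. The sketch below is only an illustration: the helper name `positive_precision`, the candidate values for RC and SP, and the prior of 0.1 are all assumptions, not values taken from any report.

```python
def positive_precision(p, rc, sp):
    """PC(T) from the prior p, sensitivity rc, and specificity sp."""
    return rc * p / (rc * p + (1.0 - sp) * (1.0 - p))

prior = 0.1  # illustrative prior P(Affected=T)
results = {}
for rc in (0.5, 0.7, 0.9):
    for sp in (0.90, 0.95, 0.99):
        results[(rc, sp)] = positive_precision(prior, rc, sp)
        print(f"RC={rc:.2f} SP={sp:.2f} -> PC(T)={results[(rc, sp)]:.3f}")
```

In this sketch, with a low prior, raising the specificity moves PC(T) more than raising the sensitivity does.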

Import the library.

import numpy as np
import matplotlib.pyplot as plt

Define functions to calculate the precision of a positive test PC(T), the precision of a negative test PC(F), and the accuracy AC. Each takes the prior probability P(Affected=T) and a dictionary of parameters as arguments.

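def PCT(p, key):
    # Precision of a positive test: P(Affected=T | Inspection=T)
    rc = key['rc']         # sensitivity
    fp = 1. - key['sp']    # false positive rate = 1 - specificity
    return rc * p / ( rc * p + fp * (1. - p))

def PCF(p, key):
    # Precision of a negative test: P(Affected=F | Inspection=F)
    sp = key['sp']         # specificity
    fn = 1. - key['rc']    # false negative rate = 1 - sensitivity
    return sp * (1. - p) / ( fn * p + sp * (1. - p))

def AC(p, key):
    # Accuracy: P(Inspection=T, Affected=T) + P(Inspection=F, Affected=F)
    rc = key['rc']
    sp = key['sp']
    return rc*p + sp*(1. - p)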
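
As a sanity check on these formulas, the sketch below simulates a large population and estimates the same three quantities by counting. This is a Monte Carlo check added for illustration, not part of the original calculation; the population size, the seed, and the prior of 0.1 are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)          # fixed seed so the run is repeatable
n, p, rc, sp = 1_000_000, 0.1, 0.7, 0.95

# Each person is affected with probability p.
affected = rng.random(n) < p

# Affected people test positive with probability rc,
# non-affected people with probability 1 - sp.
u = rng.random(n)
positive = np.where(affected, u < rc, u < 1.0 - sp)

pct_sim = affected[positive].mean()      # estimate of PC(T)
pcf_sim = (~affected[~positive]).mean()  # estimate of PC(F)
ac_sim = (affected == positive).mean()   # estimate of AC

print(f"PC(T)~{pct_sim:.3f} PC(F)~{pcf_sim:.3f} AC~{ac_sim:.3f}")
```

The estimates should land close to the closed-form values for the same parameters (about 0.609, 0.966, and 0.925).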

This part evaluates the functions while varying the prior probability P(Affected=T). The grid is made finer near 0.

key = {'rc' : 0.7, 'sp' : 0.95 }
pp = [ np.exp( - 0.1 * i) for i in range(0,100)]
pct = [ PCT( p, key) for p in pp]
pcf = [ PCF( p, key) for p in pp]
ac = [ AC( p, key) for p in pp]
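
The same geometrically spaced grid can also be built with `np.geomspace`, and the formulas can then be evaluated on the whole array at once. This is an alternative sketch, not the article's code; the names `pp_loop`, `pp_geom`, and `pct_vec` are introduced here for illustration.

```python
import numpy as np

# Grid from the list comprehension: exp(-0.1 * i) for i = 0..99.
pp_loop = np.array([np.exp(-0.1 * i) for i in range(100)])

# Equivalent grid: 100 points from 1 down to exp(-9.9) with a constant ratio.
pp_geom = np.geomspace(1.0, np.exp(-9.9), num=100)

# Vectorized precision of a positive test for RC = 0.7, SP = 0.95.
rc, sp = 0.7, 0.95
pct_vec = rc * pp_geom / (rc * pp_geom + (1.0 - sp) * (1.0 - pp_geom))

print(np.allclose(pp_loop, pp_geom))
print(pp_geom[:3], pp_geom[-1])
```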

This is the part that displays the graph.

plt.rcParams["font.size"] = 12
fig, ax = plt.subplots(figsize=(10,5))
ax.plot(pp, pct)
ax.plot(pp, pcf)
ax.plot(pp, ac)
ax.legend(['precision (infected)','precision (non-infected)','accuracy'])
xw = 0.1; xn = int(1./xw)+1
ax.set_xticks(np.linspace(0,xw*(xn-1), xn))
yw = 0.1; yn = int(1./yw)+1
ax.set_yticks(np.linspace(0,yw*(yn-1), yn))
ax.grid(which='both')
ax.set_xlabel('positive ratio (prior probability)')
plt.show()

Simulation result

Let's take a look at the calculation result.

The following trends can be read from this graph.

The precision of a positive test, PC(T), deteriorates considerably when the prior probability P(Affected=T) is low; at P(Affected=T) = 0.1, the probability of actually being affected given a positive test is only about 60%.
The precision of a negative test, PC(F), worsens as the prior probability P(Affected=T) rises; at P(Affected=T) = 0.9, the probability of not being affected given a negative test is only about 25%.
The accuracy (AC) is a linear interpolation between specificity and sensitivity, so the higher both are, the better.

Further consideration

Considering the purpose of testing, both the precision of a positive test and the precision of a negative test matter for deciding on quarantine, but the following indicators are also considered important.

Lie positive rate: the probability of not being affected even though the test is positive, P(Affected=F|Inspection=T). It matters because the law requires isolation after a positive test, so people who are not affected fill hospital beds needlessly.
Lie negative rate: the probability of being affected even though the test is negative, P(Affected=T|Inspection=F). It matters because a person who tests negative may spread the infection to those around them without taking precautions such as wearing a mask.

The false positive rate is defined as P(Inspection=T|Affected=F) and the false negative rate as P(Inspection=F|Affected=T), so I deliberately call P(Affected=F|Inspection=T) the lie positive rate and P(Affected=T|Inspection=F) the lie negative rate. These are coined terms. The indicators can be calculated as follows.

\begin{eqnarray}
Lie positive rate: FP&=& P(Affected=F|Inspection=T) = 1 - P(Affected=T|Inspection=T) = 1 - PC(T) \\
Lie Negative Rate: FN&=& P(Affected=T|Inspection=F) = 1 - P(Affected=F|Inspection=F) = 1 - PC(F) \\
\end{eqnarray}

Let's calculate and display these values.

fp = [1. - p for p in pct]
fn = [1. - p for p in pcf]

plt.rcParams["font.size"] = 12
fig, ax = plt.subplots(figsize=(10,5))
ax.plot(pp, fp)
ax.plot(pp, fn)
ax.legend([ 'fake positive', 'fake negative' ])
xw = 0.1; xn = int(1./xw)+1
ax.set_xticks(np.linspace(0,xw*(xn-1), xn))
yw = 0.1; yn = int(1./yw)+1
ax.set_yticks(np.linspace(0,yw*(yn-1), yn))
ax.grid(which='both')
ax.set_xlabel('positive ratio (prior probability)')
plt.show()

Here is the result.

By definition, the lie positive rate is the complement of the precision of a positive test PC(T), and the lie negative rate is the complement of the precision of a negative test PC(F). The following trends can be read from this graph.

The lie positive rate deteriorates considerably when the prior probability P(Affected=T) is low; at P(Affected=T) = 0.1, the probability of not being affected even though the test is positive is about 40%.
The lie negative rate deteriorates considerably when the prior probability P(Affected=T) is high; at P(Affected=T) = 0.9, the probability of being affected even though the test is negative is about 75%.

Consideration

From the above, the simulation suggests the following trends for PCR testing for COVID-19. Note that the sensitivity and specificity used here are only estimates, so the numbers should be read with that in mind.

If the prior probability P(Affected=T) is low, the precision of a positive test and the lie positive rate worsen. Roughly, when $P(Affected=T) \leq 0.1$ the harm outweighs the benefit, and when $P(Affected=T) \geq 0.2$ a positive precision of about 80% can be expected.
If the prior probability P(Affected=T) is high, the precision of a negative test and the lie negative rate worsen. Roughly, when $P(Affected=T) \geq 0.9$ the harm outweighs the benefit, and when $P(Affected=T) \leq 0.45$ a negative precision of about 80% can be expected.
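
The thresholds above were read off the graph, but they can also be obtained by solving the precision formulas for the prior. The sketch below does this for a target precision of 80%; the function names are hypothetical and the defaults RC = 0.7, SP = 0.95 are the assumptions used in this article.

```python
def min_prior_for_positive_precision(target, rc=0.7, sp=0.95):
    """Smallest P(Affected=T) for which PC(T) >= target."""
    fp = 1.0 - sp  # false positive rate
    return target * fp / (rc * (1.0 - target) + target * fp)

def max_prior_for_negative_precision(target, rc=0.7, sp=0.95):
    """Largest P(Affected=T) for which PC(F) >= target."""
    fn = 1.0 - rc  # false negative rate
    return sp * (1.0 - target) / (target * fn + sp * (1.0 - target))

p_min = min_prior_for_positive_precision(0.8)
p_max = max_prior_for_negative_precision(0.8)
print(f"PC(T) >= 0.8 needs P(Affected=T) >= {p_min:.3f}")
print(f"PC(F) >= 0.8 needs P(Affected=T) <= {p_max:.3f}")
```

Under these assumptions the exact bounds are 2/9 (about 0.22) and 19/43 (about 0.44), close to the rounded values of 0.2 and 0.45 quoted above.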

Furthermore ...

The Ministry of Health, Labour and Welfare recommends PCR tests only when conditions such as close contact with a case, fever or other symptoms, and a negative influenza test are present and a doctor deems the test necessary. This is **extremely rational** in the sense that it raises the prior probability P(Affected=T).
Letting, say, 1 million people who want a PCR test take one without any prior screening makes the prior probability extremely low. If 10,000 or fewer of them are actually affected, that is, if $P(Affected=T) \leq 0.01$, the lie positive rate will be 87.5% or more, and medical resources would likely be wasted on a large scale.
Not only South Korea and Germany but also the United States appear to be introducing drive-through testing, which worries me a great deal.
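
The figure for the 1 million person scenario can be reproduced by simple head counts. The sketch below assumes exactly 10,000 affected people among 1,000,000 tested, with RC = 0.7 and SP = 0.95 as in the rest of this article.

```python
tested = 1_000_000
affected = 10_000          # assumed number of truly affected people
rc, sp = 0.7, 0.95         # assumed sensitivity and specificity

true_pos = rc * affected                      # affected people who test positive
false_pos = (1.0 - sp) * (tested - affected)  # non-affected people who test positive

# Share of positive results that belong to people who are not affected.
lie_positive_rate = false_pos / (true_pos + false_pos)

print(f"true positives:  {true_pos:,.0f}")
print(f"false positives: {false_pos:,.0f}")
print(f"lie positive rate: {lie_positive_rate:.1%}")
```

With these numbers, roughly 7,000 of about 56,500 positive results come from people who are actually affected.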

Reference link

I referred to the following pages.

[Pathogen Detection Manual 2019-nCoV Ver.2.8](https://www.niid.go.jp/niid/images/lab-manual/2019-nCoV20200304v2.pdf)
[Guide for PCR experiments](http://www.takara-bio.co.jp/kensa/pdfs/book_1.pdf)
F value
[Bayes' Theorem](https://en.wikipedia.org/wiki/%E3%83%99%E3%82%A4%E3%82%BA%E3%81%AE%E5%AE%9A%E7%90%86)

[PYTHON] Significance of narrowing down the test target of PCR test for new coronavirus understood by Bayes' theorem