[PYTHON] [Introduction to logarithmic graph] Predict the end time of each country from the logarithmic graph of infection number data ♬

This time, we will predict the timing of infection peaks in each country by logarithmic plotting.

What i did

・ A little theory ・ Forecast of each country ・ Code explanation ・ Additional theory ・ Forecast of each country II

・ A little theory

Last time, the following equation was derived.

I =  I_0\exp(\gamma (\frac{\beta}{\gamma} \frac{S}{N} - 1)t)\\

If you take the logarithm of both sides,

\ln I =  \ln I_0+\gamma (\frac{\beta}{\gamma} \frac{S}{N} - 1)t\\

Is obtained. That is, if you draw a graph with $ \ ln I $ on the vertical axis and time $ t $ on the horizontal axis, the slope will be

\gamma (\frac{\beta}{\gamma} \frac{S}{N} - 1)\\

You can draw a straight line. Here, this slope depends on $ S $ and is the number of uninfected persons that decreases over time. And this slope can be rewritten as follows by using the effective reproduction number $ R $ introduced last time.

\gamma (R - 1)\\
R = R_0 \frac{S}{N}\\
R_0 = \frac{\beta}{\gamma}\\

Therefore, this slope becomes 0 at R = 1, that is, the number of reproductions = 0, and the number of infected persons does not increase any more, so that the infection peaks. This time, based on this theory, I would like to plot the data of each country in $ \ log I \ vs \ t $ and see the time of the peak infection in each country.

Also, from the slope of the rising edge, $ S \ fallingdotseq N $, and $ \ gamma (R_0 -1) $ can be obtained. Fitting the graph obtained with this as the initial value, it seems that $ \ gamma $, $ \ beta $, $ R_0 $ can be obtained.

・ Forecast of each country

・ Situation of Korea, South, Diamond Princess, Italy

It may seem distrustful that it contains Italy, but the results are as follows: logmulti_Italy_.png If you look at these three countries side by side, you can see that they are quite different. That is, if the x and y axes are translated appropriately, they are likely to overlap. In other words, you can see that there are areas with the same slope. In each graph, you can see ** intervals that increase 10 times in almost 10 days **. Yes, DP is likely to saturate the infection curve first, then South Korea, and then Italy. The graph below plots the slope. In other words, when it reaches 0, it means that it is saturated. So, you can see the above from this graph. When I talk about Italy, I've been told a lot, but I think it will reach its peak in about 10 days.

・ Situation in European countries

The same graph is displayed for Spain, Germany, Switzerland, United Kingdom, and the Netherlands other than Italy. logmulti_United Kingdom_.png When I write it again, the UK is a little behind, but the others draw similar curves. The vertical axis differs depending on each situation (population, etc.), but the slope changes in almost the same way. And, looking at the slope diagram, it seems that it will be saturated all together within about 2 weeks **.

・ A little disappointing country

The three countries of Japan, Iran, and Bahrain also have very similar curves in a sense. logmulti_Bahrain_.png Since the situation of rising is different, the y-axis is completely different, so I think the situation of the medical system is different. However, the inclination after standing up is quite similar, and despite the fact that it has been almost saturated in the last few days, Iran and Japan have increased again. Bahrain also has a negative slope, but it has returned to positive again. The gradients of these three countries are ** 10 times / 30 days, which is a gentle gradient that seems to be saturated at any time after it stabilizes. These countries are still infected and need to be contained with tension throughout society.

・ Rumored American situation

I tried to plot Sweden and US in the same way. logmulti_Sweden_.png

As you can see at a glance, the rise is similar. However, Sweden has a smaller slope from the middle. However, looking at the slope graph below, it's almost parallel to the x-axis these days, and it doesn't look like it's heading towards zero. In other words, it doesn't seem to saturate in the near future. However, in the United States, the data from the 58th to the 65th are available, and it seems that the slope is rapidly approaching 0. If you extend this, you can see that ** there is a possibility that the infection peaks in about 10 days **. It can be seen that the maximum gradient of both countries is about 25 times / 20 days, which is larger than the previous gradient.

・ Code explanation

Lib to be used and data reading are the same as last time. The death count data is also processed, but it is deleted because it is not used below.

import numpy as np
from scipy.integrate import odeint
from scipy.optimize import minimize
import matplotlib.pyplot as plt
import pandas as pd

#Read CSV data with pandas.
data = pd.read_csv('COVID-19/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv')
data_r = pd.read_csv('COVID-19/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_recovered_global.csv')

Below, the data processing function returns the processed infection number data and gradient data.

def data_city(city):
    #Process the data
    t_cases = 0
    t_recover = 0
    t_deaths = 0
    for i in range(0, len(data_r), 1):
        if (data_r.iloc[i][1] == city): #for country/region
        #if (data_r.iloc[i][0] == city):  #for province:/state  
            print(str(data_r.iloc[i][0]) + " of " + data_r.iloc[i][1])
            for day in range(4, len(data.columns), 1):            
                confirmed_r[day - 4] += data_r.iloc[i][day]
            t_recover += data_r.iloc[i][day]      
    for i in range(0, len(data), 1):
        if (data.iloc[i][1] == city): #for country/region
        #if (data.iloc[i][0] == city):  #for province:/state  
            print(str(data.iloc[i][0]) + " of " + data.iloc[i][1])
            for day in range(4, len(data.columns), 1):
                confirmed[day - 4] += data.iloc[i][day] -  confirmed_r[day - 4]
                diff_confirmed[day - 4] += confirmed[day-4] /  confirmed_r[day - 4]

    tl_confirmed = 0
    dlog_confirmed = [0] * (len(data.columns) - 4)
    dlog_confirmed[0]=np.log(confirmed[0])
    dlog_confirmed[1]=np.log(confirmed[1])-np.log(confirmed[0])
    for i in range(2, len(confirmed)-2, 1):
        if confirmed[i] > 0:
            dlog_confirmed[i]=(np.log(confirmed[i+1])-np.log(confirmed[i-1]))/2
        else:
            continue
    tl_confirmed = confirmed[len(confirmed)-1] + confirmed_r[len(confirmed)-1] + confirmed_d[len(confirmed)-1]
    t_cases = tl_confirmed
            
    return confirmed, dlog_confirmed       

The following is used to plot multiple countries on the same diagram. Enter the country you want to draw in city_list. Use the above data processing function to get the infection count data and its semi-logarithmic slope. Draw this. Therefore fig, (ax1, ax2) =. .. .. The definition of is defined outside the for statement. Finally, I will display them all together. I don't dare to define the color, but it is automatically selected.

city_list={ "Spain","Germany","Switzerland","United Kingdom", "Netherlands" }
fig, (ax1,ax2) = plt.subplots(2,1,figsize=(1.6180 * 4, 4*2))
for city in city_list:
    confirmed = [0] * (len(data.columns) - 4)
    confirmed_r = [0] * (len(data_r.columns) - 4)
    confirmed_d = [0] * (len(data_d.columns) - 4)
    diff_confirmed = [0] * (len(data.columns) - 4)
    days_from_22_Jan_20 = np.arange(0, len(data.columns) - 4, 1)
    days_from_22_Jan_20_ = np.arange(0, len(data.columns) - 4, 1)
    recovered_rate = [0] * (len(data_r.columns) - 4)
    deaths_rate = [0] * (len(data_d.columns) - 4)
    
    confirmed, dlog_confirmed = data_city(city)
    #matplotlib drawing
    lns1=ax1.semilogy(days_from_22_Jan_20, confirmed, ".-", label = str(city))
    lns4=ax2.plot(days_from_22_Jan_20_, dlog_confirmed, ".-",label = str(city))

ax1.set_xlabel("days from 22, Jan, 2020")
ax1.set_ylabel("casas ")
ax2.set_ylabel("slopes ")
ax1.legend(loc=0)
ax2.legend(loc=0)
ax1.grid()
ax2.grid()

plt.pause(1)
plt.savefig('./fig/logmulti_{}_.png'.format(city)) 
plt.close()

・ Additional theory

In fact, it can be imagined that the transition of the cure number curve will be similar to the infection number curve, although there is a delay in recovery. So, I tried to talk about that story below. First, substituting $ I $ into the third equation of the SIR model gives the following.

\frac{dr}{dt} =  \gamma I_0\exp(\gamma (R - 1)t) \\

Therefore, if you take the logarithm of both sides

\ln (\frac{dr}{dt}) =  \ln \gamma + \ln I_0 + \gamma (R - 1)t \\

Will be. That is, the time derivative (slope) of the recovery rate can be expected to have the same curve as the above equation, and y-intercepts differ by $ \ ln \ gamma $. However, this seems to be difficult to handle due to the large error, so I will change it a little.

That is, when R is integrated as a constant (although not analytically correct), the following equation is obtained.

r =  \frac{I_0}{(R-1)}\exp(\gamma (R - 1)t) + r_0\\

Therefore, if you take the logarithm of both sides

\ln r =  \ln I_0 -\ln (R-1) + \gamma (R - 1)t +\ln r_0

Can be written. Therefore, it can be expected that a curve that is almost parallel and has a different y-intercept of $ -ln (R-1) + \ ln r_0 $ can be obtained when compared with the above formula of $ \ ln I $.

・ Forecast of each country II

・ Situation of Wuhan

A beautiful curve was obtained as shown below. As expected, the infection and cure curves start in parallel and intersect after the peak infection. Along with that, the difference between the two curves ($ \ log (A / B) = \ log A- \ log B $) passes straight through $ 10 ^ 0 $. log_Hubei_.png

・ Situation in Italy

Both curves are parallel, and I concluded above that the number of infections is likely to peak in about 10 days, but after that, it seems that they do not intersect easily. After this, it is considered that they will approach and intersect, but the healing number data may be suspicious. log_Italy_.png

・ Spain on behalf of Europe

Both curves are steadily approaching, and after the peak number of infections, they intersect and the end is visible. log_Spain_.png

・ Situation in Japan

In Japan, the two curves are gradually approaching, and the difference is heading toward 0, but the difference has widened due to the recent increase in the number of infections. So, as with the above, the conclusion is that the spread of infection is continuing and needs to be suppressed. log_Japan_.png

Summary

・ Logarithmic plots of infection count data in each country were used to classify and predict the peak timing of each infection count. ・ The end time was predicted by plotting the number of infections and the number of cures at the same time.

・ I want to track the changing situation every day ・ Since the theory of the additional part is suspicious, I will organize it a little more and connect it to the parameter determination.

Recommended Posts

[Introduction to logarithmic graph] Predict the end time of each country from the logarithmic graph of infection number data ♬
[Introduction to matplotlib] Read the end time from COVID-19 data ♬
From the introduction of pyethapp to the execution of contract
An introduction to data analysis using Python-To increase the number of video views-
python beginners tried to predict the number of criminals
The story of copying data from S3 to Google's TeamDrive
[Introduction to infectious disease model] Looking at the logarithmic graph. .. .. One week from "the second wave"!
Learn accounting data and try to predict accounts from the content of the description when entering journals
[Python] Try to graph from the image of Ring Fit [OCR]
Seaborn basics for beginners ① Aggregate graph of the number of data (Countplot)
From the introduction of JUMAN ++ to morphological analysis of Japanese with Python
I want to know the population of each country in the world.
Get the number of visits to each page with ReportingAPI + Cloud Functions
[Introduction to Data Scientists] Basics of scientific calculation, data processing, and how to use the graph drawing library ♬ Environment construction
Summary of vtkThreshold (updated from time to time)
I tried to predict the number of people infected with coronavirus in consideration of the effect of refraining from going out
Introduction to Scapy ① (From installation to execution of Scapy)
[Introduction to Data Scientists] Basics of Python ♬
[Python] From morphological analysis of CSV data to CSV output and graph display [GiNZA]
Introduction to Statistical Modeling for Data Analysis Expanding the range of applications of GLM
[Verification] Does levelDB take time to register data when the amount of data increases? ??
[Introduction to infectious disease model] Looking at the logarithmic graph. .. .. It's the second wave! ??
Paste a link to the data point of the graph created by jupyterlab & matplotlib
[Introduction to Python] How to get the index of data with a for statement
Introduction to Deep Learning for the first time (Chainer) Japanese character recognition Chapter 4 [Improvement of recognition accuracy by expanding data]
Summary of gcc options (updated from time to time)
Convert data with shape (number of data, 1) to (number of data,) with numpy.
Change the decimal point of logging from, to.
[Introduction to cx_Oracle] (5th) Handling of Japanese data
The transition of baseball as seen from the data
The story of moving from Pipenv to Poetry
I tried to predict the price of ETF
Let's examine the convergence time from the global trend of the effective reproduction number of the new coronavirus
Try to create a battle record table with matplotlib from the data of "Schedule-kun"
Summary from the beginning to Chapter 1 of the introduction to design patterns learned in the Java language
[Python] Representing the number of complaints from life insurance companies in a bar graph
I just wanted to extract the data of the desired date and time with Django
How to count the number of occurrences of each element in the list in Python with weight
Check the processing time and the number of calls for each process in python (cProfile)
It's time to seriously think about the definition and skill set of data scientists
Introduction to AI creation with Python! Part 1 I tried to classify and predict what the numbers are from the handwritten number images.