[PYTHON] Releasing the new coronavirus epidemic forecast on the Web at explosive speed

This article is also posted on the Yotsuya Lab official blog.

If you like, please check out Yotsuya Lab's blog as well.

Previously, Mr. Yamashin posted an article on this topic. It is a very interesting piece that estimates the parameters of an SEIR model from the changes in the number of infected people so far and uses it to predict future case numbers.

Estimate the number of people infected with the new coronavirus using a mathematical model

Background

While reading that article, I thought, "Wow, this is amazing. I wish everyone could see it whenever they like," and started seriously thinking about publishing it on the Web. The prediction script is written in Python, so to see the images I would have to set up Python on my PC, install the libraries, prepare a Python 3 environment... ugh, what a hassle!!! (It's a secret that this is what I actually thought.)

As a result, I built it. The service is open to the public at the following URL: Prediction of infection with new coronavirus COVID-19

To run the script with the latest data, you have to download the source CSV from Kaggle each time. The update timing also varies, so even checking once a day is a chore; I was downloading the data and running the script by hand every time.
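For reference, fetching the CSV by hand looks roughly like this. This is a sketch using the official Kaggle CLI; the dataset slug is an assumption for illustration, not necessarily the exact one the project uses:

# Requires the kaggle CLI (pip install kaggle) and an API token in ~/.kaggle/kaggle.json
# The dataset slug below is an assumption for illustration
kaggle datasets download -d sudalairajkumar/novel-corona-virus-2019-dataset \
    -p data --unzip --force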

Actually, Mr. Yamashin, I was the one replacing the images in the blog post every day. Did you know...?

I wanted to see the latest data too, but building the environment was a pain. (Engineers are bundles of laziness. Demons of it. All we think about is how annoying things are.) So first I set up a Docker environment to run the script, so it would work the same on my home PC, my work PC, anywhere...

To check the results, I would ssh into my VPS from the commuter train and run the commands by hand. But, as you might expect, that got tiring and annoying...

That's why we have now released the coronavirus epidemic forecast on the Web, for anyone to see!

In this article, I'd like to talk about what I considered in publishing it and how it is implemented.

Technology selection

First, the technologies used. The goals this time were as follows:

  1. Get the latest data on a regular basis
  2. Run the calculation script automatically
  3. Provide an environment where the graphs and images can be viewed on the Web at any time

To meet these requirements, we decided to clear them one by one.

The repository is open to the public here.

https://github.com/428lab/new_coronavirus_infection

"Since it's Python, the server side must be Django or Flask, right?" You there, you just thought that!! Good thinking! I wanted to use them too. But not this time!!

The Python script does less than you might expect. Or rather, it sticks to its main job: outputting the calculation results as a JSON file. So, to build this system, in what order does the automated pipeline run, and how is it put together?

I will explain step by step.

  1. Unify the development environment with Docker
  2. Hit Kaggle's API to fetch the latest COVID-19 CSV (Bash)
  3. Set each country's name and population (Python)
  4. Output a JSON file formatted for easy use on the Web (Python)
  5. Read the JSON file and display it with Chart.js (Nuxt)
  6. Upload to the server's public directory (Bash)
  7. Reduce server operation costs through automation (Cron on the VPS)

In practice, the functions are split along these lines with maintenance cost in mind. As a product it is nothing to brag about, and some parts carry a real burden, but we deliberately picked technologies that the people involved were already used to, in order to prioritize getting it published.

Reasons for selecting each technology

Let me say a little about why each technology was chosen. The system was built so that combining multiple languages and environments would not tie us down. I think the quick turnaround you get when developing a small product like this can also serve as useful reference material for other projects.

Using Docker

I used Docker this time because I wanted to unify each member's development environment as much as possible. The script from the earlier article is Python, but depending on the machine it might run under the python command, or Python 2 and 3 might coexist. Unlike Node.js, it is also hard to pick and install libraries per project, and you end up needing to set up pyenv and the like.

Docker solves these environment-dependent problems and keeps each person's local environment unpolluted, which is why I chose it. Incidentally, I use docker-compose, on the assumption that the number of containers will grow and that it will be easier to maintain.
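As an illustration, a docker-compose.yml along the following lines would be enough for this kind of one-shot batch; the service name, image, and file names here are assumptions, not the project's actual configuration:

version: "3"
services:
  app:
    image: python:3.8
    working_dir: /usr/src/app
    volumes:
      - .:/usr/src/app
    # Install the libraries and run the prediction script in one shot
    command: bash -c "pip install -r requirements.txt && python main.py"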

Output to JSON file

Mr. Yamashin already covered the Python code in his earlier article, so I will skip the explanation here. When designing this data output, I thought about how to hand the data to the Web side efficiently. If you output as much data as you can, the Web side is free to decide what to use and what to ignore. This time the actual and predicted values are grouped by country, and the dates are kept in the output so that everything stays recorded. Since the output reuses, as is, the arrays built when the graphs are generated, the implementation itself is very easy.

    def plot_bar(self, ax):
        width = 0.5
        # Initialize the dictionaries that will hold the actual values
        self.graph["fact"] = dict()
        self.graph["fact"]["infected"] = dict()
        self.graph["fact"]["recovered"] = dict()
        self.graph["fact"]["deaths"] = dict()

        for day, infected, recovered, deaths in zip(self.timestamp, self.infected, self.recovered, self.deaths):
            # Stack the three series on top of each other, day by day
            bottom = 0
            ax.bar(day, infected, width, bottom, color='red', label='Infectious')
            # Record the value in the graph output data
            self.graph["fact"]["infected"][day.strftime("%Y/%m/%d")] = infected
            bottom += infected
            ax.bar(day, recovered, width, bottom, color='blue', label='Recovered')
            # Record the value in the graph output data
            self.graph["fact"]["recovered"][day.strftime("%Y/%m/%d")] = recovered
            bottom += recovered
            ax.bar(day, deaths, width, bottom, color='black', label='Deaths')
            # Record the value in the graph output data
            self.graph["fact"]["deaths"][day.strftime("%Y/%m/%d")] = deaths
            bottom += deaths

        ax.set_ylabel('Confirmed infections', fontsize=20)
        # Every bar repeats its label, so keep only the first three legend entries
        handler, label = ax.get_legend_handles_labels()
        ax.legend(handler[0:3], label[0:3], loc="upper left", borderaxespad=0., fontsize=20)

        return

    def plot_estimation(self, ax, estimatedParams):
        day = self.timestamp[0]
        day_list = []
        max_value = 0
        # Initialize the dictionary for the predicted values
        self.graph["estimation"] = dict()

        # Predicted infections
        estimated_value_list = []
        for estimated_value in self.estimate4plot(estimatedParams.x[0])[:, 2]:
            # Track the peak so it can be annotated on the chart
            if max_value < estimated_value:
                max_value = estimated_value
                peak = (day, estimated_value)

            day_list.append(day)
            estimated_value_list.append(estimated_value)
            day += datetime.timedelta(days=1)
            # Stop once the estimated series goes negative
            if estimated_value < 0:
                break
        ax.annotate(peak[0].strftime('%Y/%m/%d') + ' ' + str(int(peak[1])), xy=peak, size=20, color="black")
        ax.plot(day_list, estimated_value_list, color='red', label="Estimation infection", linewidth=3.0)

        # Store the infection prediction for the JSON output
        self.graph["estimation"]["infection"] = estimated_value_list

        # Predicted recoveries
        day = self.timestamp[0]
        day_list = []
        estimated_value_list = []
        for estimated_value in self.estimate4plot(estimatedParams.x[0])[:, 3]:
            day_list.append(day)
            estimated_value_list.append(estimated_value)
            day += datetime.timedelta(days=1)
            if estimated_value < 0:
                break
        ax.plot(day_list, estimated_value_list, color='blue', label="Estimation recovered", linewidth=3.0)

        # Store the recovery prediction for the JSON output
        self.graph["estimation"]["recovered"] = estimated_value_list

        # Predicted deaths
        day = self.timestamp[0]
        day_list = []
        estimated_value_list = []
        for estimated_value in self.estimate4plot(estimatedParams.x[0])[:, 4]:
            day_list.append(day)
            estimated_value_list.append(estimated_value)
            day += datetime.timedelta(days=1)
            if estimated_value < 0:
                break
        ax.plot(day_list, estimated_value_list, color='black', label="Estimation deaths", linewidth=3.0)

        # Store the death prediction for the JSON output
        self.graph["estimation"]["deaths"] = estimated_value_list

        ax.set_ylim(0,)

        handler, label = ax.get_legend_handles_labels()
        ax.legend(handler[0:6], label[0:6], loc="upper right", borderaxespad=0., fontsize=20)

        return


All that remains is to dump the dictionary we have just filled in. The JSON is written out data-first, as in the sample further below, so that the front end can treat it directly as a JavaScript object.
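The dump step itself is little more than a call to json.dump. Here is a minimal sketch, assuming the dictionary is the graph object built above; the file names and folder layout are placeholders based on the description in this article, not the actual code:

import datetime
import json

def dump_graph(graph, assets_dir="web/assets", backup_dir="backup"):
    """Write the graph dict to Nuxt's assets folder plus a timestamped backup."""
    stamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
    # Timestamped copy kept as a backup
    with open(f"{backup_dir}/graph_{stamp}.json", "w", encoding="utf-8") as f:
        json.dump(graph, f, ensure_ascii=False, indent=4)
    # Main data, picked up by the Nuxt build
    with open(f"{assets_dir}/graph.json", "w", encoding="utf-8") as f:
        json.dump(graph, f, ensure_ascii=False, indent=4)

And the resulting JSON looks like this: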

{
    "fact": {
        "infected": {
            "2020/01/22": 2,
            "2020/01/23": 1,
            "2020/01/24": 1,
        },
        "recovered": {
            "2020/01/22": 0,
            "2020/01/23": 0,
            "2020/01/24": 0,
        },
        "deaths": {
            "2020/01/22": 0,
            "2020/01/23": 0,
            "2020/01/24": 0,
        }
    },
    "estimation": {
        "infection": [
            1.0,
            1.001343690430807,
            1.057149907444775,
            1.1604873710493135,
            1.3082809855652382,
            1.500433415450257,
            1.7392685227686018,
            2.0292066531306356,
        ],
        "recovered": [
            0.0,
            0.02712800489579143,
            0.05505548053170182,
            0.0851617724612349,
            0.1186923604385037,
            0.15685109117056692,
            0.20087311338956979,
            0.2520859691391483,
        ],
        "deaths": [
            0.0,
            0.0021553841476101903,
            0.004374288136297892,
            0.0067663042324873765,
            0.009430388748244232,
            0.012462190151582021,
            0.015959843930437784,
            0.02002882643985976,
        ]
    }
}

As for the output destination, a copy stamped with the date and time is saved to a backup folder, while the main data is written to Nuxt's assets folder. The aim is that the Nuxt side never has to be aware of this output step.

Language selection on the Web front end

As I mentioned at the beginning, this website is built with the JavaScript framework Nuxt.js. There are several reasons for the choice, but with plain Vue.js you often end up having to wire up a lot of things yourself, so I passed on it this time and went with Nuxt.js.

Serving a static build from Nuxt reduces both the rendering cost on the user's side and the load of server-side rendering. Because I'm scared of the hosting bill!!!

There is nothing technically special here, but I couldn't spend much time on design and layout, so I kept things simple with Bootstrap. We also put priority on the images used when the page is shared to Facebook and Twitter, aiming for a site that is as easy to share as possible.

(Actually, the OGP side is implemented fairly carefully.)
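For flavor, OGP in Nuxt 2 is typically declared in a page component's head() hook; the following is a sketch with placeholder values, not the site's actual tags:

// pages/index.vue -- values below are placeholders
export default {
  head () {
    return {
      meta: [
        { hid: 'og:title', property: 'og:title', content: 'Prediction of infection with new coronavirus COVID-19' },
        { hid: 'og:image', property: 'og:image', content: 'https://example.com/ogp.png' },
        { hid: 'twitter:card', name: 'twitter:card', content: 'summary_large_image' }
      ]
    }
  }
}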

The line and bar charts differ only in shade of color, but the accumulated actuals and the forecasts are made as easy to tell apart as possible. Because the dates on the X axis get very dense, I decided to print the peak date and maximum value separately. It still feels flat, so that remains a point to improve.
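To give an idea of step 5, here is a minimal sketch of a Vue component that reads the generated JSON from assets and draws it with Chart.js 2; the file path, component structure, and options are assumptions, not the actual implementation:

<template>
  <canvas ref="canvas"></canvas>
</template>

<script>
import Chart from 'chart.js'
// Output of the Python script; the path is an assumption
import graph from '~/assets/graph.json'

export default {
  mounted () {
    new Chart(this.$refs.canvas.getContext('2d'), {
      type: 'bar',
      data: {
        labels: Object.keys(graph.fact.infected),
        datasets: [
          { label: 'Infectious', backgroundColor: 'red', data: Object.values(graph.fact.infected) },
          // The real site aligns the predicted dates with the axis; omitted here
          { label: 'Estimation infection', type: 'line', borderColor: 'red', fill: false, data: graph.estimation.infection }
        ]
      },
      options: { scales: { yAxes: [{ ticks: { beginAtZero: true } }] } }
    })
  }
}
</script>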

Running it on the VPS with Bash

Basically, all file operations are scripted in Bash. There is no deep reason for this, but Bash is easy to kick from Cron and can call commands like yarn generate; in short, it covers everything needed for a release. The scale is small and the work is essentially batch processing, so everything from fetching the ZIP file to running the scripts follows this procedure.

The point worth remembering is that Cron merely fires one autorun .sh; all the actual work lives in shell scripts.
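Concretely, the setup can be pictured like this. It is only a sketch: the paths, script names, and docker-compose service name are all assumptions, not taken from the repository:

#!/bin/bash
# autorun.sh -- hypothetical daily pipeline (paths and names are assumptions)
set -eu
cd /home/deploy/new_coronavirus_infection

./get_data.sh                                # fetch the latest CSV from Kaggle
docker-compose run --rm app python main.py   # regenerate the JSON ("app" is assumed)
(cd web && yarn generate)                    # rebuild the static site
cp -r web/dist/* /var/www/public/            # publish

And the Cron side then needs only one line, for example:

# Run every morning at 06:00, logging output
0 6 * * * /home/deploy/new_coronavirus_infection/autorun.sh >> /var/log/covid_forecast.log 2>&1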

By the way, the VPS is rented from Mixhost this time.

https://mixhost.jp/

Mixhost is currently running a program to support the distribution of coronavirus-related content, and through it we are borrowing the VPS for this site free of charge. With 6 cores, 8 GB of memory, and a 500 GB SSD, it is a very blessed environment. How generous!!!

https://mixhost.jp/news/432

Summary

With that, I managed to go from starting serious work to public release in two days. I have kept making small fixes since, but this time I built the site without fussing over the design, giving priority to publication. If you'd like to polish the design or touch the Python code, please visit the repository below.

https://github.com/428lab/new_coronavirus_infection
