[PYTHON] Let's make a map of new coronavirus infection sites [FastAPI / PostGIS / deck.gl (React)] (Data processing)

The new coronavirus... the spread of infection just won't stop...

As a fledgling GIS developer, I wanted to put some of this data on a map and visualize it, but most of the publicly available data is in PDF form, so collecting it seemed daunting.

But wait!!!

I found a site that provides excellent data and freely permits copying, quoting, and reprinting for non-commercial purposes ([https://gis.jag-japan.com/covid19jp/](https://gis.jag-japan.com/covid19jp/)), so let's use this data to put it on a map!!

This time, partly as a learning exercise, the setup is a little more elaborate than it needs to be (launching an application server and a DB server with docker-compose and serving GeoJSON through an API).

Data acquisition

Let's take a look at the data right away!

Go to the site above and click on the csv link in the upper left.


Then you can download the following csv.

COVID-19.csv


Through,Ministry of Health, Labor and Welfare NO,Asymptomatic pathogen carrier,Domestic,Charter flights,Age,sex,Fixed date,Date of onset,Consultation prefecture,Prefecture of residence,In the residential jurisdiction,Residential city,Key,Presentation,Prefectural case number,status,Remarks,Source,Source2,Source3,Number of people,Cumulative,The day before ratio,Total dead,Number of discharges (cumulative),Number of discharges,Number of cases,PCR tests performed (number of people),PCR tests (day-before ratio),Profession_For correctness confirmation,Work place_For correctness confirmation,Hospital Pref,Residential Pref,Release,Gender,X,Y,Fixed date YYYYMMDD,Consultation prefecture code,Prefecture of residence code,Update date and time,Field2,Field4,Field5,Field6,Field7,Field8,Field9,Field10
1 ,1 ,,A-1,,30 ,male,1/15/2020,1/3/2020,Kanagawa Prefecture,Kanagawa Prefecture,,,Kanagawa Prefecture,Kanagawa Prefecture,1,Discharge,,https://www.mhlw.go.jp/stf/newpage_08906.html,https://www.pref.kanagawa.jp/docs/ga4/bukanshi/occurrence.html,,1 ,1 ,1 ,0 ,1 ,1 ,0 ,,,,,Kanagawa,Kanagawa,Kanagawa Prefecture,Male,139.642347,35.447504,2020/1/15,14,14,3/29/2020 18:50,,,,,,,,
2 ,2 ,,A-2,,40 ,male,1/24/2020,1/14/2020,Tokyo,People's Republic of China,,,People's Republic of China,Tokyo,1,Discharge,,https://www.mhlw.go.jp/stf/newpage_09079.html,https://www.metro.tokyo.lg.jp/tosei/hodohappyo/press/2020/01/24/20.html,,1 ,2 ,1 ,0 ,,,2 ,,,,,Tokyo,China(Mainland),Tokyo Metropolitan Government,Male,116.409685,39.903832,2020/1/24,13,NA,,,,,,,,,

csv is the best.

One infected person = one record, and the data provided is quite detailed; we can't make use of all of it, though, so let's narrow it down to just the data we want to use!

Create a virtual environment before processing data

For now, let's create a working directory and a directory for the data-processing work! (The directory name, covid_sample here, can be anything.)

You can create the covid_sample directory and the script directory under it at once with the following command.

$mkdir -p covid_sample/script

This time I just want to display the data on the map, so let's extract only the columns "Through", "Age", "sex", "Fixed date", "Date of onset", "Consultation prefecture", "Prefecture of residence", "X", and "Y"!

I don't want to pollute my main machine's environment, so let's create a virtual environment with pipenv or the like and do the processing quickly with pandas.

If you're unfamiliar with pipenv, existing articles on how to use it will be very helpful!

If you have pipenv installed, move to the script directory with $ cd covid_sample/script and create a Pipfile like the one below.

Pipfile


[[source]]
name = "pypi"
url = "https://pypi.org/simple"
verify_ssl = true

[dev-packages]

[packages]
pandas = "==0.24.2"
requests = "*"

[requires]
python_version = "3.8"

This file specifies how to build the virtual environment. In the example above, the Python version is 3.8, and pandas and requests are installed.

Create the virtual environment with the $ pipenv install command, and enter it with $ pipenv shell.

A prefix like (script) should then appear at the left end of your shell prompt; if it does, you are inside the virtual environment.

If you check the installed libraries with $ pip list in this state, you can see that pandas, requests, and their dependencies have been installed.

(script) hogehoge:script$pip list
Package         Version   
--------------- ----------
certifi         2019.11.28
chardet         3.0.4     
idna            2.9       
numpy           1.18.2    
pandas          0.24.2    
pip             20.0.2    
python-dateutil 2.8.1     
pytz            2019.3    
requests        2.23.0    
setuptools      46.1.3    
six             1.14.0    
urllib3         1.25.8    
wheel           0.34.2    

This completes the virtual environment creation!

Data processing

Let's create a Python script like the one below! I named it format.py!

This script downloads and processes the csv in one go!

format.py


import pandas as pd
from datetime import datetime as dt
import requests

#csv url
csv_url = "https://dl.dropboxusercontent.com/s/6mztoeb6xf78g5w/COVID-19.csv"

try:
    # Issue an HTTP GET request to the csv url
    r = requests.get(csv_url)
    # Save the response body as COVID-19.csv
    with open("COVID-19.csv", 'wb') as f:
        f.write(r.content)
except requests.exceptions.RequestException as err:
    print(err)

# Names of the columns to extract
column_names = [
    "Through", "Age", "sex", "Fixed date", "Date of onset",
    "Consultation prefecture", "Prefecture of residence", "X", "Y",
]

# Mapping from original column names to new names
changed_column_name = {
    "Through": "id",
    "Age": "age",
    "sex": "gender",
    "Fixed date": "fixed_data",
    "Date of onset": "onset_data",
    "Consultation prefecture": "consultation",
    "Prefecture of residence": "prefectures",
    "X": "lng",
    "Y": "lat",
}

# Read only the columns listed in column_names via usecols
df = pd.read_csv('COVID-19.csv', usecols=column_names)
# Rename the columns using the dictionary above
rename_df = df.rename(columns=changed_column_name)

# Strip stray spaces from the values in the age column
rename_df["age"] = [string.strip(' ') for string in list(rename_df["age"])]
# Convert the date strings to date objects
rename_df["fixed_data"] = [dt.strptime(data_string, "%m/%d/%Y").date() for data_string in list(rename_df["fixed_data"])]
# Replace NaN in onset_data with a placeholder date
rename_df.fillna({'onset_data': '1/1/0001'}, inplace=True)
rename_df["onset_data"] = [dt.strptime(data_string, "%m/%d/%Y").date() for data_string in list(rename_df["onset_data"])]

# Replace specific values in the age column
# inplace=True modifies the original df
rename_df.replace(
    {
        "age": {"unknown": "999", "0-10": "10"}
    }
    , inplace=True)

#Export to csv
rename_df.to_csv("../docker/postgis/custom_COVID-19.csv", index=False)

When you run this script with $ python format.py, the csv is downloaded into the same directory as the script, and then the necessary data is extracted, processed, and written out to the specified directory.

The details are explained in the comments in the code, so take a look there if anything is unclear!
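
After running it, you can sanity-check the output with a few lines of pandas, something like this (a minimal sketch, assuming the output directory described next already exists):

import pandas as pd

# Read back the processed csv and take a quick look (same path as in format.py)
check_df = pd.read_csv("../docker/postgis/custom_COVID-19.csv")
print(check_df.head())  # first few processed records
print(len(check_df))    # should equal the number of infected-person records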

The csv output location (covid_sample/docker/postgis/custom_COVID-19.csv) is inside a directory that will only be created when we build the docker environment in the next section, and the docker settings reference that same directory, so be careful to create it before executing the script!
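
If you'd like the script to take care of that itself, one option (a small sketch of my own, not part of the original script) is to create the output directory at the top of format.py:

import os

# Hypothetical guard: create the csv output directory if it doesn't exist yet
os.makedirs("../docker/postgis", exist_ok=True)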

Environment construction with docker-compose

Once you have finished processing the csv, press Ctrl+D to exit the virtual environment, and return to the root directory (covid_sample) with cd ../.

Now let's convert the data to GeoJSON, a geospatial data format that is easy to handle on the web!

Any method would do, but since I want to serve the data as an API down the line, let's register it in a DB and serve it from a backend API server!

For that, we'll use Docker, which is very convenient for standing up personal servers!

We'll proceed on the assumption that it's installed; if you're on a Mac, please install docker-compose by following the official Docker Compose installation guide. (For other OSes, a quick search will turn it up!)

Next, create the directories for the API as follows!

$mkdir -p docker/fastapi
$mkdir -p docker/postgis/init
$mkdir docker/postgis/sql

The current directory structure should look like this!

covid_sample
├── docker
│   ├── fastapi
│   └── postgis
│       ├── init
│       └── sql
└── script
    ├── COVID-19.csv
    ├── Pipfile
    ├── Pipfile.lock
    └── format.py

There are quite a few files needed to get docker running, so they're all laid out below in one go; please copy and paste them! lol

--Main configuration file for docker-compose: defines the containers to start, the connections between containers, and the directories mounted from the host machine.

docker/docker-compose.yml


version: '3.7'
services:
    fastapi:
#Container name
        container_name: fastapi
#Directory containing the Dockerfile to build
        build: fastapi
        volumes:
#Directory to mount
            - ./fastapi:/usr/src/app/
        ports:
#Host side port: Container side port
            - 8000:8000
        env_file:
#File of environment variables to load
            - fastapi/.env
        depends_on:
#Service this container depends on
            - postgis

    postgis:
        container_name: postgis
        build: postgis
        volumes:
            - covid_postgis_data:/var/lib/postgresql/data
#            Files in this directory run at first startup (e.g. when no volume exists, such as after docker-compose down -v)
            - ./postgis/init:/docker-entrypoint-initdb.d
#Directory to mount
            - ./postgis:/home
        env_file: postgis/.env_db
        ports:
#The host-side port can conflict with a locally running psql, so something other than 5432 is safer
            - 5433:5432

volumes:
    covid_postgis_data:

--Environment variable configuration file for fastapi container

docker/fastapi/.env


DATABASE_HOST=localhost
DATABASE_PORT=5433

--Dockerfile for starting the fastapi container (docker/fastapi/Dockerfile)

docker/fastapi/Dockerfile


FROM python:3.8

RUN apt-get update -y --fix-missing \
    && apt-get install -y -q --no-install-recommends

# install
RUN pip install pipenv
ADD Pipfile Pipfile.lock /
RUN pipenv install --system

# add the application
ADD app.py /

CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]

--Python modules to install in the fastapi container (docker/fastapi/Pipfile)

docker/fastapi/Pipfile


[[source]]
name = "pypi"
url = "https://pypi.org/simple"
verify_ssl = true

[dev-packages]

[packages]
fastapi = "*"
uvicorn = "*"
psycopg2-binary = "*"

[requires]
python_version = "3.8.0"

--Application file for fastapi

docker/fastapi/app.py


from fastapi import FastAPI

app = FastAPI()

# Start with `uvicorn app:app --reload`
@app.get("/")
async def hello():
    return {"text": "hello"}

--Environment variable configuration file for the postgis container

Set any password you like in docker/postgis/.env_db.

docker/postgis/.env_db


#In the postgres container, setting these environment variables creates the user and DB automatically
POSTGRES_USER=covid_user
POSTGRES_PASSWORD=hogehoge
POSTGRES_DB=covid_db

--Dockerfile for starting the postgis container (docker/postgis/Dockerfile)

docker/postgis/Dockerfile


FROM mdillon/postgis:9.6

# locale settings.
RUN localedef -i ja_JP -c -f UTF-8 -A /usr/share/locale/locale.alias ja_JP.UTF-8
ENV LANG ja_JP.UTF-8

--A shell script that runs when the postgis container starts for the first time

docker/postgis/init/init.sh


#!/bin/sh

psql -U covid_user -d covid_db -f /home/sql/init.sql

psql -U covid_user -d covid_db -c "COPY covid_data FROM '/home/custom_COVID-19.csv' WITH CSV HEADER DELIMITER ',';"

psql -U covid_user -d covid_db -f /home/sql/create_geom.sql

--SQL file executed by the above shell script

docker/postgis/sql/init.sql


CREATE EXTENSION postgis;

create table covid_data
(
    id           smallint not null,
    age          int,
    gender       text,
    fixed_data   date,
    onset_data   date,
    consultation text,
    prefectures  text,
    lng          float,
    lat          float
);

--SQL file executed by the above shell script (adds the geometry column)

docker/postgis/sql/create_geom.sql


alter table covid_data
	add geom geometry(point, 4326);

UPDATE covid_data SET geom = ST_GeogFromText('SRID=4326;POINT(' || lng || ' ' || lat || ')')::GEOMETRY;

Once the above files are in place and the processed csv is there, build and start everything with $ docker-compose up -d --build!

This time we are launching containers for FastAPI, a Python micro web framework, and PostGIS, a geospatial extension for PostgreSQL.

When the containers start, the csv data is registered in the DB, and init.sh automatically generates a geometry-type geom column from that data and populates it.

At this stage you should be able to see the data by connecting to the DB in the postgis container by some means (pgAdmin, $ psql -U covid_user -d covid_db -h localhost -p 5433, and so on).
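
For example, a quick check from the host machine with psycopg2 might look like this (a minimal sketch; the port 5433 and the credentials come from docker-compose.yml and .env_db above):

import psycopg2

# Connect from the host to the postgis container (host port 5433 -> container port 5432)
connection = psycopg2.connect(
    host="localhost",
    port=5433,
    user="covid_user",
    password="hogehoge",
    dbname="covid_db",
)
cursor = connection.cursor()
cursor.execute("SELECT count(*) FROM covid_data;")
print(cursor.fetchone()[0])  # number of records loaded from the csv
cursor.close()
connection.close()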

Also, if you access localhost:8000 from your browser and {"text": "hello"} is returned, the docker-compose environment construction is complete!

Creating an API

Once you have an environment, let's create an API!

Specifically, edit docker/fastapi/app.py as follows!

docker/fastapi/app.py


from fastapi import FastAPI
import psycopg2
from starlette.middleware.cors import CORSMiddleware

app = FastAPI()

#Added for CORS settings
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"]
)

# Start with `uvicorn app:app --reload`
@app.get("/")
async def hello():
    # Connect to PostgreSQL
    connection = psycopg2.connect(
        # host="localhost",
        host="postgis",
        user="covid_user",
        password="hogehoge",
        dbname="covid_db",
        port=5432
    )

    #Set the encoding of the client program (automatically convert from the DB character code)
    connection.set_client_encoding('utf-8')

    #Get cursor
    cursor = connection.cursor()

    #Generate geojson
    sql = """
    SELECT jsonb_build_object(
        'type',     'FeatureCollection',
        'features', jsonb_agg(features.feature)
    )
    FROM (
      SELECT jsonb_build_object(
        'type',       'Feature',
        'id',         id,
        'geometry',   ST_AsGeoJSON(geom)::jsonb,
        'properties', to_jsonb(inputs) - 'gid' - 'geom'
      ) AS feature
      FROM (SELECT * FROM covid_data) inputs) features;
    """

    #SQL execution
    cursor.execute(sql)

    # Fetch the query result (a single GeoJSON object)
    results = cursor.fetchall()[0][0]

    #Close the cursor
    cursor.close()

    #Disconnect
    connection.close()

    return results

In this application, React needs to fetch and display the data by calling the API with Ajax, so we configure CORS (as a trial, all origins and methods are allowed here, but that is dangerous, so please avoid it wherever possible).

Without this setting, the JavaScript side (the map display application) won't be able to access the API, so be sure to configure it! (For details, try googling CORS!)
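
For reference, a safer setup would whitelist only the origin of the map application. If, say, the React dev server we build next runs on http://localhost:3000 (an assumption at this point), the add_middleware call above could be tightened to something like:

# A safer CORS sketch: allow only the (assumed) React dev server origin and GET requests
app.add_middleware(
    CORSMiddleware,
    allow_origins=["http://localhost:3000"],
    allow_methods=["GET"],
    allow_headers=["*"]
)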

The @app.get("/") decorator means that the processing for GET requests to / is written in the function below it. That's where we'll put our logic this time.

Then we access the DB using psycopg2, a module for connecting to PostgreSQL from Python.

In the SQL, the data is converted directly to GeoJSON by (somewhat forcibly) using PostgreSQL's JSONB functions.
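
Incidentally, if you'd rather not lean on JSONB, the FeatureCollection could also be assembled on the Python side; a rough sketch (not what this article's app.py does) would look like:

# Sketch: build GeoJSON in Python from plain SELECT results instead of via JSONB
def rows_to_geojson(rows, columns):
    features = []
    for row in rows:
        props = dict(zip(columns, row))  # e.g. columns = ["id", "age", ..., "lng", "lat"]
        features.append({
            "type": "Feature",
            "id": props["id"],
            "geometry": {"type": "Point", "coordinates": [props["lng"], props["lat"]]},
            "properties": props,
        })
    return {"type": "FeatureCollection", "features": features}

Doing it in SQL keeps the endpoint down to a single query, which is why the JSONB version is used here.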

After rewriting, stop the containers with $ docker-compose down and restart with $ docker-compose up -d --build!

After startup is complete, try connecting to localhost:8000 again.

If all goes well, you should get GeoJSON like this!

{
    "type": "FeatureCollection",
    "features": [
        {
            "id": 1,
            "type": "Feature",
            "geometry": {
                "type": "Point",
                "coordinates": [
                    139.642347,
                    35.447504
                ]
            },
            "properties": {
                "id": 1,
                "age": 30,
                "lat": 35.447504,
                "lng": 139.642347,
                "gender": "male",
                "fixed_data": "2020-01-15",
                "onset_data": "2020-01-03",
                "prefectures": "Kanagawa Prefecture",
                "consultation": "Kanagawa Prefecture"
            }
        },
        {
            "id": 2,
            "type": "Feature",
            "geometry": {
                "type": "Point",
                "coordinates": [
                    116.409685,
                    39.903832
                ]
            },
            "properties": {
                "id": 2,
                "age": 40,
                "lat": 39.903832,
                "lng": 116.409685,
                "gender": "male",
                "fixed_data": "2020-01-24",
                "onset_data": "2020-01-14",
                "prefectures": "People's Republic of China",
                "consultation": "Tokyo"
            }
        },
        ...
}
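
You can also check the response from Python with requests (a small sketch):

import requests

# Fetch the GeoJSON from the API and count the features
geojson = requests.get("http://localhost:8000/").json()
print(geojson["type"])           # -> FeatureCollection
print(len(geojson["features"]))  # one feature per infected person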

Now the data is ready!!

Next time, let's display this data on the map!

→ Continued here: Let's make a map of new coronavirus infection sites [FastAPI / PostGIS / deck.gl (React)] (Data display)
