[PYTHON] I tried to predict Covid-19 using Darts

Prophet, developed by Facebook, is easy to understand and I have been using it for a long time to forecast time series data. Recently I found a library called Darts, which wraps Prophet and other time series analysis methods behind an sklearn-like interface, so I tried it out on Covid-19 data to check its usability. This article describes what I did.

What is Darts

20200825003540.png https://github.com/unit8co/darts Darts is a library released by the Swiss company Unit8 in June 2020. It is very convenient because everything from Prophet and deep learning models such as LSTM to statistical models such as ARIMA can be handled through a single sklearn-style API.

Installation method

You can install it with pip.

pip install 'u8darts[all]'

You can also install it with pip install u8darts without adding [all], but then PyTorch and the other dependencies needed to run the LSTM models did not seem to be installed and I got an error. If you just want to try the basic functionality, [all] may not be necessary.

Execution environment

OS: macOS ver11.1
CPU: Core i5
Memory: 16GB
Python: 3.8.7
Darts: 0.5.0

Data preparation

This time I used Covid-19 data. Download the daily number of positive cases from the download link on the Ministry of Health, Labour and Welfare website: https://www.mhlw.go.jp/content/pcr_positive_daily.csv While a proper forecast of positive cases should probably also use the number of PCR tests, this article is a trial of Darts, so I simply build a model that predicts future positive counts from past positive counts.

I tried using it

Library installation

import warnings
warnings.simplefilter('ignore') # Many warnings are emitted; suppress them if they bother you
import pandas as pd
import darts
from darts import TimeSeries #Darts data type conversion module
import matplotlib.pyplot as plt

Data reading

df = pd.read_csv('https://www.mhlw.go.jp/content/pcr_positive_daily.csv') # At the time of writing, data up to January 14 was available

The contents of the data look like this. So close, just one day short of a full year! スクリーンショット 2021-01-16 15.03.47.png
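If you want to check the data yourself, a quick look with plain pandas (nothing Darts-specific, and the output is not reproduced here) shows the shape and the first and last rows:

print(df.shape)   # roughly one row per day, one day short of a year
print(df.columns) # the date column and the daily positive count
print(df.head())  # first rows (early 2020)
print(df.tail())  # last rows (up to January 14 at the time of writing)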

Data type conversion.

Darts converts a pandas DataFrame with the TimeSeries module. This time, we will try to predict the period after December 1, 2020. This part closely follows sklearn's API and is intuitively easy to understand.

ts = TimeSeries.from_dataframe(df, time_col='date', value_cols='Number of PCR positives(Single day)')
train, val = ts.split_after(pd.Timestamp('20201201'))
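As a quick check on the split (a minimal sketch; len() and the plot() method both work on Darts TimeSeries objects), you can confirm the sizes and visualize the training and validation ranges:

print(len(train), len(val)) # number of days in each split

plt.figure(figsize=(12, 5))
train.plot(label='train')    # period used for fitting
val.plot(label='validation') # period we will try to predict
plt.legend()
plt.show()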

Creating a learning model

Darts provides many forecasting models. Since the deep learning models need some extra data preparation, I run the other models here in a for loop. I have said it several times already, but the sklearn-style base really is easy: just call fit and predict, as you have done countless times before.

#Import model
from darts.models import ExponentialSmoothing, NaiveSeasonal, NaiveDrift, Prophet, ARIMA
from darts.models import AutoARIMA, StandardRegressionModel, Theta, FFT

models = [ExponentialSmoothing(), 
          NaiveSeasonal(), 
          NaiveDrift(), 
          Prophet(daily_seasonality=True, yearly_seasonality=True), 
          Prophet(daily_seasonality=True, yearly_seasonality=True, weekly_seasonality=True), # The number of tests varies by day of the week, so this version also models weekly seasonality
          ARIMA(), 
          AutoARIMA(), 
          StandardRegressionModel(), 
          Theta(), 
          FFT()]

for model in models:
    print(model.__str__())
    try: # Some models raise an error when executed, so catch and skip them
        model.fit(train) # sklearn-style fit
        prediction = model.predict(len(val))
        # Check the result visually
        plt.figure(figsize=(12, 5))
        ts.split_after(pd.Timestamp('20201101'))[1].plot(label='actual', lw=1) # Plot from 2020-11-01; showing the whole series made the important forecast region hard to see
        prediction.plot(label='forecast', lw=1)
        plt.legend()
        plt.xlabel('Day')
        plt.show()
    except Exception as e:
        print('error\t:{}'.format(e))

Execution result

Exponential smoothing: exs.png
Naive seasonal model: ns.png
Naive drift model: ndm.png
Prophet: prop1.png
Prophet (weekly seasonality enabled): prop2.png
ARIMA: arima.png
Auto-ARIMA: autoarima.png
Theta: this one raised an error! I will have to look at the official documentation...
FFT: fft.png

It looks like the ARIMA model and exponential smoothing learn the data reasonably well. Prophet's weekly seasonality defaults to auto, so both Prophet versions gave the same result. Looking at the actual line, cases rose explosively from April; it is understandable that a state of emergency was declared and the wave then converged.
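To put numbers on that impression, you could also compute an error metric inside the same loop. This is a rough sketch using darts.metrics.mape (the same function imported later for the LSTM section); it re-fits each model and prints the mean absolute percentage error against the validation period:

from darts.metrics import mape

for model in models:
    try:
        model.fit(train)
        prediction = model.predict(len(val))
        # MAPE against the held-out validation period (lower is better)
        print('{}: MAPE = {:.2f}%'.format(model.__str__(), mape(val, prediction)))
    except Exception as e:
        print('{}: error {}'.format(model.__str__(), e))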

I tried the deep learning model

Let's try an LSTM. For the parameters I used the values from the reference article linked at the bottom of this page.

Data processing

from darts.models import TCNModel, RNNModel
from darts.dataprocessing.transformers import Scaler
from darts.metrics import mape, r2_score
from darts.utils.missing_values import fill_missing_values

#Data preparation. Scaler appears to be a wrapper around sklearn's scaler that normalizes values to the 0-1 range.
scaler = Scaler()
train_tr = scaler.fit_transform(train)
val_tr = scaler.transform(val)
ts_tr = scaler.transform(ts)
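Just to confirm what the Scaler did (a small sketch; TimeSeries.values() returns the underlying numpy array), the training values should now lie in the 0 to 1 range, while the validation values can go above 1 because the scaler was fit on the training period only:

print(train_tr.values().min(), train_tr.values().max()) # should be 0.0 and 1.0
print(val_tr.values().min(), val_tr.values().max())     # can exceed 1.0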

LSTM. Training this one takes a while.

model = RNNModel(
    model='LSTM',
    output_length=1, #Number of output (= prediction) time steps
    hidden_size=25, #Number of hidden states in RNN
    n_rnn_layers=3, #Number of hidden layers of RNN
    input_length=12, # Number of previous time steps taken into account (I did not fully understand this one...)
    dropout=0.4,
    batch_size=16,
    n_epochs=400,
    optimizer_kwargs={'lr': 1e-3},
    log_tensorboard=True,
    random_state=42
)
model.fit(train_tr, val_training_series=val_tr, verbose=True)

Confirmation of execution result

prediction = model.predict(len(val))
fig = plt.figure(figsize=(12, 5))
ts_tr_after10 = ts_tr.drop_before(pd.Timestamp('20201001')) # keep only data from October 2020 onward so the forecast region is easier to see
ts_tr_after10.plot(label='actual')
prediction.plot(label='forecast', color='red')
plt.legend()

lstm.png The accuracy is rather underwhelming...
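To quantify that a bit, the metrics imported earlier (mape, r2_score) can be applied to the forecast, and Scaler.inverse_transform can bring it back to actual case counts. A minimal sketch under those assumptions:

# R2 on the scaled series (closer to 1 is better)
print('R2 (scaled):', r2_score(val_tr, prediction))

# Bring the forecast back to the original scale (number of positive cases)
prediction_inv = scaler.inverse_transform(prediction)
print('MAPE (original scale): {:.2f}%'.format(mape(val, prediction_inv)))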

Summary

I found Darts quite useful. I doubt it will be an issue, but I will check whether the accuracy degrades compared to using the original Prophet directly, and if there is no problem I will keep using Darts. Since it is sklearn-like, I would also like to put some effort into hyperparameter search. Also, naturally, the accuracy here is not high because I just ran everything with default parameters and no work on the data; that is not the library's fault but mine for cutting corners. The backtesting utilities also look well developed, so I would like to try those as well.
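As a pointer for that backtesting: newer versions of Darts expose a historical_forecasts method on models, which repeatedly re-fits and forecasts over a sliding window. The 0.5.0 API used in this article may differ, so treat this as a hedged sketch rather than the exact call for that version:

from darts.models import ExponentialSmoothing
from darts.metrics import mape

# Simulate forecasting one day ahead, starting from 70% of the series,
# re-training the model at each step (method name as in newer Darts releases)
bt_model = ExponentialSmoothing()
historical = bt_model.historical_forecasts(ts, start=0.7, forecast_horizon=1, verbose=True)
print('Backtest MAPE: {:.2f}%'.format(mape(ts, historical)))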

Closing remarks

A state of emergency was declared and I had to stay at home, so I took the chance to write an article for the first time. Now that I have started, I hope to keep writing little by little. I would be grateful for any comments on parts of the code that could be improved; I will use them as reference.

Reference article

https://blog.ikedaosushi.com/entry/2020/08/25/003557
https://qiita.com/hironey/items/d1d8a80c8329d5d46c16
