title

I tried to predict the BTC Trade data obtained from BitMEX with LSTM using PyTorch, but the vanishing gradient problem? Frustrated by the NaN problem, etc. I regained my mind and tried backtesting with a separate trading strategy!

background

First of all, I learned Python in order to acquire the skills to develop systems or solutions using machine learning and deep learning. As a culmination of that, it is an article that summarizes the results of going through the flow of acquiring data on price fluctuations of virtual currencies, which is a hot topic these days, analyzing it, and verifying strategies.

Why PyTorch? Why BTC?

As I used the Deep Learning framework in the future, I felt that TnsorFlow was a little dead. However, since PyTorch has been talked about a lot recently and I have never touched it, I wanted to sit down and take a look. I chose BTC-related because I thought that technological innovation would progress in the future and because the API was substantial, I thought it would be easy to try various things and obtain data.

Conclusion, remaining items

Teacher coercion is difficult, and inference does not converge with the current model
A model can be created if the pattern is like a sine wave, but if it is a pattern that rises to the right these days, NaN that seems to be a "vanishing gradient problem" will occur.
Backtest results are not good with a simple LSTM model

flow

Bitmex data acquisition
Data preprocessing
LSTM model construction
Learning
Reasoning
Evaluation
Consideration
Build another LSTM model
Backtest
Required modules
Source code 1
Execution result log 1
Execution result log 2
Source code 2
Execution result log 3
References

BitMEX data acquisition

Recently, it seems that it has been completely down because it is not possible to trade from Japan, but since the API can be used, I used it. However, since the Tick data is placed at https://public.bitmex.com/?prefix=data/trade/, I take it as a file and use it.

Data preprocessing

VWAP calculation

Calculate Volume Weighted Average Price (VWAP) at 1-minute intervals Group transactions at predefined time intervals. Because the trading activity of the market changes with time. btc_prediction_by_lstm_pytorch.py1-df_vwap.plot.png

train/val/test data creation

Divide the volume weighted average price (VWAP) data into model training data 0.65, model evaluation data 0.08, and inference data 0.27.

scaling

Scale the data to make the LSTM model converge faster Larger input values can slow learning Use StandardScaler from sklearn library The scaler fits the training set and is used to transform invisible transaction data in the validation and test sets Fitting a scalar to all the data will overfit the model and give good results with this data, but slow performance with the actual data.

Time bar data conversion

After scaling, convert the data to a format suitable for modeling in LSTM Convert long sequences of data into many short sequences (100 timebars per sequence) that are shifted by a single timebar The plot below shows the first, second and third sequences of the training set. Both sequences are 100 time bars long The target of both sequences is about the same as the function, the difference is in the first and last timebars How do LSTMs use sequences during the training phase? First, focus on the first sequence The model takes the characteristics of the index 0 timebar and attempts to predict the target of the index 1 timebar. Next, we take the characteristics of the index 1 timebar and try to predict the target of the timebar. With index 2 etc. The features of the second sequence are shifted by one time bar from the features of the first sequence, and the features of the third sequence are shifted by one time bar from the features of the second sequence. This procedure yields many short sequences and shifts only a single timebar. Note that classification or regression tasks usually have a set of features and targets that you are trying to predict. In this example using LSTMs, the features and targets are from the same sequence, the only difference is that the targets are shifted by one timebar. btc_prediction_by_lstm_pytorch.py2-plot_sequence.png

Model building and learning

LSTM training Learn LSTM with 21 hidden units By reducing the number of credits, it is less likely that the LSTM will completely memorize it. Use mean squared error loss function and Adam optimizer for training Set the learning rate to 0.001 and attenuate every 5 epochs Learn 100 sequences per batch with 15 epochs From the plot, learning loss and verification loss converge at the 6th epoch btc_prediction_by_lstm_pytorch.py3-optimization_1.plot_losses.png

Model evaluation

Test set

Evaluate the model with a test set future parameter set to 5 Outputs VWAPs that the model considers to be in the next five time zones (5 minutes in this example) This makes it visible hours before the price change occurs. The plot shows that the predicted values are in close agreement with the actual VWAP values. However, the future parameter is set to 5, and the orange line does not cover the spikes, but must react before it occurs. btc_prediction_by_lstm_pytorch.py4-df_result_1.plot.png

Zoom in

If you zoom in on the spike (one at the start and one at the end and another time series), you can see that the predicted values mimic the actual values. When the actual value changes direction, the predicted value follows, but this is useless The same thing happens with more future parameters (does not affect the forecast line) btc_prediction_by_lstm_pytorch.py5-df_result_1.iloc.plot.png

Inference by model

Generate 1000 timebars for the first test sequence using the model and compare the predicted, generated and actual VWAP Observe that it is close to the actual value while the model outputs the predicted value But when you start generating values, the output is almost like a sine wave After a period of time, the value converges to 9600 This behavior is believed to occur because the model was trained only on the actual inputs and not on the generated inputs. When the model produces output from the generated input, it is not good at generating the next value Attempt to correct this problem by forcing a teacher btc_prediction_by_lstm_pytorch.py6-generate_sequence.png

Teacher compulsion

Teacher Force How to train a recurrent neural network that takes the output of the previous time step as input When training an RNN, you can generate a sequence by using the previous output as the current input. The same process can be done during training, but the model may become unstable or may not converge. Teacher coercion is an approach to address these issues during training Commonly used in language models

This time, we will use a Teacher forcing extension called Scheduled sampling. The model has a certain probability of using its generated output as input during training. Initially, the model is unlikely to see its generated output and gradually increases during training. Note that this example uses a random probability that does not increase during training

Train a model with teacher coercion enabled with the same parameters as before After 7 epochs, learning and verification losses converge btc_prediction_by_lstm_pytorch.py7-optimization_2.plot_losses.png

Model evaluation

Observe the same predicted sequence as before If you zoom in on the spikes, you can observe the behavior of the model as if the predicted values mimic the actual values. Teacher coercion did not solve the problem. .. .. btc_prediction_by_lstm_pytorch.py8-df_result_2.plot.png

Inference by model

Generate 1000 timebars for the first test sequence using a teacher-forced model btc_prediction_by_lstm_pytorch.py10-generate_sequence.png

Consideration

A consideration of the generated sequence is that the values generated from the teacher-forced model take a long time to converge. Second, when the sequence is increasing, it continues to increase to a certain point, then begins to decrease, and the pattern repeats until the sequence converges, which pattern is like a sine wave with decreasing amplitude. appear

Conclusion

Verification shows that the model's prediction mimics the actual value of the sequence. The first and second models do not detect price changes before they occur Adding another feature (like volume) may help detect price changes before the model occurs, but in that case the model will use those outputs as input in the next step. Two features need to be generated, which complicates the model As you can see in the plot above, the model has the ability to predict VWAP time series, so using a more complex model (using multiple LSTMCells and increasing the number of hidden units) does not help. May be It may be possible to improve the sequence generation skills of a model in a more sophisticated way of teacher coercion. .. ..

Build another LSTM model

Constant setting
Creation of teacher data
Price normalization
Data split, converted to Torch Tensor
LSTM learning model construction
First, replace the index of train data at random. Do not use the first time_steps.
Get the target index of perm_idx for each batch size
Preparation of time series data for LSTM input
Learning implementation of pytorch LSTM
Evaluation of validation data
Save model if validation evaluation is good
Predict with the best model.
Backtest with a simple strategy

Backtest

Score

Start                     2020-11-24 13:20:00
End                       2020-12-23 23:50:00
Duration                     29 days 10:30:00
Exposure Time [%]                     10.2358
Equity Final [$]                       110980
Equity Peak [$]                        111948
Return [%]                            10.9804
Buy & Hold Return [%]                 20.5915
Return (Ann.) [%]                     255.219
Volatility (Ann.) [%]                 52.4708
Sharpe Ratio                          4.86403
Sortino Ratio                         30.6017
Calmar Ratio                          127.727
Max. Drawdown [%]                    -1.99816
Avg. Drawdown [%]                   -0.526232
Max. Drawdown Duration        7 days 22:30:00
Avg. Drawdown Duration        0 days 16:50:00
# Trades                                   26
Win Rate [%]                          73.0769
Best Trade [%]                        1.00127
Worst Trade [%]                      -1.00668
Avg. Trade [%]                       0.453453
Max. Trade Duration           0 days 10:40:00
Avg. Trade Duration           0 days 02:37:00
Profit Factor                         2.69042
Expectancy [%]                       0.457399
SQN                                   2.53193
_strategy                    myCustomStrategy
_equity_curve                             ...
_trades                       Size  EntryB...

plot

Required modules (install the following modules with pip)

pip install numpy pandas matplotlib dateutil pprint sklearn torch skorch backtesting bitmex

Source code 1 (PyTorch's LSTM, code that challenged data scaling and teacher coercion)

`btc_prediction_by_lstm_pytorch.py`


# -*- coding: utf-8 -*-
'''
btc_prediction_by_lstm_pytorch.py

Copyright (C) 2020 HIROSE Ken-ichi ([email protected]) 
                                                 All rights reserved.
 This is free software with ABSOLUTELY NO WARRANTY.
 
 This program is free software; you can redistribute it and/or modify
 it under the terms of the GNU General Public License as published by
 the Free Software Foundation; either version 2 of the License, or
 (at your option) any later version.
 
 This program is distributed in the hope that it will be useful,
 but WITHOUT ANY WARRANTY; without even the implied warranty of
 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 GNU General Public License for more details.
 
 You should have received a copy of the GNU General Public License
 along with this program; if not, write to the Free Software
 Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA
 02111-1307, USA
'''

import glob
import warnings

import os
import math
import time
import random
# import pprint
from dateutil import parser
from datetime import timedelta, datetime

import numpy as np
import pandas as pd
# import pandas_datareader.data as web

import matplotlib
import matplotlib.pyplot as plt

# import sklearn
from sklearn.preprocessing import StandardScaler

import torch
import torch.nn as nn
import torch.optim as optim
from torch.autograd import Variable
# import skorch

from backtesting import Backtest, Strategy
from backtesting.lib import plot_heatmaps

# import bitmex

class Model(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(Model, self).__init__()
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.lstm = nn.LSTMCell(self.input_size, self.hidden_size)
        self.linear = nn.Linear(self.hidden_size, self.output_size)

    def forward(self, input, future=0, y=None):
        outputs = []

        # h_t = torch.zeros(input.size(0), self.hidden_size, dtype=torch.float32)
        h_t = torch.zeros(input.size(0), self.hidden_size, dtype=torch.float32, device=cuda_device)
        # c_t = torch.zeros(input.size(0), self.hidden_size, dtype=torch.float32)
        c_t = torch.zeros(input.size(0), self.hidden_size, dtype=torch.float32, device=cuda_device)

        for i, input_t in enumerate(input.chunk(input.size(1), dim=1)):
            h_t, c_t = self.lstm(input_t, (h_t, c_t))
            # print("c_t:{}".format(c_t)) #It has changed to NaN.
            # print("h_t:{}".format(h_t)) #It has changed to NaN.
            output = self.linear(h_t)
            # print("output:{}".format(output)) #It has changed to NaN.
            outputs += [output]
            # print("e-outputs:{}".format(outputs)) #It has changed to NaN.

        for i in range(future):
            if y is not None and random.random() > 0.5:
                output = y[:, [i]]  # teacher forcing
            h_t, c_t = self.lstm(output, (h_t, c_t))
            output = self.linear(h_t)
            outputs += [output]
        outputs = torch.stack(outputs, 1).squeeze(2)
        # print("outputs:{}".format(outputs)) #It has changed to NaN.
        return outputs

class Optimization:
    def __init__(self, model, loss_fn, optimizer, scheduler):
        self.model = model
        self.loss_fn = loss_fn
        self.optimizer = optimizer
        self.scheduler = scheduler
        self.train_losses = []
        self.val_losses = []
        self.futures = []

    @staticmethod
    def generate_batch_data(x, y, batch_size):
        for batch, i in enumerate(range(0, len(x) - batch_size, batch_size)):
            x_batch = x[i : i + batch_size]
            y_batch = y[i : i + batch_size]
            yield x_batch, y_batch, batch

    def train(
        self,
        x_train,
        y_train,
        x_val=None,
        y_val=None,
        batch_size=100,
        n_epochs=15,
        do_teacher_forcing=None,
    ):
        seq_len = x_train.shape[1]
        for epoch in range(n_epochs):
            startup = time.time()
            self.futures = []

            # with torch.autograd.detect_anomaly():
            train_loss = 0
            for x_batch, y_batch, batch in self.generate_batch_data(x_train, y_train, batch_size):
                y_pred = self._predict(x_batch, y_batch, seq_len, do_teacher_forcing)
                self.optimizer.zero_grad()
                loss = self.loss_fn(y_pred, y_batch)
                # print("tloss:{}".format(loss)) #It has changed to NaN.
                loss.backward()
                # nn.utils.clip_grad_norm_(self.model.parameters(), 0.25) # https://pytorch.org/docs/stable/_modules/torch/nn/utils/clip_grad.html
                self.optimizer.step()
                train_loss += loss.item()
            self.scheduler.step()
            train_loss /= batch
            self.train_losses.append(train_loss)
        
            self._validation(x_val, y_val, batch_size)
        
            elapsed = time.time() - startup
            print(
                "Epoch %d Train loss: %.2f. Validation loss: %.2f. Avg future: %.2f. Elapsed time: %.2fs."
                 % (epoch + 1, train_loss, self.val_losses[-1], np.average(self.futures), elapsed)
            )

    def _predict(self, x_batch, y_batch, seq_len, do_teacher_forcing):
        # print("x_batch:{}".format(x_batch)) 
        if do_teacher_forcing:
            future = random.randint(1, int(seq_len) / 2)
            limit = x_batch.size(1) - future
            y_pred = self.model(x_batch[:, :limit], future=future, y=y_batch[:, limit:])
            # print("if-y_pred:{}".format(y_pred)) 
        else:
            # print("x_batch:{}".format(x_batch)) #It has changed to NaN.
            future = 0
            y_pred = self.model(x_batch)
            # print("else-y_pred:{}".format(y_pred)) #It has changed to NaN.
        self.futures.append(future)
        return y_pred

    def _validation(self, x_val, y_val, batch_size):
        if x_val is None or y_val is None:
            return
        with torch.no_grad():
            val_loss = 0
            batch = 1
            for x_batch, y_batch, batch in self.generate_batch_data(x_val, y_val, batch_size):
                y_pred = self.model(x_batch)
                loss = self.loss_fn(y_pred, y_batch)
                # print("vloss:{}".format(loss)) #It has changed to NaN.
                val_loss += loss.item()
            val_loss /= batch
            self.val_losses.append(val_loss)

    def evaluate(self, x_test, y_test, batch_size, future=1):
        with torch.no_grad():
            test_loss = 0
            actual, predicted = [], []
            for x_batch, y_batch, batch in self.generate_batch_data(x_test, y_test, batch_size):
                y_pred = self.model(x_batch, future=future)
                y_pred = (
                    y_pred[:, -len(y_batch) :] if y_pred.shape[1] > y_batch.shape[1] else y_pred
                )
                loss = self.loss_fn(y_pred, y_batch)
                # print("eloss:{}".format(loss)) #It has changed to NaN.
                test_loss += loss.item()
                actual += torch.squeeze(y_batch[:, -1]).data.cpu().numpy().tolist()
                predicted += torch.squeeze(y_pred[:, -1]).data.cpu().numpy().tolist()
            test_loss /= batch
            return actual, predicted, test_loss

    def plot_losses(self):
        plt.plot(self.train_losses, lw=1, label="Training loss")
        plt.plot(self.val_losses, lw=1, label="Validation loss")
        plt.legend()
        plt.title("Losses")

def transform_data(arr, seq_len):
    x, y = [], []
    for i in range(len(arr) - seq_len):
        x_i = arr[i : i + seq_len]
        y_i = arr[i + 1 : i + seq_len + 1]
        x.append(x_i)
        y.append(y_i)
    x_arr = np.array(x).reshape(-1, seq_len)
    y_arr = np.array(y).reshape(-1, seq_len)
    x_var = Variable(torch.from_numpy(x_arr).float().to(cuda_device))
    y_var = Variable(torch.from_numpy(y_arr).float().to(cuda_device))
    return x_var, y_var

def plot_sequence(axes, i, x_train, y_train):
    axes[i].set_title("%d. Sequence" % (i + 1))
    axes[i].set_xlabel("Time bars")
    axes[i].set_ylabel("Scaled VWAP")
    axes[i].plot(range(seq_len), x_train[i].cpu().numpy(), color="r", lw=1, label="Feature")
    axes[i].plot(range(1, seq_len + 1), y_train[i].cpu().numpy(), color="b", lw=1, label="Target")
    axes[i].legend()

def generate_sequence(scaler, model, x_sample, future=1000):
    y_pred_tensor = model(x_sample, future=future)
    y_pred = y_pred_tensor.cpu().tolist()
    y_pred = scaler.inverse_transform(y_pred)
    return y_pred

def to_dataframe(actual, predicted):
    return pd.DataFrame({"actual": actual, "predicted": predicted})

def inverse_transform(scalar, df, columns):
    for col in columns:
        df[col] = scaler.inverse_transform(df[col])
    return df

def minutes_of_new_data(symbol, kline_size, data):
    if len(data) > 0:
        old = parser.parse(data["timestamp"].iloc[-1])
    else:
        old = bitmex_client.Trade.Trade_getBucketed(symbol=symbol, 
                binSize=kline_size, count=1, reverse=False).result()[0][0]['timestamp']
    new = bitmex_client.Trade.Trade_getBucketed(symbol=symbol, 
                binSize=kline_size, count=1, reverse=True).result()[0][0]['timestamp']
    return old, new

def get_all_bitmex(symbol, kline_size, save = False):
    filename = 'data/%s-%s-data.csv' % (symbol, kline_size)
    if os.path.isfile(filename):
        data_df = pd.read_csv(filename)
    else:
        data_df = pd.DataFrame()
    oldest_point, newest_point = minutes_of_new_data(symbol, kline_size, data_df)
    delta_min = (newest_point - oldest_point).total_seconds()/60
    available_data = math.ceil(delta_min/binsizes[kline_size])
    rounds = math.ceil(available_data / batch_size)
    if rounds > 0:
        for round_num in range(rounds):
            time.sleep(1)
            new_time = (oldest_point + timedelta(minutes = round_num * batch_size * binsizes[kline_size]))
            data = bitmex_client.Trade.Trade_getBucketed(symbol=symbol, 
                    binSize=kline_size, count=batch_size, startTime = new_time).result()[0]
            temp_df = pd.DataFrame(data)
            data_df = data_df.append(temp_df)
    data_df.set_index('timestamp', inplace=True)
    if save and rounds > 0:
        data_df.to_csv(filename)
    return data_df

if __name__ == '__main__':
    os.chdir(os.path.dirname(os.path.abspath(__file__)))
    ownprefix = os.path.basename(__file__)

    warnings.simplefilter('ignore')
    pd.set_option('display.max_columns', 100)
    np.set_printoptions(precision=3, suppress=True, formatter={'float': '{: 0.2f}'.format}) #Align digits

    start_time = time.perf_counter()
    print("start time: ", datetime.now().strftime("%H:%M:%S"))
    
    print("pandas==%s" % pd.__version__)
    print("numpy==%s" % np.__version__)
    print("torch==%s" % torch.__version__)
    print("matplotlib==%s" % matplotlib.__version__)
    
    cuda_device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    print("cuda_device:",cuda_device)
    if cuda_device != "cpu": 
        print("devicde_name:",torch.cuda.get_device_name(torch.cuda.current_device()))
        torch.cuda.manual_seed(1)
    np.random.seed(1)
    random.seed(1)
    torch.manual_seed(1)
    
    if os.path.exists('{}.pickle'.format(ownprefix)):
        print("read_pickle:")
        df = pd.read_pickle('{}.pickle'.format(ownprefix))
    else:
        print("get_from_bitmex:")
        ## bitmex API
        # bitmex_api_key = ''    #Enter your own API-key here
        # bitmex_api_secret = '' #Enter your own API-secret here
        # binsizes = {"1m": 1, "5m": 5, "1h": 60, "1d": 1440}
        # batch_size = 750
        # bitmex_client = bitmex(test=False, api_key=bitmex_api_key, api_secret=bitmex_api_secret)
        # df = get_all_bitmex("XBTUSD","5m",save=True)
        ##
        # https://public.bitmex.com/?prefix=data/trade/
        files = sorted(glob.glob('./data/2019*.csv.gz'))
        # files = sorted(glob.glob('./data/2020*.csv.gz'))
        print("files:",files)
        df = pd.concat(map(pd.read_csv, files))
        df = df[df.symbol == 'XBTUSD']
        df.timestamp = pd.to_datetime(df.timestamp.str.replace('D', 'T'))
        df = df.sort_values('timestamp')
        df.set_index('timestamp', inplace=True)
        df.to_pickle('{}.pickle'.format(ownprefix))
        df.to_csv('{}.csv'.format(ownprefix))
    
    print("df.shape:",df.shape)
    print("df.tail:",df.tail(-5))
    

    '''
Volume-weighted average price at 1-minute intervals(VWAP)Calculate
Group transactions at predefined time intervals. Because the trading activity of the market changes with time.
    '''
    df_vwap = df.groupby(pd.Grouper(freq="1Min")).apply(
                            lambda row: pd.np.sum(row.price * row.foreignNotional) 
                                        / pd.np.sum(row.foreignNotional))
    print("df_vwap.shape:",df_vwap.shape)
    
    df_vwap.plot(figsize=(14, 7))    
    plt.show()
    plt.savefig('{}1-df_vwap.plot.png'.format(ownprefix), dpi=175, constrained_layout=True, tight_layout=True)
    plt.close()
    
    '''
Volume weighted average price(VWAP)Data for model training 0.65, model evaluation data 0.08, inference data 0.Divide into 27
    '''
    train_len = round(len(df_vwap)*0.65)
    val_len = round(len(df_vwap)*0.08)
    
    df_train = df_vwap[:train_len].to_frame(name="vwap")
    print("df_train.shape:",df_train.shape)
    print("df_train.tail:",df_train.tail(-5))

    df_val = df_vwap[train_len:(train_len + val_len)].to_frame(name="vwap")
    print("df_val.shape:",df_val.shape)
    print("df_val.tail:",df_val.tail(-5))

    df_test = df_vwap[(train_len + val_len):].to_frame(name='vwap')
    print("df_test.shape:",df_test.shape)
    print("df_test.tail:",df_test.tail(-5))
    
    '''
Scale the data to make the LSTM model converge faster
Larger input values can slow learning
Use StandardScaler from sklearn library
The scaler fits the training set and is used to transform invisible transaction data in the validation and test sets
Fitting a scalar to all the data will overfit the model and give good results with this data, but slow performance with the actual data.
    '''
    scaler = StandardScaler()
    print("scaler type:",type(scaler),"\n",scaler)

    train_arr = scaler.fit_transform(df_train)
    print("train_arr.shape:",train_arr.shape,"train_arr type:",type(train_arr),"\n",train_arr)

    val_arr = scaler.transform(df_val)
    print("val_arr.shape:",val_arr.shape,"val_arr type:",type(val_arr),"\n",val_arr)

    test_arr = scaler.transform(df_test)
    print("test_arr.shape:",test_arr.shape,"test_arr type:",type(test_arr),"\n",test_arr)

    '''
After scaling, convert the data to a format suitable for modeling in LSTM
Convert long sequences of data into many short sequences (100 timebars per sequence) that are shifted by a single timebar
The plot below shows the first, second and third sequences of the training set.
Both sequences are 100 time bars long
The target of both sequences is about the same as the function, the difference is in the first and last timebars
How do LSTMs use sequences during the training phase?
First, focus on the first sequence
The model takes the characteristics of the index 0 timebar and attempts to predict the target of the index 1 timebar.
Next, we take the characteristics of the index 1 timebar and try to predict the target of the timebar.
With index 2 etc. The features of the second sequence are shifted by one time bar from the features of the first sequence, and the features of the third sequence are shifted by one time bar from the features of the second sequence.
This procedure yields many short sequences and shifts only a single timebar.
Note that classification or regression tasks usually have a set of features and targets that you are trying to predict.
In this example using LSTMs, the features and targets are from the same sequence, the only difference is that the targets are shifted by one timebar.
    '''
    seq_len = 100
    x_train, y_train = transform_data(train_arr, seq_len)
    print("x_train.shape:",x_train.shape,"x_train type:",type(x_train),"\n",x_train)
    print("y_train.shape:",y_train.shape,"y_train type:",type(y_train),"\n",y_train)

    x_val, y_val = transform_data(val_arr, seq_len)
    print("x_val.shape:",x_val.shape,"x_val type:",type(x_val),"\n",x_val)
    print("y_val.shape:",y_val.shape,"y_val type:",type(y_val),"\n",y_val)

    x_test, y_test = transform_data(test_arr, seq_len)
    print("x_test.shape:",x_test.shape,"x_test type:",type(x_test),"\n",x_test)
    print("y_test.shape:",y_test.shape,"y_test type:",type(y_test),"\n",y_test)
    
    fig, axes = plt.subplots(nrows=1, ncols=3, figsize=(14, 7))
    plot_sequence(axes, 0, x_train, y_train)
    plot_sequence(axes, 1, x_train, y_train)    
    plot_sequence(axes, 2, x_train, y_train)    
    plt.show()
    plt.savefig('{}2-plot_sequence.png'.format(ownprefix), dpi=175, constrained_layout=True, tight_layout=True)
    plt.close()
    
    '''
LSTM training
Learn LSTM with 21 hidden units
By reducing the number of credits, it is less likely that the LSTM will completely memorize it.
Use mean squared error loss function and Adam optimizer for training
Learning rate is 0.Set to 001 and attenuate every 5 epochs
Learn 100 sequences per batch with 15 epochs
From the plot, learning loss and verification loss converge at the 6th epoch
    '''
    # model_1 = Model(input_size=1, hidden_size=21, output_size=1)
    model_1 = Model(input_size=1, hidden_size=21, output_size=1).to(cuda_device)
    print("model_1 type:",type(model_1),"\n",model_1)

    loss_fn_1 = nn.MSELoss()
    # loss_fn_1 = nn.BCELoss()
    # loss_fn_1 = nn.BCEWithLogitsLoss()
    print("loss_fn_1 type:",type(loss_fn_1),"\n",loss_fn_1)

    optimizer_1 = optim.Adam(model_1.parameters(), lr=1e-4)
    print("optimizer_1 type:",type(optimizer_1),"\n",optimizer_1)

    scheduler_1 = optim.lr_scheduler.StepLR(optimizer_1, step_size=5, gamma=0.1)
    # scheduler_1 = torch.optim.lr_scheduler.MultiStepLR(optimizer_1, milestones=[2, 6], gamma=0.1)
    print("scheduler_1 type:",type(scheduler_1),"\n",scheduler_1)

    optimization_1 = Optimization(model_1, loss_fn_1, optimizer_1, scheduler_1)
    print("optimization_1 type:",type(optimization_1),"\n",optimization_1)
    
    optimization_1.train(x_train, y_train, x_val, y_val, do_teacher_forcing=False)
    print("optimization_1 type:",type(optimization_1),"\n",optimization_1)

    optimization_1.plot_losses()    
    plt.show()
    plt.savefig('{}3-optimization_1.plot_losses.png'.format(ownprefix), dpi=175, constrained_layout=True, tight_layout=True)
    plt.close()
    
    '''
Evaluate the model with a test set
future parameter set to 5
Outputs VWAPs that the model considers to be in the next five time zones (5 minutes in this example)
This makes it visible hours before the price change occurs.
The plot shows that the predicted values are in close agreement with the actual VWAP values.
However, the future parameter is set to 5, and the orange line does not cover the spikes, but must react before it occurs.
    '''
    actual_1, predicted_1, test_loss_1 = optimization_1.evaluate(x_test, y_test, batch_size=100, future=5)
    print("Test loss %.4f" % test_loss_1)
    df_result_1 = to_dataframe(actual_1, predicted_1) 
    df_result_1 = inverse_transform(scaler, df_result_1, ['actual', 'predicted'])

    df_result_1.plot(figsize=(14*2, 7), lw=0.3)    
    plt.show()
    plt.savefig('{}4-df_result_1.plot.png'.format(ownprefix), dpi=175, constrained_layout=True, tight_layout=True)
    plt.close()

    '''
If you zoom in on the spike (one at the start and one at the end and another time series), you can see that the predicted values mimic the actual values.
When the actual value changes direction, the predicted value follows, but this is useless
The same thing happens with more future parameters (does not affect the forecast line)
    '''
    fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(14, 7))
    df_result_1.iloc[2350:2450].plot(ax=axes[0], figsize=(14, 7), lw=0.3)
    df_result_1.iloc[16000:17500].plot(ax=axes[1], figsize=(14, 7), lw=0.3)    
    plt.show()
    plt.savefig('{}5-df_result_1.iloc.plot.png'.format(ownprefix), dpi=175, constrained_layout=True, tight_layout=True)
    plt.close()

    '''
Generate 1000 timebars for the first test sequence using the model and compare the predicted, generated and actual VWAP
Observe that it is close to the actual value while the model outputs the predicted value
But when you start generating values, the output is almost like a sine wave
After a period of time, the value converges to 9600
    '''
    x_sample = x_test[0].reshape(1, -1)
    y_sample = df_test.vwap[:1100]     
    y_pred1 = generate_sequence(scaler, optimization_1.model, x_sample)
    
    plt.figure(figsize=(14, 7))
    plt.plot(range(100), y_pred1[0][:100], color="blue", lw=1, label="Predicted VWAP")
    plt.plot(range(100, 1100), y_pred1[0][100:], "--", color="blue", lw=1, label="Generated VWAP")
    plt.plot(range(0, 1100), y_sample, color="red", lw=1, label="Actual VWAP")
    plt.legend()    
    plt.show()
    plt.savefig('{}6-generate_sequence.png'.format(ownprefix), dpi=175, constrained_layout=True, tight_layout=True)
    plt.close()

    '''
This behavior is believed to occur because the model was trained only on the actual inputs and not on the generated inputs.
When the model produces output from the generated input, it is not good at generating the next value
Attempt to correct this problem by forcing a teacher
    
    [Teacher compulsion](https://machinelearningmastery.com/teacher-forcing-for-recurrent-neural-networks/)Is
How to train a recurrent neural network that takes the output of the previous time step as input
When training an RNN, you can generate a sequence by using the previous output as the current input.
The same process can be done during training, but the model may become unstable or may not converge.
Teacher coercion is an approach to address these issues during training
Commonly used in language models
    
This time,[Scheduled sampling](https://arxiv.org/abs/1506.03099)Use a Teacher forcing extension called
The model has a certain probability of using its generated output as input during training.
Initially, the model is unlikely to see its generated output and gradually increases during training.
Note that this example uses a random probability that does not increase during training
    
Train a model with teacher coercion enabled with the same parameters as before
After 7 epochs, learning and verification losses converge
    '''
    # model_2 = Model(input_size=1, hidden_size=21, output_size=1)
    model_2 = Model(input_size=1, hidden_size=21, output_size=1).to(cuda_device)
    loss_fn_2 = nn.MSELoss()
    optimizer_2 = optim.Adam(model_2.parameters(), lr=1e-4)
    scheduler_2 = optim.lr_scheduler.StepLR(optimizer_2, step_size=5, gamma=0.1)
    optimization_2 = Optimization(model_2, loss_fn_2,  optimizer_2, scheduler_2)
    optimization_2.train(x_train, y_train, x_val, y_val, do_teacher_forcing=True)

    optimization_2.plot_losses()
    plt.show()
    plt.savefig('{}7-optimization_2.plot_losses.png'.format(ownprefix), dpi=175, constrained_layout=True, tight_layout=True)
    plt.close()
    
    
    '''
Observe the same predicted sequence as before
If you zoom in on the spikes, you can observe the behavior of the model as if the predicted values mimic the actual values.
Teacher coercion did not solve the problem. .. ..
    '''
    actual_2, predicted_2, test_loss_2 = optimization_2.evaluate(x_test, y_test, batch_size=100, future=5)
    print("Test loss %.4f" % test_loss_2)
    df_result_2 = to_dataframe(actual_2, predicted_2)
    df_result_2 = inverse_transform(scaler, df_result_2, ["actual", "predicted"])

    df_result_2.plot(figsize=(14*2, 7), lw=0.3)
    plt.show()
    plt.savefig('{}8-df_result_2.plot.png'.format(ownprefix), dpi=175, constrained_layout=True, tight_layout=True)
    plt.close()

    fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(14, 7))
    df_result_2.iloc[2350:2450].plot(ax=axes[0], figsize=(14, 7), lw=0.3)
    df_result_2.iloc[16000:17500].plot(ax=axes[1], figsize=(14, 7), lw=0.3)    
    plt.show()
    plt.savefig('{}9-df_result_2.iloc.plot.png'.format(ownprefix), dpi=175, constrained_layout=True, tight_layout=True)
    plt.close()
    
    
    '''
Generate 1000 timebars for the first test sequence using a teacher-forced model
    '''
    y_pred2 = generate_sequence(scaler, optimization_2.model, x_sample)
    
    plt.figure(figsize=(14, 7))
    plt.plot(range(100), y_pred2[0][:100], color="blue", lw=1, label="Predicted VWAP")
    plt.plot(range(100, 1100), y_pred2[0][100:], "--", color="blue", lw=1, label="Generated VWAP")
    plt.plot(range(0, 1100), y_sample, color="red", lw=1, label="Actual VWAP")
    plt.legend()
    plt.show()
    plt.savefig('{}10-generate_sequence.png'.format(ownprefix), dpi=175, constrained_layout=True, tight_layout=True)
    plt.close()
    
    
    end_time = time.perf_counter()
    print("end time: ", datetime.now().strftime("%H:%M:%S"))
    
    time = end_time - start_time
    print("end_time - start_time:%f" % (time))
    
    '''
An interesting consideration of the generated sequence is that the values generated from the teacher-forced trained model take a long time to converge.
Another consideration is that when a sequence is increasing, it continues to increase to a certain point, then begins to decrease, and the pattern repeats until the sequence converges.
This pattern looks like a sine wave with reduced amplitude
    
    ##Conclusion
Verification shows that the model's prediction mimics the actual value of the sequence.
The first and second models do not detect price changes before they occur
Adding another feature (like volume) may help detect price changes before the model occurs,
In that case, the model would have to generate two features in order to use those outputs as inputs in the next step, complicating the model.
As you can see in the plot above, the model has the ability to predict the VWAP time series, so
Using a more complex model (using multiple LSTM Cells and increasing the number of hidden units) may not help
It may be possible to improve the sequence generation skills of a model in a more sophisticated way of teacher coercion. .. ..
    
    ##References
     - [Time series forecast](https://github.com/pytorch/examples/tree/master/time_sequence_prediction)
     - [Understand LSTM networks](https://colah.github.io/posts/2015-08-Understanding-LSTMs/)
     - [What is teacher coercion for recurrent neural networks](https://machinelearningmastery.com/teacher-forcing-for-recurrent-neural-networks/)
     - [Scheduling for array prediction using recurrent neural networks](https://arxiv.org/abs/1506.03099)
    '''

Execution result log 1 (log when source code 1 reads 2019 BTC data)

$ python3 btc_prediction_by_lstm_pytorch.py
start time:  19:56:49
pandas==1.1.4
numpy==1.19.2
torch==1.5.0
matplotlib==3.3.3
cuda_device: cuda
devicde_name: GeForce RTX 2070
get_from_bitmex:
files: ['./data/20190801.csv.gz', './data/20190802.csv.gz', './data/20190803.csv.gz', './data/20190804.csv.gz', './data/20190805.csv.gz', './data/20190806.csv.gz', './data/20190807.csv.gz', './data/20190808.csv.gz', './data/20190809.csv.gz', './data/20190810.csv.gz', './data/20190811.csv.gz', './data/20190812.csv.gz', './data/20190813.csv.gz', './data/20190814.csv.gz', './data/20190815.csv.gz', './data/20190816.csv.gz', './data/20190817.csv.gz', './data/20190818.csv.gz', './data/20190819.csv.gz', './data/20190820.csv.gz', './data/20190821.csv.gz', './data/20190822.csv.gz', './data/20190823.csv.gz', './data/20190824.csv.gz', './data/20190825.csv.gz', './data/20190826.csv.gz', './data/20190827.csv.gz', './data/20190828.csv.gz', './data/20190829.csv.gz', './data/20190830.csv.gz', './data/20190831.csv.gz', './data/20190901.csv.gz', './data/20190902.csv.gz', './data/20190903.csv.gz', './data/20190904.csv.gz', './data/20190905.csv.gz', './data/20190906.csv.gz', './data/20190907.csv.gz', './data/20190908.csv.gz', './data/20190909.csv.gz', './data/20190910.csv.gz', './data/20190911.csv.gz', './data/20190912.csv.gz', './data/20190913.csv.gz', './data/20190914.csv.gz', './data/20190915.csv.gz', './data/20190916.csv.gz', './data/20190917.csv.gz']
df.shape: (36708098, 9)
df.tail:                             symbol  side   size    price  tickDirection  \
timestamp
2019-08-01 00:00:03.950526  XBTUSD   Buy     35  10089.0   ZeroPlusTick
2019-08-01 00:00:03.950526  XBTUSD   Buy     35  10089.0   ZeroPlusTick
2019-08-01 00:00:03.950526  XBTUSD   Buy     40  10089.0   ZeroPlusTick
2019-08-01 00:00:03.950526  XBTUSD   Buy   3117  10089.0   ZeroPlusTick
2019-08-01 00:00:03.956035  XBTUSD   Buy  18670  10089.0   ZeroPlusTick
...                            ...   ...    ...      ...            ...
2019-09-17 23:59:59.189310  XBTUSD  Sell   2000  10184.5  ZeroMinusTick
2019-09-17 23:59:59.189310  XBTUSD  Sell  15000  10184.5  ZeroMinusTick
2019-09-17 23:59:59.189310  XBTUSD  Sell  45383  10184.5  ZeroMinusTick
2019-09-17 23:59:59.517938  XBTUSD  Sell  10000  10184.5  ZeroMinusTick
2019-09-17 23:59:59.531223  XBTUSD  Sell      1  10184.5  ZeroMinusTick

                                                      trdMatchID  grossValue  \
timestamp
2019-08-01 00:00:03.950526  cec26c94-563c-7fcb-6194-d03ae2f41b92      346920
2019-08-01 00:00:03.950526  607b403e-4211-abda-8392-4003ad9f9ad0      346920
2019-08-01 00:00:03.950526  4d5cff30-eb11-43fe-caf4-1e150bc0bc18      396480
2019-08-01 00:00:03.950526  ab4806f1-2fac-6949-a5bc-07b4da4048f0    30895704
2019-08-01 00:00:03.956035  63205fcd-5165-3ac9-35d8-a39c632a5e60   185057040
...                                                          ...         ...
2019-09-17 23:59:59.189310  f2385097-5527-3075-3498-0b660dd39e4c    19638000
2019-09-17 23:59:59.189310  af7e0ceb-f670-b863-abb3-0feb76066711   147285000
2019-09-17 23:59:59.189310  b954d196-c63a-f86c-2599-cb86b409bbb6   445615677
2019-09-17 23:59:59.517938  69391f3f-65bc-c3bc-548d-963ffe5501df    98190000
2019-09-17 23:59:59.531223  6926bf37-72b7-3ac8-2c8a-3cdf22688ec0        9819

                            homeNotional  foreignNotional
timestamp
2019-08-01 00:00:03.950526      0.003469             35.0
2019-08-01 00:00:03.950526      0.003469             35.0
2019-08-01 00:00:03.950526      0.003965             40.0
2019-08-01 00:00:03.950526      0.308957           3117.0
2019-08-01 00:00:03.956035      1.850570          18670.0
...                                  ...              ...
2019-09-17 23:59:59.189310      0.196380           2000.0
2019-09-17 23:59:59.189310      1.472850          15000.0
2019-09-17 23:59:59.189310      4.456157          45383.0
2019-09-17 23:59:59.517938      0.981900          10000.0
2019-09-17 23:59:59.531223      0.000098              1.0

[36708093 rows x 9 columns]
df_vwap.shape: (69120,)
df_train.shape: (44928, 1)
df_train.tail:                              vwap
timestamp
2019-08-01 00:05:00  10108.710704
2019-08-01 00:06:00  10122.709866
2019-08-01 00:07:00  10122.340347
2019-08-01 00:08:00  10126.617881
2019-08-01 00:09:00  10147.380407
...                           ...
2019-09-01 04:43:00   9625.001311
2019-09-01 04:44:00   9625.045365
2019-09-01 04:45:00   9625.063765
2019-09-01 04:46:00   9626.437824
2019-09-01 04:47:00   9630.077496

[44923 rows x 1 columns]
df_val.shape: (5530, 1)
df_val.tail:                              vwap
timestamp
2019-09-01 04:53:00   9630.231584
2019-09-01 04:54:00   9629.291041
2019-09-01 04:55:00   9626.054305
2019-09-01 04:56:00   9626.067363
2019-09-01 04:57:00   9626.485287
...                           ...
2019-09-05 00:53:00  10545.065878
2019-09-05 00:54:00  10545.195416
2019-09-05 00:55:00  10542.311323
2019-09-05 00:56:00  10538.363399
2019-09-05 00:57:00  10537.195479

[5525 rows x 1 columns]
df_test.shape: (18662, 1)
df_test.tail:                              vwap
timestamp
2019-09-05 01:03:00  10522.959503
2019-09-05 01:04:00  10521.220404
2019-09-05 01:05:00  10517.852199
2019-09-05 01:06:00  10520.261375
2019-09-05 01:07:00  10520.428804
...                           ...
2019-09-17 23:55:00  10191.031001
2019-09-17 23:56:00  10194.615079
2019-09-17 23:57:00  10193.758451
2019-09-17 23:58:00  10187.193670
2019-09-17 23:59:00  10184.758720

[18657 rows x 1 columns]
scaler type: <class 'sklearn.preprocessing._data.StandardScaler'>
 StandardScaler()
train_arr.shape: (44928, 1) train_arr type: <class 'numpy.ndarray'>
 [[-0.71]
 [-0.69]
 [-0.71]
 ...
 [-1.37]
 [-1.36]
 [-1.36]]
val_arr.shape: (5530, 1) val_arr type: <class 'numpy.ndarray'>
 [[-1.36]
 [-1.36]
 [-1.36]
 ...
 [-0.09]
 [-0.10]
 [-0.10]]
test_arr.shape: (18662, 1) test_arr type: <class 'numpy.ndarray'>
 [[-0.10]
 [-0.10]
 [-0.11]
 ...
 [-0.58]
 [-0.59]
 [-0.59]]
x_train.shape: torch.Size([44828, 100]) x_train type: <class 'torch.Tensor'>
 tensor([[-0.7070, -0.6901, -0.7066,  ..., -0.8349, -0.8319, -0.8261],
        [-0.6901, -0.7066, -0.7005,  ..., -0.8319, -0.8261, -0.8263],
        [-0.7066, -0.7005, -0.6831,  ..., -0.8261, -0.8263, -0.8259],
        ...,
        [-1.3670, -1.3665, -1.3665,  ..., -1.3646, -1.3651, -1.3651],
        [-1.3665, -1.3665, -1.3640,  ..., -1.3651, -1.3651, -1.3651],
        [-1.3665, -1.3640, -1.3614,  ..., -1.3651, -1.3651, -1.3631]],
       device='cuda:0')
y_train.shape: torch.Size([44828, 100]) y_train type: <class 'torch.Tensor'>
 tensor([[-0.6901, -0.7066, -0.7005,  ..., -0.8319, -0.8261, -0.8263],
        [-0.7066, -0.7005, -0.6831,  ..., -0.8261, -0.8263, -0.8259],
        [-0.7005, -0.6831, -0.6940,  ..., -0.8263, -0.8259, -0.8228],
        ...,
        [-1.3665, -1.3665, -1.3640,  ..., -1.3651, -1.3651, -1.3651],
        [-1.3665, -1.3640, -1.3614,  ..., -1.3651, -1.3651, -1.3631],
        [-1.3640, -1.3614, -1.3612,  ..., -1.3651, -1.3631, -1.3581]],
       device='cuda:0')
x_val.shape: torch.Size([5430, 100]) x_val type: <class 'torch.Tensor'>
 tensor([[-1.3576, -1.3577, -1.3575,  ..., -1.3784, -1.3783, -1.3783],
        [-1.3577, -1.3575, -1.3579,  ..., -1.3783, -1.3783, -1.3783],
        [-1.3575, -1.3579, -1.3580,  ..., -1.3783, -1.3783, -1.3773],
        ...,
        [-0.0342, -0.0332, -0.0311,  ..., -0.0853, -0.0886, -0.0884],
        [-0.0332, -0.0311, -0.0231,  ..., -0.0886, -0.0884, -0.0924],
        [-0.0311, -0.0231, -0.0397,  ..., -0.0884, -0.0924, -0.0979]],
       device='cuda:0')
y_val.shape: torch.Size([5430, 100]) y_val type: <class 'torch.Tensor'>
 tensor([[-1.3577, -1.3575, -1.3579,  ..., -1.3783, -1.3783, -1.3783],
        [-1.3575, -1.3579, -1.3580,  ..., -1.3783, -1.3783, -1.3773],
        [-1.3579, -1.3580, -1.3579,  ..., -1.3783, -1.3773, -1.3774],
        ...,
        [-0.0332, -0.0311, -0.0231,  ..., -0.0886, -0.0884, -0.0924],
        [-0.0311, -0.0231, -0.0397,  ..., -0.0884, -0.0924, -0.0979],
        [-0.0231, -0.0397, -0.0419,  ..., -0.0924, -0.0979, -0.0995]],
       device='cuda:0')
x_test.shape: torch.Size([18562, 100]) x_test type: <class 'torch.Tensor'>
 tensor([[-0.1027, -0.1037, -0.1098,  ..., -0.0839, -0.0890, -0.0888],
        [-0.1037, -0.1098, -0.1119,  ..., -0.0890, -0.0888, -0.0886],
        [-0.1098, -0.1119, -0.1121,  ..., -0.0888, -0.0886, -0.0886],
        ...,
        [-0.5124, -0.5140, -0.5179,  ..., -0.5837, -0.5798, -0.5748],
        [-0.5140, -0.5179, -0.5202,  ..., -0.5798, -0.5748, -0.5760],
        [-0.5179, -0.5202, -0.5229,  ..., -0.5748, -0.5760, -0.5851]],
       device='cuda:0')
y_test.shape: torch.Size([18562, 100]) y_test type: <class 'torch.Tensor'>
 tensor([[-0.1037, -0.1098, -0.1119,  ..., -0.0890, -0.0888, -0.0886],
        [-0.1098, -0.1119, -0.1121,  ..., -0.0888, -0.0886, -0.0886],
        [-0.1119, -0.1121, -0.1192,  ..., -0.0886, -0.0886, -0.0886],
        ...,
        [-0.5140, -0.5179, -0.5202,  ..., -0.5798, -0.5748, -0.5760],
        [-0.5179, -0.5202, -0.5229,  ..., -0.5748, -0.5760, -0.5851],
        [-0.5202, -0.5229, -0.5224,  ..., -0.5760, -0.5851, -0.5885]],
       device='cuda:0')
model_1 type: <class '__main__.Model'>
 Model(
  (lstm): LSTMCell(1, 21)
  (linear): Linear(in_features=21, out_features=1, bias=True)
)
loss_fn_1 type: <class 'torch.nn.modules.loss.MSELoss'>
 MSELoss()
optimizer_1 type: <class 'torch.optim.adam.Adam'>
 Adam (
Parameter Group 0
    amsgrad: False
    betas: (0.9, 0.999)
    eps: 1e-08
    lr: 0.0001
    weight_decay: 0
)
scheduler_1 type: <class 'torch.optim.lr_scheduler.StepLR'>
 <torch.optim.lr_scheduler.StepLR object at 0x7fb3c80daac8>
optimization_1 type: <class '__main__.Optimization'>
 <__main__.Optimization object at 0x7fb3c80daa58>
Epoch 1 Train loss: 1.03. Validation loss: 0.71. Avg future: 0.00. Elapsed time: 14.63s.
Epoch 2 Train loss: 0.66. Validation loss: 0.22. Avg future: 0.00. Elapsed time: 14.95s.
Epoch 3 Train loss: 0.29. Validation loss: 0.16. Avg future: 0.00. Elapsed time: 14.95s.
Epoch 4 Train loss: 0.14. Validation loss: 0.08. Avg future: 0.00. Elapsed time: 14.70s.
Epoch 5 Train loss: 0.08. Validation loss: 0.05. Avg future: 0.00. Elapsed time: 14.71s.
Epoch 6 Train loss: 0.08. Validation loss: 0.03. Avg future: 0.00. Elapsed time: 15.09s.
Epoch 7 Train loss: 0.06. Validation loss: 0.03. Avg future: 0.00. Elapsed time: 15.07s.
Epoch 8 Train loss: 0.06. Validation loss: 0.03. Avg future: 0.00. Elapsed time: 14.74s.
Epoch 9 Train loss: 0.06. Validation loss: 0.03. Avg future: 0.00. Elapsed time: 15.76s.
Epoch 10 Train loss: 0.05. Validation loss: 0.03. Avg future: 0.00. Elapsed time: 15.07s.
Epoch 11 Train loss: 0.05. Validation loss: 0.03. Avg future: 0.00. Elapsed time: 14.75s.
Epoch 12 Train loss: 0.05. Validation loss: 0.03. Avg future: 0.00. Elapsed time: 14.30s.
Epoch 13 Train loss: 0.05. Validation loss: 0.03. Avg future: 0.00. Elapsed time: 15.01s.
Epoch 14 Train loss: 0.05. Validation loss: 0.03. Avg future: 0.00. Elapsed time: 15.27s.
Epoch 15 Train loss: 0.05. Validation loss: 0.03. Avg future: 0.00. Elapsed time: 15.07s.
optimization_1 type: <class '__main__.Optimization'>
 <__main__.Optimization object at 0x7fb3c80daa58>
Test loss 0.0038
Epoch 1 Train loss: 0.89. Validation loss: 0.46. Avg future: 25.75. Elapsed time: 14.54s.
Epoch 2 Train loss: 0.51. Validation loss: 0.17. Avg future: 25.30. Elapsed time: 15.22s.
Epoch 3 Train loss: 0.15. Validation loss: 0.08. Avg future: 24.66. Elapsed time: 15.31s.
Epoch 4 Train loss: 0.08. Validation loss: 0.05. Avg future: 24.25. Elapsed time: 15.34s.
Epoch 5 Train loss: 0.06. Validation loss: 0.04. Avg future: 25.27. Elapsed time: 15.14s.
Epoch 6 Train loss: 0.07. Validation loss: 0.02. Avg future: 24.95. Elapsed time: 15.41s.
Epoch 7 Train loss: 0.06. Validation loss: 0.02. Avg future: 24.85. Elapsed time: 15.41s.
Epoch 8 Train loss: 0.05. Validation loss: 0.02. Avg future: 25.62. Elapsed time: 14.70s.
Epoch 9 Train loss: 0.05. Validation loss: 0.02. Avg future: 24.58. Elapsed time: 14.90s.
Epoch 10 Train loss: 0.05. Validation loss: 0.02. Avg future: 26.30. Elapsed time: 14.72s.
Epoch 11 Train loss: 0.05. Validation loss: 0.02. Avg future: 25.90. Elapsed time: 14.56s.
Epoch 12 Train loss: 0.05. Validation loss: 0.02. Avg future: 26.14. Elapsed time: 15.48s.
Epoch 13 Train loss: 0.04. Validation loss: 0.02. Avg future: 25.35. Elapsed time: 15.84s.
Epoch 14 Train loss: 0.04. Validation loss: 0.02. Avg future: 25.82. Elapsed time: 15.05s.
Epoch 15 Train loss: 0.04. Validation loss: 0.02. Avg future: 24.90. Elapsed time: 15.24s.
Test loss 0.0024
end time:  20:10:52
end_time - start_time:843.225909

Execution result log 2 (Log that causes Source Code 1 to read BTC data for 2020 and Loss becomes NaN)

$ python3 btc_prediction_by_lstm_pytorch.py
start time:  20:14:13
pandas==1.1.4
numpy==1.19.2
torch==1.5.0
matplotlib==3.3.3
cuda_device: cuda
devicde_name: GeForce RTX 2070
get_from_bitmex:
files: ['./data/20201031.csv.gz', './data/20201101.csv.gz', './data/20201102.csv.gz', './data/20201103.csv.gz', './data/20201104.csv.gz', './data/20201105.csv.gz', './data/20201106.csv.gz', './data/20201107.csv.gz', './data/20201108.csv.gz', './data/20201109.csv.gz', './data/20201110.csv.gz', './data/20201111.csv.gz', './data/20201112.csv.gz', './data/20201113.csv.gz', './data/20201114.csv.gz', './data/20201115.csv.gz', './data/20201116.csv.gz', './data/20201117.csv.gz', './data/20201118.csv.gz', './data/20201119.csv.gz', './data/20201120.csv.gz', './data/20201121.csv.gz', './data/20201122.csv.gz', './data/20201123.csv.gz', './data/20201124.csv.gz', './data/20201125.csv.gz', './data/20201126.csv.gz', './data/20201127.csv.gz', './data/20201128.csv.gz', './data/20201129.csv.gz', './data/20201130.csv.gz', './data/20201201.csv.gz', './data/20201202.csv.gz', './data/20201203.csv.gz', './data/20201204.csv.gz', './data/20201205.csv.gz', './data/20201206.csv.gz', './data/20201207.csv.gz', './data/20201208.csv.gz', './data/20201209.csv.gz', './data/20201210.csv.gz', './data/20201211.csv.gz', './data/20201212.csv.gz', './data/20201213.csv.gz', './data/20201214.csv.gz', './data/20201215.csv.gz', './data/20201216.csv.gz', './data/20201217.csv.gz', './data/20201218.csv.gz', './data/20201219.csv.gz', './data/20201220.csv.gz', './data/20201221.csv.gz', './data/20201222.csv.gz', './data/20201223.csv.gz']
df.shape: (18271747, 9)
df.tail:                             symbol  side   size    price tickDirection  \
timestamp
2020-10-31 00:00:02.626781  XBTUSD   Buy      1  13559.0      PlusTick
2020-10-31 00:00:02.748137  XBTUSD   Buy  24061  13560.5  ZeroPlusTick
2020-10-31 00:00:02.748137  XBTUSD   Buy   1414  13560.5  ZeroPlusTick
2020-10-31 00:00:02.748137  XBTUSD   Buy    150  13560.5  ZeroPlusTick
2020-10-31 00:00:02.748137  XBTUSD   Buy    100  13560.5      PlusTick
...                            ...   ...    ...      ...           ...
2020-12-23 23:59:58.571018  XBTUSD  Sell    420  23245.0     MinusTick
2020-12-23 23:59:58.580506  XBTUSD   Buy     13  23244.5     MinusTick
2020-12-23 23:59:58.593966  XBTUSD   Buy     10  23243.5     MinusTick
2020-12-23 23:59:58.597077  XBTUSD  Sell    447  23243.0     MinusTick
2020-12-23 23:59:58.646200  XBTUSD   Buy     11  23241.0     MinusTick

                                                      trdMatchID  grossValue  \
timestamp
2020-10-31 00:00:02.626781  eac8256e-bbe9-ad3c-9e63-17719295a974        7375
2020-10-31 00:00:02.748137  5581c2ae-b0ad-5121-858a-ce9c44d74943   177425814
2020-10-31 00:00:02.748137  e8a864fb-aa79-35e1-84af-8cbcaa127694    10426836
2020-10-31 00:00:02.748137  813afd63-06a0-6f79-f1b5-ee7643ed1eec     1106100
2020-10-31 00:00:02.748137  80a4d7f2-7a83-e7a9-4e9f-bb287c408b44      737400
...                                                          ...         ...
2020-12-23 23:59:58.571018  a12bbffc-d9d7-083c-cbd5-dbadb88cb0a0     1806840
2020-12-23 23:59:58.580506  f1e97f20-6739-4145-715a-fe0914393f20       55926
2020-12-23 23:59:58.593966  b57033c9-4dd2-9c96-04e5-7bf702e36806       43020
2020-12-23 23:59:58.597077  35a0a31a-4a9d-34e9-208f-5af533a576c5     1922994
2020-12-23 23:59:58.646200  b3cb9ba2-cebc-cdf5-528c-ce7c7b79269b       47333

                            homeNotional  foreignNotional
timestamp
2020-10-31 00:00:02.626781      0.000074              1.0
2020-10-31 00:00:02.748137      1.774258          24061.0
2020-10-31 00:00:02.748137      0.104268           1414.0
2020-10-31 00:00:02.748137      0.011061            150.0
2020-10-31 00:00:02.748137      0.007374            100.0
...                                  ...              ...
2020-12-23 23:59:58.571018      0.018068            420.0
2020-12-23 23:59:58.580506      0.000559             13.0
2020-12-23 23:59:58.593966      0.000430             10.0
2020-12-23 23:59:58.597077      0.019230            447.0
2020-12-23 23:59:58.646200      0.000473             11.0

[18271742 rows x 9 columns]
df_vwap.shape: (77760,)
df_train.shape: (50544, 1)
df_train.tail:                              vwap
timestamp
2020-10-31 00:05:00  13554.231586
2020-10-31 00:06:00  13557.871331
2020-10-31 00:07:00  13559.282798
2020-10-31 00:08:00  13557.649806
2020-10-31 00:09:00  13560.874972
...                           ...
2020-12-05 02:19:00  18741.511471
2020-12-05 02:20:00  18730.342480
2020-12-05 02:21:00  18724.025946
2020-12-05 02:22:00  18720.223295
2020-12-05 02:23:00  18721.599737

[50539 rows x 1 columns]
df_val.shape: (6221, 1)
df_val.tail:                              vwap
timestamp
2020-12-05 02:29:00  18745.846928
2020-12-05 02:30:00  18758.132585
2020-12-05 02:31:00  18765.631351
2020-12-05 02:32:00  18774.092713
2020-12-05 02:33:00  18788.076590
...                           ...
2020-12-09 10:00:00  18047.134980
2020-12-09 10:01:00  18026.450682
2020-12-09 10:02:00  18011.715896
2020-12-09 10:03:00  17995.496554
2020-12-09 10:04:00  18005.457240

[6216 rows x 1 columns]
df_test.shape: (20995, 1)
df_test.tail:                              vwap
timestamp
2020-12-09 10:10:00  17984.900355
2020-12-09 10:11:00  17991.922489
2020-12-09 10:12:00  17983.009223
2020-12-09 10:13:00  17979.464915
2020-12-09 10:14:00  17977.227506
...                           ...
2020-12-23 23:55:00  23248.030879
2020-12-23 23:56:00  23206.786654
2020-12-23 23:57:00  23264.388328
2020-12-23 23:58:00  23264.377437
2020-12-23 23:59:00  23268.652009

[20990 rows x 1 columns]
scaler type: <class 'sklearn.preprocessing._data.StandardScaler'>
 StandardScaler()
train_arr.shape: (50544, 1) train_arr type: <class 'numpy.ndarray'>
 [[-1.72]
 [-1.72]
 [-1.72]
 ...
 [ 1.06]
 [ 1.06]
 [ 1.06]]
val_arr.shape: (6221, 1) val_arr type: <class 'numpy.ndarray'>
 [[ 1.06]
 [ 1.07]
 [ 1.07]
 ...
 [ 0.67]
 [ 0.67]
 [ 0.67]]
test_arr.shape: (20995, 1) test_arr type: <class 'numpy.ndarray'>
 [[ 0.67]
 [ 0.66]
 [ 0.67]
 ...
 [ 3.51]
 [ 3.51]
 [ 3.51]]
x_train.shape: torch.Size([50444, 100]) x_train type: <class 'torch.Tensor'>
 tensor([[-1.7223, -1.7214, -1.7227,  ..., -1.6951, -1.6975, -1.6995],
        [-1.7214, -1.7227, -1.7272,  ..., -1.6975, -1.6995, -1.7009],
        [-1.7227, -1.7272, -1.7283,  ..., -1.6995, -1.7009, -1.6961],
        ...,
        [ 1.0484,  1.0509,  1.0464,  ...,  1.0668,  1.0681,  1.0621],
        [ 1.0509,  1.0464,  1.0463,  ...,  1.0681,  1.0621,  1.0587],
        [ 1.0464,  1.0463,  1.0454,  ...,  1.0621,  1.0587,  1.0566]],
       device='cuda:0')
y_train.shape: torch.Size([50444, 100]) y_train type: <class 'torch.Tensor'>
 tensor([[-1.7214, -1.7227, -1.7272,  ..., -1.6975, -1.6995, -1.7009],
        [-1.7227, -1.7272, -1.7283,  ..., -1.6995, -1.7009, -1.6961],
        [-1.7272, -1.7283, -1.7272,  ..., -1.7009, -1.6961, -1.6963],
        ...,
        [ 1.0509,  1.0464,  1.0463,  ...,  1.0681,  1.0621,  1.0587],
        [ 1.0464,  1.0463,  1.0454,  ...,  1.0621,  1.0587,  1.0566],
        [ 1.0463,  1.0454,  1.0409,  ...,  1.0587,  1.0566,  1.0574]],
       device='cuda:0')
x_val.shape: torch.Size([6121, 100]) x_val type: <class 'torch.Tensor'>
 tensor([[1.0619, 1.0737, 1.0740,  ..., 1.1188, 1.1161, 1.1076],
        [1.0737, 1.0740, 1.0740,  ..., 1.1161, 1.1076, 1.1094],
        [1.0740, 1.0740, 1.0698,  ..., 1.1076, 1.1094, 1.1188],
        ...,
        [0.5756, 0.5710, 0.5978,  ..., 0.6945, 0.6939, 0.6828],
        [0.5710, 0.5978, 0.5860,  ..., 0.6939, 0.6828, 0.6748],
        [0.5978, 0.5860, 0.5710,  ..., 0.6828, 0.6748, 0.6661]],
       device='cuda:0')
y_val.shape: torch.Size([6121, 100]) y_val type: <class 'torch.Tensor'>
 tensor([[1.0737, 1.0740, 1.0740,  ..., 1.1161, 1.1076, 1.1094],
        [1.0740, 1.0740, 1.0698,  ..., 1.1076, 1.1094, 1.1188],
        [1.0740, 1.0698, 1.0705,  ..., 1.1094, 1.1188, 1.1192],
        ...,
        [0.5710, 0.5978, 0.5860,  ..., 0.6939, 0.6828, 0.6748],
        [0.5978, 0.5860, 0.5710,  ..., 0.6828, 0.6748, 0.6661],
        [0.5860, 0.5710, 0.5541,  ..., 0.6748, 0.6661, 0.6715]],
       device='cuda:0')
x_test.shape: torch.Size([20895, 100]) x_test type: <class 'torch.Tensor'>
 tensor([[0.6657, 0.6632, 0.6724,  ..., 0.8106, 0.8074, 0.8068],
        [0.6632, 0.6724, 0.6659,  ..., 0.8074, 0.8068, 0.8062],
        [0.6724, 0.6659, 0.6675,  ..., 0.8068, 0.8062, 0.8041],
        ...,
        [3.3850, 3.3734, 3.3574,  ..., 3.5053, 3.4966, 3.4744],
        [3.3734, 3.3574, 3.3327,  ..., 3.4966, 3.4744, 3.5054],
        [3.3574, 3.3327, 3.2934,  ..., 3.4744, 3.5054, 3.5054]],
       device='cuda:0')
y_test.shape: torch.Size([20895, 100]) y_test type: <class 'torch.Tensor'>
 tensor([[0.6632, 0.6724, 0.6659,  ..., 0.8074, 0.8068, 0.8062],
        [0.6724, 0.6659, 0.6675,  ..., 0.8068, 0.8062, 0.8041],
        [0.6659, 0.6675, 0.6604,  ..., 0.8062, 0.8041, 0.8065],
        ...,
        [3.3734, 3.3574, 3.3327,  ..., 3.4966, 3.4744, 3.5054],
        [3.3574, 3.3327, 3.2934,  ..., 3.4744, 3.5054, 3.5054],
        [3.3327, 3.2934, 3.2572,  ..., 3.5054, 3.5054, 3.5077]],
       device='cuda:0')
model_1 type: <class '__main__.Model'>
 Model(
  (lstm): LSTMCell(1, 21)
  (linear): Linear(in_features=21, out_features=1, bias=True)
)
loss_fn_1 type: <class 'torch.nn.modules.loss.MSELoss'>
 MSELoss()
optimizer_1 type: <class 'torch.optim.adam.Adam'>
 Adam (
Parameter Group 0
    amsgrad: False
    betas: (0.9, 0.999)
    eps: 1e-08
    lr: 0.0001
    weight_decay: 0
)
scheduler_1 type: <class 'torch.optim.lr_scheduler.StepLR'>
 <torch.optim.lr_scheduler.StepLR object at 0x7fae7cf75828>
optimization_1 type: <class '__main__.Optimization'>
 <__main__.Optimization object at 0x7faead82ebe0>
Epoch 1 Train loss: nan. Validation loss: nan. Avg future: 0.00. Elapsed time: 18.00s.
Epoch 2 Train loss: nan. Validation loss: nan. Avg future: 0.00. Elapsed time: 18.02s.
Epoch 3 Train loss: nan. Validation loss: nan. Avg future: 0.00. Elapsed time: 18.04s.
Epoch 4 Train loss: nan. Validation loss: nan. Avg future: 0.00. Elapsed time: 18.03s.
Epoch 5 Train loss: nan. Validation loss: nan. Avg future: 0.00. Elapsed time: 18.00s.
Epoch 6 Train loss: nan. Validation loss: nan. Avg future: 0.00. Elapsed time: 17.75s.
Epoch 7 Train loss: nan. Validation loss: nan. Avg future: 0.00. Elapsed time: 17.87s.
Epoch 8 Train loss: nan. Validation loss: nan. Avg future: 0.00. Elapsed time: 17.92s.
Epoch 9 Train loss: nan. Validation loss: nan. Avg future: 0.00. Elapsed time: 17.73s.
Epoch 10 Train loss: nan. Validation loss: nan. Avg future: 0.00. Elapsed time: 17.97s.
Epoch 11 Train loss: nan. Validation loss: nan. Avg future: 0.00. Elapsed time: 17.93s.
Epoch 12 Train loss: nan. Validation loss: nan. Avg future: 0.00. Elapsed time: 17.84s.
Epoch 13 Train loss: nan. Validation loss: nan. Avg future: 0.00. Elapsed time: 17.92s.
Epoch 14 Train loss: nan. Validation loss: nan. Avg future: 0.00. Elapsed time: 18.05s.
Epoch 15 Train loss: nan. Validation loss: nan. Avg future: 0.00. Elapsed time: 17.68s.
optimization_1 type: <class '__main__.Optimization'>
 <__main__.Optimization object at 0x7faead82ebe0>
Test loss nan
Epoch 1 Train loss: nan. Validation loss: nan. Avg future: 25.69. Elapsed time: 18.89s.
Epoch 2 Train loss: nan. Validation loss: nan. Avg future: 25.42. Elapsed time: 19.03s.
Epoch 3 Train loss: nan. Validation loss: nan. Avg future: 23.72. Elapsed time: 18.80s.
Epoch 4 Train loss: nan. Validation loss: nan. Avg future: 24.97. Elapsed time: 18.47s.
Epoch 5 Train loss: nan. Validation loss: nan. Avg future: 25.43. Elapsed time: 18.58s.
Epoch 6 Train loss: nan. Validation loss: nan. Avg future: 24.85. Elapsed time: 18.69s.
Epoch 7 Train loss: nan. Validation loss: nan. Avg future: 25.38. Elapsed time: 18.40s.
Epoch 8 Train loss: nan. Validation loss: nan. Avg future: 24.75. Elapsed time: 18.42s.
Epoch 9 Train loss: nan. Validation loss: nan. Avg future: 26.19. Elapsed time: 18.63s.
Epoch 10 Train loss: nan. Validation loss: nan. Avg future: 25.97. Elapsed time: 18.22s.
Epoch 11 Train loss: nan. Validation loss: nan. Avg future: 25.67. Elapsed time: 17.62s.
Epoch 12 Train loss: nan. Validation loss: nan. Avg future: 25.30. Elapsed time: 17.29s.
Epoch 13 Train loss: nan. Validation loss: nan. Avg future: 26.18. Elapsed time: 16.94s.
Epoch 14 Train loss: nan. Validation loss: nan. Avg future: 25.29. Elapsed time: 17.39s.
Epoch 15 Train loss: nan. Validation loss: nan. Avg future: 24.92. Elapsed time: 16.77s.
Test loss nan
end time:  20:26:48
end_time - start_time:754.422876
$

Source code 2 (PyTorch's LSTM simple modeling, backtesting code)

`btc_prediction_and_backtest_by_pytorch.py`


# -*- coding: utf-8 -*-
'''
btc_prediction_and_backtest_by_pytorch.py

Copyright (C) 2020 HIROSE Ken-ichi ([email protected]) 
                                                 All rights reserved.
 This is free software with ABSOLUTELY NO WARRANTY.
 
 This program is free software; you can redistribute it and/or modify
 it under the terms of the GNU General Public License as published by
 the Free Software Foundation; either version 2 of the License, or
 (at your option) any later version.
 
 This program is distributed in the hope that it will be useful,
 but WITHOUT ANY WARRANTY; without even the implied warranty of
 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 GNU General Public License for more details.
 
 You should have received a copy of the GNU General Public License
 along with this program; if not, write to the Free Software
 Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA
 02111-1307, USA
'''

import glob
import warnings

import os
# import math
import time
import random
# import pprint
# from dateutil import parser
from datetime import timedelta, datetime

import numpy as np
import pandas as pd
# import pandas_datareader.data as web

import matplotlib
import matplotlib.pyplot as plt

import sklearn
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, roc_auc_score

import torch
import torch.nn as nn
import torch.optim as optim
from torch.autograd import Variable
import skorch

from backtesting import Backtest, Strategy
from backtesting.lib import plot_heatmaps

# import bitmex

class LSTMClassifier(nn.Module):
    def __init__(self, lstm_input_dim, lstm_hidden_dim, target_dim):
        super(LSTMClassifier, self).__init__()
        self.input_dim = lstm_input_dim
        self.hidden_dim = lstm_hidden_dim
        self.lstm = nn.LSTM(input_size=lstm_input_dim, 
                            hidden_size=lstm_hidden_dim,
                            num_layers=1, #default
                            #dropout=0.2,
                            batch_first=True
                            )
        self.dense = nn.Linear(lstm_hidden_dim, target_dim)

    def forward(self, X_input):
        _, lstm_out = self.lstm(X_input)
        linear_out = self.dense(lstm_out[0].view(X_input.size(0), -1))
        return torch.sigmoid(linear_out)

def prep_feature_data(batch_idx, time_steps, X_data, feature_num, cuda_device):  
    feats = torch.zeros((len(batch_idx), time_steps, feature_num), dtype=torch.float, device=cuda_device)
    for b_i, b_idx in enumerate(batch_idx):
        b_slc = slice(b_idx + 1 - time_steps ,b_idx + 1) #Store the past N steps as time step data.
        feats[b_i, :, :] = X_data[b_slc, :]        
    return feats

def plot_losses(train_losses, val_losses):
    plt.plot(train_losses, lw=1, label="Training loss")
    plt.plot(val_losses, lw=1, label="Validation loss")
    plt.legend()
    plt.title("Losses")

def get_all_bitmex(symbol, kline_size, save = False):
    filename = 'data/%s-%s-data.csv' % (symbol, kline_size)
    if os.path.isfile(filename):
        data_df = pd.read_csv(filename)
    else:
        data_df = pd.DataFrame()
    oldest_point, newest_point = minutes_of_new_data(symbol, kline_size, data_df)
    delta_min = (newest_point - oldest_point).total_seconds()/60
    available_data = math.ceil(delta_min/binsizes[kline_size])
    rounds = math.ceil(available_data / batch_size)
    if rounds > 0:
        for round_num in range(rounds):
            time.sleep(1)
            new_time = (oldest_point + timedelta(minutes = round_num * batch_size * binsizes[kline_size]))
            data = bitmex_client.Trade.Trade_getBucketed(symbol=symbol, 
                    binSize=kline_size, count=batch_size, startTime = new_time).result()[0]
            temp_df = pd.DataFrame(data)
            data_df = data_df.append(temp_df)
    data_df.set_index('timestamp', inplace=True)
    if save and rounds > 0:
        data_df.to_csv(filename)
    return data_df

class myCustomStrategy(Strategy):
    def init(self):
        self.model = LSTMClassifier(feature_num, lstm_hidden_dim, target_dim).to(cuda_device) #Loading LSTM trained models
        self.model.load_state_dict(torch.load('{}.mdl'.format(ownprefix), map_location=torch.device(cuda_device))) # load model

    def next(self): 
        #Skip until the data for the past 500 steps is accumulated
        #Since we trade only once a day, hour&Process only when minute is 0.
        if len(self.data) >= moving_average_num + time_steps and len(self.data) % future_num == 0:
            # 2.Preparation of data for guessing
            x_array = self.prepare_data()
            x_tensor = torch.tensor(x_array, dtype=torch.float, device=cuda_device)
            # 3.Execution of forecast
            with torch.no_grad():
                y_pred = self.predict(x_tensor.view(1, time_steps, feature_num))

            # 4.Forecast buy(1)If buy(), Otherwise sell()
            if y_pred == 1:
                self.buy(sl=self.data.Close[-1]*0.99, 
                         tp=self.data.Close[-1]*1.01)
            else:
                self.sell(sl=self.data.Close[-1]*1.01, 
                         tp=self.data.Close[-1]*0.99)

    def prepare_data(self):
        #Once converted to a Pandas data frame
        tmp_df = pd.concat([
                    self.data.Open.to_series(), 
                    self.data.High.to_series(), 
                    self.data.Low.to_series(), 
                    self.data.Close.to_series(), 
                    self.data.Volume.to_series(), 
                    ], axis=1)

        #The ratio to the moving average of 500 pairs.
        cols = tmp_df.columns
        for col in cols:
            tmp_df['Roll_' + col] = tmp_df[col].rolling(window=moving_average_num, min_periods=moving_average_num).mean()
            tmp_df[col] = tmp_df[col] / tmp_df['Roll_' + col] - 1

        #Last time_Returns the value for steps only
        return tmp_df.tail(time_steps)[cols].values

    def predict(self, x_array):
        y_score = self.model(x_array) 
        return np.round(y_score.view(-1).to('cpu').numpy())[0]

class mySimpleStrategy(Strategy):
    def init(self):
        pass

    def next(self): 
        self.buy if self.data.Close[-1]> self.data.Open[-1] else self.sell()


if __name__ == '__main__':
    os.chdir(os.path.dirname(os.path.abspath(__file__)))
    ownprefix = os.path.basename(__file__)

    warnings.simplefilter('ignore')
    pd.set_option('display.max_columns', 100)
    np.set_printoptions(precision=3, suppress=True, formatter={'float': '{: 0.2f}'.format}) #Align digits

    start_time = time.perf_counter()
    print("start time: ", datetime.now().strftime("%H:%M:%S"))
    
    print("pandas==%s" % pd.__version__)
    print("numpy==%s" % np.__version__)
    print("torch==%s" % torch.__version__)
    print("matplotlib==%s" % matplotlib.__version__)
    
    cuda_device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    print("cuda_device:",cuda_device)
    if cuda_device != "cpu": 
        print("devicde_name:",torch.cuda.get_device_name(torch.cuda.current_device()))
        torch.cuda.manual_seed(1)
    np.random.seed(1)
    random.seed(1)
    torch.manual_seed(1)
    
    if os.path.exists('{}.pickle'.format(ownprefix)):
        print("read_pickle:")
        df = pd.read_pickle('{}.pickle'.format(ownprefix))
    else:
        print("get_from_bitmex:")
        ## bitmex API
        # bitmex_api_key = ''    #Enter your own API-key here
        # bitmex_api_secret = '' #Enter your own API-secret here
        # binsizes = {"1m": 1, "5m": 5, "1h": 60, "1d": 1440}
        # batch_size = 750
        # bitmex_client = bitmex(test=False, api_key=bitmex_api_key, api_secret=bitmex_api_secret)
        # df = get_all_bitmex("XBTUSD","5m",save=True)
        ##
        # https://public.bitmex.com/?prefix=data/trade/
        # files = sorted(glob.glob('./data/2019*.csv.gz'))
        files = sorted(glob.glob('./data/2020*.csv.gz'))
        print("files:",files)
        df = pd.concat(map(pd.read_csv, files))
        df = df[df.symbol == 'XBTUSD']
        df.timestamp = pd.to_datetime(df.timestamp.str.replace('D', 'T'))
        df = df.sort_values('timestamp')
        df.set_index('timestamp', inplace=True)
        df.to_pickle('{}.pickle'.format(ownprefix))
        df.to_csv('{}.csv'.format(ownprefix))
    
    print("df.shape:",df.shape)
    print("df.tail:",df.tail(-5))

    # resample()Frequency code for H (hours), T (minutes), S (seconds), B (months)-Fri), W (week)
    df_ohlcv = df['price'].resample('10T', label='left', closed='left').ohlc().assign(
                                    volume=df['foreignNotional'].resample('10T').sum().values)
    df_ohlcv.rename(columns={'timestamp':'Datetime','open':'Open','high':'High',
                    'low':'Low','close':'Close','volume':'Volume'}, inplace=True)
    print("df_ohlcv.shape:",df_ohlcv.shape,"df_ohlcv type:",type(df_ohlcv),"\n",df_ohlcv)

    '''
    # 1.Constant setting
    '''
    future_num = 144 #How many feet to predict
    feature_num = 5 # open,high,low,close,5 items of volume
    batch_size = 64 # batch_size = 128
    time_steps = 50 #lstm timesteps
    moving_average_num = 500 #Number of Candles to take a moving average
    n_epocs = 30 
    
    lstm_hidden_dim = 16
    target_dim = 1

    '''
    # 2.Creation of teacher data
    '''
    future_price = df_ohlcv.iloc[future_num:]['Close'].values
    curr_price = df_ohlcv.iloc[:-future_num]['Close'].values
    y_data_tmp = future_price - curr_price
    print("future_price:",future_price,"\ncurr_price:",curr_price,"\ny_data_tmp:",y_data_tmp)

    y_data = np.zeros_like(y_data_tmp)
    y_data[y_data_tmp > 0] = 1
    y_data = y_data[moving_average_num:]
    print("y_data.shape:",y_data.shape,"y_data type:",type(y_data),"\n",y_data)

    '''
    # 3.Price normalization
    '''
    cols = df_ohlcv.columns # cols = df.columns
    for col in cols:
        df_ohlcv['Roll_' + col] = df_ohlcv[col].rolling(window=moving_average_num, min_periods=moving_average_num).mean()
        df_ohlcv[col] = df_ohlcv[col] / df_ohlcv['Roll_' + col] - 1    
    print("df_ohlcv.shape:",df_ohlcv.shape,"df_ohlcv type:",type(df_ohlcv))
    print("df_ohlcv.tail:",df_ohlcv.tail(-5))

    X_data = df_ohlcv.iloc[moving_average_num:-future_num][cols].values #The first 500 pairs are excluded because there is no moving average data. The latter 144 pairs are excluded because there is no forecast data.
    print("X_data.shape:",X_data.shape,"X_data type:",type(X_data),"\n",X_data)

    '''
    # 4.Data split, converted to Torch Tensor
    '''
    val_idx_from = round(len(df_ohlcv)*0.65) #Train data,Index to divide into test
    test_idx_from = round(len(df_ohlcv)*0.65) + round(len(df_ohlcv)*0.08)
    print("val_idx_from:",val_idx_from,"test_idx_from:",test_idx_from)
   
    X_train = torch.tensor(X_data[:val_idx_from], dtype=torch.float, device=cuda_device) #Training data
    y_train = torch.tensor(y_data[:val_idx_from], dtype=torch.float, device=cuda_device)
    print("X_train.shape:",X_train.shape,"X_train type:",type(X_train),"\n",X_train)
    print("y_train.shape:",y_train.shape,"y_train type:",type(y_train),"\n",y_train)
    
    X_val   = torch.tensor(X_data[val_idx_from:test_idx_from], dtype=torch.float, device=cuda_device) #Evaluation data
    y_val   = y_data[val_idx_from:test_idx_from]
    print("X_val.shape:",X_val.shape,"X_val type:",type(X_val),"\n",X_val)
    print("y_val.shape:",y_val.shape,"y_val type:",type(y_val),"\n",y_val)
    
    X_test  = torch.tensor(X_data[test_idx_from:], dtype=torch.float, device=cuda_device) #Test data
    y_test  = y_data[test_idx_from:]
    print("X_test.shape:",X_test.shape,"X_test type:",type(X_test),"\n",X_test)
    print("y_test.shape:",y_test.shape,"y_test type:",type(y_test),"\n",y_test)

    '''
    # 5.LSTM learning model construction
    '''
    model = LSTMClassifier(feature_num, lstm_hidden_dim, target_dim).to(cuda_device)
    print("model type:",type(model),"\n",model)

    loss_function = nn.BCELoss()
    print("loss_function type:",type(loss_function),"\n",loss_function)

    optimizer= optim.Adam(model.parameters(), lr=1e-4)
    print("optimizer type:",type(optimizer),"\n",optimizer)

    
    train_size = X_train.size(0)
    print("train_size:",train_size)

    best_acc_score = 0
    for epoch in range(n_epocs):
        '''
        # 1.First, the index of train data is randomly replaced. First time_Do not use steps.
        '''
        perm_idx = np.random.permutation(np.arange(time_steps, train_size))
        # print("perm_idx.shape:",perm_idx.shape,"perm_idx type:",type(perm_idx),"\n",perm_idx)
        '''
        # 2.perm for each batch size_Get the index of the target of idx
        '''
        for t_i in range(0, len(perm_idx), batch_size):
            batch_idx = perm_idx[t_i:(t_i + batch_size)]
            '''
            # 3.Preparation of time series data for LSTM input
            '''
            feats = prep_feature_data(batch_idx, time_steps, X_train, feature_num, cuda_device)
            y_target = y_train[batch_idx]
            '''
            # 4.Learning implementation of pytorch LSTM
            '''
            model.zero_grad()
            train_scores = model(feats) # batch size x time steps x feature_num
            loss = loss_function(train_scores, y_target.view(-1, 1))
            loss.backward()
            optimizer.step()
    
        '''
        # 5.Evaluation of validation data
        '''
        with torch.no_grad():
            feats_val = prep_feature_data(np.arange(time_steps, X_val.size(0)), time_steps, X_val, feature_num, cuda_device)
            val_scores = model(feats_val)
            tmp_scores = val_scores.view(-1).to('cpu').numpy()
            bi_scores = np.round(tmp_scores)
            acc_score = accuracy_score(y_val[time_steps:], bi_scores)
            roc_score = roc_auc_score(y_val[time_steps:], tmp_scores)
            print('EPOCH:',str(epoch),'loss:',loss.item(),'Val ACC Score:',acc_score,'ROC AUC Score:',roc_score)
    
        '''
        # 6.Save model if validation evaluation is good
        '''
        if acc_score > best_acc_score:
            best_acc_score = acc_score
            torch.save(model.state_dict(),'{}.mdl'.format(ownprefix))
            print('best score updated, Pytorch model was saved!!', )
    
    '''
    # 7.Predict with the best model.
    '''
    model.load_state_dict(torch.load('{}.mdl'.format(ownprefix)))
    with torch.no_grad():
        feats_test = prep_feature_data(np.arange(time_steps, X_test.size(0)), time_steps, X_test, feature_num, cuda_device)
        val_scores = model(feats_test)
        tmp_scores = val_scores.view(-1).to('cpu').numpy()
        bi_scores = np.round(tmp_scores)
        acc_score = accuracy_score(y_test[time_steps:], bi_scores)
        roc_score = roc_auc_score(y_test[time_steps:], tmp_scores)
        print('Test ACC Score:',acc_score,'ROC AUC Score:',roc_score)

    '''
    # 8.Backtest with a simple strategy
    '''
    # resample()Frequency code for H (hours), T (minutes), S (seconds), B (months)-Fri), W (week)
    df_ohlcv = df['price'].resample('10T', label='left', closed='left').ohlc().assign(
                                    volume=df['foreignNotional'].resample('10T').sum().values)
    df_ohlcv.rename(columns={'timestamp':'Datetime','open':'Open','high':'High',
                    'low':'Low','close':'Close','volume':'Volume'}, inplace=True)
    print("df_ohlcv.shape:",df_ohlcv.shape,"df_ohlcv type:",type(df_ohlcv),"\n",df_ohlcv)

    bt = Backtest(df_ohlcv[8000:], myCustomStrategy, cash=100000, commission=.00004)
    print(bt.run())
    bt.plot(filename='{}'.format(ownprefix), open_browser=False)

Execution result log 3 (Log in which source code 2 reads BTC data for 2020)

$ python3 btc_prediction_and_backtest_by_pytorch.py
start time:  23:52:58
pandas==1.1.4
numpy==1.19.2
torch==1.5.0
matplotlib==3.3.3
cuda_device: cuda
devicde_name: GeForce RTX 2070
get_from_bitmex:
files: ['./data/20200930.csv.gz', './data/20201001.csv.gz', './data/20201002.csv.gz', './data/20201003.csv.gz', './data/20201004.csv.gz', './data/20201005.csv.gz', './data/20201006.csv.gz', './data/20201007.csv.gz', './data/20201008.csv.gz', './data/20201009.csv.gz', './data/20201010.csv.gz', './data/20201011.csv.gz', './data/20201012.csv.gz', './data/20201013.csv.gz', './data/20201014.csv.gz', './data/20201015.csv.gz', './data/20201016.csv.gz', './data/20201017.csv.gz', './data/20201018.csv.gz', './data/20201019.csv.gz', './data/20201020.csv.gz', './data/20201021.csv.gz', './data/20201022.csv.gz', './data/20201023.csv.gz', './data/20201024.csv.gz', './data/20201025.csv.gz', './data/20201026.csv.gz', './data/20201027.csv.gz', './data/20201028.csv.gz', './data/20201029.csv.gz', './data/20201030.csv.gz', './data/20201031.csv.gz', './data/20201101.csv.gz', './data/20201102.csv.gz', './data/20201103.csv.gz', './data/20201104.csv.gz', './data/20201105.csv.gz', './data/20201106.csv.gz', './data/20201107.csv.gz', './data/20201108.csv.gz', './data/20201109.csv.gz', './data/20201110.csv.gz', './data/20201111.csv.gz', './data/20201112.csv.gz', './data/20201113.csv.gz', './data/20201114.csv.gz', './data/20201115.csv.gz', './data/20201116.csv.gz', './data/20201117.csv.gz', './data/20201118.csv.gz', './data/20201119.csv.gz', './data/20201120.csv.gz', './data/20201121.csv.gz', './data/20201122.csv.gz', './data/20201123.csv.gz', './data/20201124.csv.gz', './data/20201125.csv.gz', './data/20201126.csv.gz', './data/20201127.csv.gz', './data/20201128.csv.gz', './data/20201129.csv.gz', './data/20201130.csv.gz', './data/20201201.csv.gz', './data/20201202.csv.gz', './data/20201203.csv.gz', './data/20201204.csv.gz', './data/20201205.csv.gz', './data/20201206.csv.gz', './data/20201207.csv.gz', './data/20201208.csv.gz', './data/20201209.csv.gz', './data/20201210.csv.gz', './data/20201211.csv.gz', './data/20201212.csv.gz', './data/20201213.csv.gz', './data/20201214.csv.gz', './data/20201215.csv.gz', './data/20201216.csv.gz', './data/20201217.csv.gz', './data/20201218.csv.gz', './data/20201219.csv.gz', './data/20201220.csv.gz', './data/20201221.csv.gz', './data/20201222.csv.gz', './data/20201223.csv.gz']
df.shape: (25878209, 9)
df.tail:                             symbol  side   size    price tickDirection  \
timestamp
2020-09-30 00:00:02.771822  XBTUSD   Buy  12055  10839.5  ZeroPlusTick
2020-09-30 00:00:02.885748  XBTUSD  Sell   4500  10839.0     MinusTick
2020-09-30 00:00:02.989378  XBTUSD   Buy   3499  10839.5      PlusTick
2020-09-30 00:00:02.992595  XBTUSD   Buy     87  10839.5  ZeroPlusTick
2020-09-30 00:00:02.998145  XBTUSD   Buy   2383  10839.5  ZeroPlusTick
...                            ...   ...    ...      ...           ...
2020-12-23 23:59:58.571018  XBTUSD  Sell    420  23245.0     MinusTick
2020-12-23 23:59:58.580506  XBTUSD   Buy     13  23244.5     MinusTick
2020-12-23 23:59:58.593966  XBTUSD   Buy     10  23243.5     MinusTick
2020-12-23 23:59:58.597077  XBTUSD  Sell    447  23243.0     MinusTick
2020-12-23 23:59:58.646200  XBTUSD   Buy     11  23241.0     MinusTick

                                                      trdMatchID  grossValue  \
timestamp
2020-09-30 00:00:02.771822  ddc9e2a6-40b5-b5bf-715b-60cf18ab847a   111219430
2020-09-30 00:00:02.885748  938ba483-0bd9-b498-c5fb-162c2cc72acb    41517000
2020-09-30 00:00:02.989378  be1bab97-78a9-b0db-1ecd-be70eb7bdb99    32281774
2020-09-30 00:00:02.992595  29f68ab4-cc4f-291d-5cfb-a96baacac448      802662
2020-09-30 00:00:02.998145  be2e5e02-b1c8-88da-262a-1d23ffc62b32    21985558
...                                                          ...         ...
2020-12-23 23:59:58.571018  a12bbffc-d9d7-083c-cbd5-dbadb88cb0a0     1806840
2020-12-23 23:59:58.580506  f1e97f20-6739-4145-715a-fe0914393f20       55926
2020-12-23 23:59:58.593966  b57033c9-4dd2-9c96-04e5-7bf702e36806       43020
2020-12-23 23:59:58.597077  35a0a31a-4a9d-34e9-208f-5af533a576c5     1922994
2020-12-23 23:59:58.646200  b3cb9ba2-cebc-cdf5-528c-ce7c7b79269b       47333

                            homeNotional  foreignNotional
timestamp
2020-09-30 00:00:02.771822      1.112194          12055.0
2020-09-30 00:00:02.885748      0.415170           4500.0
2020-09-30 00:00:02.989378      0.322818           3499.0
2020-09-30 00:00:02.992595      0.008027             87.0
2020-09-30 00:00:02.998145      0.219856           2383.0
...                                  ...              ...
2020-12-23 23:59:58.571018      0.018068            420.0
2020-12-23 23:59:58.580506      0.000559             13.0
2020-12-23 23:59:58.593966      0.000430             10.0
2020-12-23 23:59:58.597077      0.019230            447.0
2020-12-23 23:59:58.646200      0.000473             11.0

[25878204 rows x 9 columns]
df_ohlcv.shape: (12240, 5) df_ohlcv type: <class 'pandas.core.frame.DataFrame'>
                         Open     High      Low    Close      Volume
timestamp
2020-09-30 00:00:00  10839.5  10842.0  10827.5  10828.0  13500945.0
2020-09-30 00:10:00  10828.5  10829.0  10822.0  10829.0   4477779.0
2020-09-30 00:20:00  10829.0  10829.0  10816.5  10819.5   3589041.0
2020-09-30 00:30:00  10819.5  10820.0  10814.5  10814.5   4523661.0
2020-09-30 00:40:00  10815.0  10820.0  10812.0  10820.0   3463389.0
...                      ...      ...      ...      ...         ...
2020-12-23 23:10:00  23307.0  23400.0  23264.0  23315.5  14089639.0
2020-12-23 23:20:00  23315.5  23485.5  23315.0  23382.0  40956253.0
2020-12-23 23:30:00  23382.0  23420.0  23333.0  23365.0  15243013.0
2020-12-23 23:40:00  23365.5  23376.0  23264.0  23281.0  13256178.0
2020-12-23 23:50:00  23281.5  23288.0  23190.0  23241.0  16298938.0

[12240 rows x 5 columns]
future_price: [ 10817.50  10799.50  10798.50 ...  23365.00  23281.00  23241.00]
curr_price: [ 10828.00  10829.00  10819.50 ...  23736.00  23757.50  23835.00]
y_data_tmp: [-10.50 -29.50 -21.00 ... -371.00 -476.50 -594.00]
y_data.shape: (11596,) y_data type: <class 'numpy.ndarray'>
 [ 1.00  1.00  1.00 ...  0.00  0.00  0.00]
df_ohlcv.shape: (12240, 10) df_ohlcv type: <class 'pandas.core.frame.DataFrame'>
df_ohlcv.tail:                          Open      High       Low     Close    Volume  \
timestamp
2020-09-30 00:50:00       NaN       NaN       NaN       NaN       NaN
2020-09-30 01:00:00       NaN       NaN       NaN       NaN       NaN
2020-09-30 01:10:00       NaN       NaN       NaN       NaN       NaN
2020-09-30 01:20:00       NaN       NaN       NaN       NaN       NaN
2020-09-30 01:30:00       NaN       NaN       NaN       NaN       NaN
...                       ...       ...       ...       ...       ...
2020-12-23 23:10:00 -0.002501 -0.000952 -0.001648 -0.002115 -0.256627
2020-12-23 23:20:00 -0.002115  0.002709  0.000556  0.000742  1.154715
2020-12-23 23:30:00  0.000742 -0.000075  0.001342  0.000026 -0.198753
2020-12-23 23:40:00  0.000048 -0.001942 -0.001603 -0.003553 -0.303641
2020-12-23 23:50:00 -0.003531 -0.005679 -0.004760 -0.005249 -0.144335

                     Roll_Open  Roll_High   Roll_Low  Roll_Close   Roll_Volume
timestamp
2020-09-30 00:50:00        NaN        NaN        NaN         NaN           NaN
2020-09-30 01:00:00        NaN        NaN        NaN         NaN           NaN
2020-09-30 01:10:00        NaN        NaN        NaN         NaN           NaN
2020-09-30 01:20:00        NaN        NaN        NaN         NaN           NaN
2020-09-30 01:30:00        NaN        NaN        NaN         NaN           NaN
...                        ...        ...        ...         ...           ...
2020-12-23 23:10:00  23365.435  23422.301  23302.411   23364.911  1.895367e+07
2020-12-23 23:20:00  23364.912  23422.039  23302.050   23364.672  1.900774e+07
2020-12-23 23:30:00  23364.673  23421.745  23301.725   23364.382  1.902411e+07
2020-12-23 23:40:00  23364.384  23421.476  23301.362   23364.003  1.903641e+07
2020-12-23 23:50:00  23364.007  23421.018  23300.902   23363.645  1.904826e+07

[12235 rows x 10 columns]
X_data.shape: (11596, 5) X_data type: <class 'numpy.ndarray'>
 [[-0.01 -0.01 -0.01 -0.01  0.11]
 [-0.01 -0.01 -0.01 -0.01 -0.54]
 [-0.01 -0.01 -0.01 -0.01 -0.69]
 ...
 [ 0.01  0.02  0.02  0.02  0.02]
 [ 0.02  0.02  0.02  0.02 -0.51]
 [ 0.02  0.02  0.02  0.02  0.18]]
val_idx_from: 7956 test_idx_from: 8935
X_train.shape: torch.Size([7956, 5]) X_train type: <class 'torch.Tensor'>
 tensor([[-0.0103, -0.0091, -0.0107, -0.0114,  0.1141],
        [-0.0113, -0.0116, -0.0109, -0.0117, -0.5357],
        [-0.0116, -0.0115, -0.0108, -0.0110, -0.6896],
        ...,
        [-0.0780, -0.0754, -0.0748, -0.0734, -0.2527],
        [-0.0734, -0.0760, -0.0737, -0.0763, -0.6506],
        [-0.0763, -0.0767, -0.0739, -0.0763, -0.3977]], device='cuda:0')
y_train.shape: torch.Size([7956]) y_train type: <class 'torch.Tensor'>
 tensor([1., 1., 1.,  ..., 1., 1., 1.], device='cuda:0')
X_val.shape: torch.Size([979, 5]) X_val type: <class 'torch.Tensor'>
 tensor([[-0.0763, -0.0761, -0.0738, -0.0761, -0.5517],
        [-0.0760, -0.0784, -0.0737, -0.0767, -0.7224],
        [-0.0767, -0.0780, -0.0767, -0.0791, -0.4797],
        ...,
        [-0.0186, -0.0175, -0.0194, -0.0182,  3.2428],
        [-0.0183, -0.0203, -0.0228, -0.0219,  2.7683],
        [-0.0219, -0.0187, -0.0194, -0.0167,  1.5095]], device='cuda:0')
y_val.shape: (979,) y_val type: <class 'numpy.ndarray'>
 [ 1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00
  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00
  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00
  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00
  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00
  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00
  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00
  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00
  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00
  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00
  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00
  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00
  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00
  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00
  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00
  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00
  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00
  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00
  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00
  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00
  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00
  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00
  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00
  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00
  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00
  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00
  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00
  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00
  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00
  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00
  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00
  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00
  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00
  1.00  1.00  1.00  1.00  1.00  1.00  0.00  1.00  1.00  0.00  0.00  0.00
  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00
  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00
  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00
  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00
  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00
  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00
  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00
  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00
  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00
  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00
  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00
  0.00  0.00  0.00  0.00  0.00  0.00  1.00  0.00  1.00  1.00  1.00  1.00
  1.00  1.00  1.00  1.00  1.00  1.00  1.00  0.00  0.00  0.00  0.00  0.00
  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  1.00  1.00
  0.00  0.00  0.00  0.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00
  1.00  1.00  0.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  0.00
  0.00  0.00  0.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00
  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00
  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00
  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00
  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00
  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  0.00  0.00
  0.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00
  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00
  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00
  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00
  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00
  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00
  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00
  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00
  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00
  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00
  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00
  1.00  1.00  1.00  1.00  1.00  0.00  0.00  1.00  1.00  1.00  1.00  1.00
  1.00  1.00  1.00  1.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00
  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00
  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00
  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00
  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00
  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00
  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00
  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00
  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00
  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00
  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00
  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00
  0.00  0.00  0.00  0.00  0.00  1.00  1.00  1.00  1.00  1.00  0.00  0.00
  1.00  0.00  1.00  1.00  1.00  1.00  1.00]
X_test.shape: torch.Size([2661, 5]) X_test type: <class 'torch.Tensor'>
 tensor([[-0.0168, -0.0148, -0.0143, -0.0127,  0.9034],
        [-0.0128, -0.0124, -0.0111, -0.0121,  0.2539],
        [-0.0121, -0.0102, -0.0112, -0.0080,  0.7918],
        ...,
        [ 0.0135,  0.0165,  0.0155,  0.0151,  0.0248],
        [ 0.0151,  0.0152,  0.0175,  0.0159, -0.5096],
        [ 0.0159,  0.0181,  0.0175,  0.0192,  0.1800]], device='cuda:0')
y_test.shape: (2661,) y_test type: <class 'numpy.ndarray'>
 [ 1.00  1.00  1.00 ...  0.00  0.00  0.00]
model type: <class '__main__.LSTMClassifier'>
 LSTMClassifier(
  (lstm): LSTM(5, 16, batch_first=True)
  (dense): Linear(in_features=16, out_features=1, bias=True)
)
loss_function type: <class 'torch.nn.modules.loss.BCELoss'>
 BCELoss()
optimizer type: <class 'torch.optim.adam.Adam'>
 Adam (
Parameter Group 0
    amsgrad: False
    betas: (0.9, 0.999)
    eps: 1e-08
    lr: 0.0001
    weight_decay: 0
)
train_size: 7956
EPOCH: 0 loss: 0.691195547580719 Val ACC Score: 0.6555435952637244 ROC AUC Score: 0.5107399425287357
best score updated, Pytorch model was saved!!
EPOCH: 1 loss: 0.6914775967597961 Val ACC Score: 0.6555435952637244 ROC AUC Score: 0.5501719006568144
EPOCH: 2 loss: 0.6868444085121155 Val ACC Score: 0.6555435952637244 ROC AUC Score: 0.5867456896551724
EPOCH: 3 loss: 0.6009563207626343 Val ACC Score: 0.6555435952637244 ROC AUC Score: 0.6160483374384237
EPOCH: 4 loss: 0.6663740277290344 Val ACC Score: 0.6555435952637244 ROC AUC Score: 0.6489326765188834
EPOCH: 5 loss: 0.6851885318756104 Val ACC Score: 0.6555435952637244 ROC AUC Score: 0.682473830049261
EPOCH: 6 loss: 0.6095741391181946 Val ACC Score: 0.6555435952637244 ROC AUC Score: 0.703132697044335
EPOCH: 7 loss: 0.7387176156044006 Val ACC Score: 0.6555435952637244 ROC AUC Score: 0.7015419745484401
EPOCH: 8 loss: 0.690645694732666 Val ACC Score: 0.6555435952637244 ROC AUC Score: 0.6566810344827586
EPOCH: 9 loss: 0.6829000115394592 Val ACC Score: 0.6555435952637244 ROC AUC Score: 0.622172619047619
EPOCH: 10 loss: 0.738982617855072 Val ACC Score: 0.6555435952637244 ROC AUC Score: 0.5912587233169129
EPOCH: 11 loss: 0.6995638608932495 Val ACC Score: 0.6555435952637244 ROC AUC Score: 0.5838156814449917
EPOCH: 12 loss: 0.7947514057159424 Val ACC Score: 0.6555435952637244 ROC AUC Score: 0.5787972085385878
EPOCH: 13 loss: 0.6817842125892639 Val ACC Score: 0.6555435952637244 ROC AUC Score: 0.5798465722495894
EPOCH: 14 loss: 0.6924517154693604 Val ACC Score: 0.6555435952637244 ROC AUC Score: 0.5760827175697866
EPOCH: 15 loss: 0.5952319502830505 Val ACC Score: 0.6555435952637244 ROC AUC Score: 0.5766369047619048
EPOCH: 16 loss: 0.6980494260787964 Val ACC Score: 0.6555435952637244 ROC AUC Score: 0.5808010057471265
EPOCH: 17 loss: 0.6621570587158203 Val ACC Score: 0.6555435952637244 ROC AUC Score: 0.5822839696223316
EPOCH: 18 loss: 0.6212792992591858 Val ACC Score: 0.6555435952637244 ROC AUC Score: 0.586889367816092
EPOCH: 19 loss: 0.6528978943824768 Val ACC Score: 0.6555435952637244 ROC AUC Score: 0.5857194170771757
EPOCH: 20 loss: 0.7154173254966736 Val ACC Score: 0.6555435952637244 ROC AUC Score: 0.5881773399014778
EPOCH: 21 loss: 0.7460910677909851 Val ACC Score: 0.6555435952637244 ROC AUC Score: 0.5952534893267653
EPOCH: 22 loss: 0.6252413988113403 Val ACC Score: 0.6555435952637244 ROC AUC Score: 0.6035714285714285
EPOCH: 23 loss: 0.6823109984397888 Val ACC Score: 0.6555435952637244 ROC AUC Score: 0.6014983579638752
EPOCH: 24 loss: 0.6286243200302124 Val ACC Score: 0.6555435952637244 ROC AUC Score: 0.6065219622331691
EPOCH: 25 loss: 0.6132022142410278 Val ACC Score: 0.6555435952637244 ROC AUC Score: 0.6133697660098522
EPOCH: 26 loss: 0.5485300421714783 Val ACC Score: 0.6555435952637244 ROC AUC Score: 0.6092980295566501
EPOCH: 27 loss: 0.5204343795776367 Val ACC Score: 0.6609257265877287 ROC AUC Score: 0.6157327586206897
best score updated, Pytorch model was saved!!
EPOCH: 28 loss: 0.6037634611129761 Val ACC Score: 0.6749192680301399 ROC AUC Score: 0.6112941297208538
best score updated, Pytorch model was saved!!
EPOCH: 29 loss: 0.6126266717910767 Val ACC Score: 0.7158234660925726 ROC AUC Score: 0.6105398193760263
best score updated, Pytorch model was saved!!
Test ACC Score: 0.5947912677135198 ROC AUC Score: 0.45556905274916193
df_ohlcv.shape: (12240, 5) df_ohlcv type: <class 'pandas.core.frame.DataFrame'>
                         Open     High      Low    Close      Volume
timestamp
2020-09-30 00:00:00  10839.5  10842.0  10827.5  10828.0  13500945.0
2020-09-30 00:10:00  10828.5  10829.0  10822.0  10829.0   4477779.0
2020-09-30 00:20:00  10829.0  10829.0  10816.5  10819.5   3589041.0
2020-09-30 00:30:00  10819.5  10820.0  10814.5  10814.5   4523661.0
2020-09-30 00:40:00  10815.0  10820.0  10812.0  10820.0   3463389.0
...                      ...      ...      ...      ...         ...
2020-12-23 23:10:00  23307.0  23400.0  23264.0  23315.5  14089639.0
2020-12-23 23:20:00  23315.5  23485.5  23315.0  23382.0  40956253.0
2020-12-23 23:30:00  23382.0  23420.0  23333.0  23365.0  15243013.0
2020-12-23 23:40:00  23365.5  23376.0  23264.0  23281.0  13256178.0
2020-12-23 23:50:00  23281.5  23288.0  23190.0  23241.0  16298938.0

[12240 rows x 5 columns]
Start                     2020-11-24 13:20:00
End                       2020-12-23 23:50:00
Duration                     29 days 10:30:00
Exposure Time [%]                     10.2358
Equity Final [$]                       110980
Equity Peak [$]                        111948
Return [%]                            10.9804
Buy & Hold Return [%]                 20.5915
Return (Ann.) [%]                     255.219
Volatility (Ann.) [%]                 52.4708
Sharpe Ratio                          4.86403
Sortino Ratio                         30.6017
Calmar Ratio                          127.727
Max. Drawdown [%]                    -1.99816
Avg. Drawdown [%]                   -0.526232
Max. Drawdown Duration        7 days 22:30:00
Avg. Drawdown Duration        0 days 16:50:00
# Trades                                   26
Win Rate [%]                          73.0769
Best Trade [%]                        1.00127
Worst Trade [%]                      -1.00668
Avg. Trade [%]                       0.453453
Max. Trade Duration           0 days 10:40:00
Avg. Trade Duration           0 days 02:37:00
Profit Factor                         2.69042
Expectancy [%]                       0.457399
SQN                                   2.53193
_strategy                    myCustomStrategy
_equity_curve                             ...
_trades                       Size  EntryB...
dtype: object
$

References

-Time Series Forecast -Understanding LSTM Networks -What is teacher coercion for recurrent neural networks -Scheduling for array prediction using recurrent neural network

[PYTHON] I tried to predict the BTC Trade data obtained from BitMEX with LSTM using PyTorch, but the vanishing gradient problem? Frustrated by the NaN problem, etc. I regained my mind and tried backtesting with a separate trading strategy!

title

background

Why PyTorch? Why BTC?

Conclusion, remaining items

flow

BitMEX data acquisition

Data preprocessing

VWAP calculation

train/val/test data creation

scaling

Time bar data conversion

Model building and learning

Model evaluation

Test set

Zoom in

Inference by model

Teacher compulsion

Model evaluation

Inference by model

Consideration

Conclusion

Build another LSTM model

Backtest

Score

plot

Required modules (install the following modules with pip)

Source code 1 (PyTorch's LSTM, code that challenged data scaling and teacher coercion)

btc_prediction_by_lstm_pytorch.py

Execution result log 1 (log when source code 1 reads 2019 BTC data)

Execution result log 2 (Log that causes Source Code 1 to read BTC data for 2020 and Loss becomes NaN)

Source code 2 (PyTorch's LSTM simple modeling, backtesting code)

btc_prediction_and_backtest_by_pytorch.py

Execution result log 3 (Log in which source code 2 reads BTC data for 2020)

References

`btc_prediction_by_lstm_pytorch.py`

`btc_prediction_and_backtest_by_pytorch.py`