[PYTHON] Studying Recurrent Neural Networks: Can Periodic Functions Be Reproduced?

To study the recurrent neural networks used for time series analysis, I tested how well several periodic functions can be reproduced with deep learning.

Definition of three types of periodic functions

Here, we define the following three periodic functions: f, g, and h.

import numpy as np

def f(t, freq=25):
    # Sine wave with period `freq`
    return np.sin(2. * np.pi * t / freq)

def g(t, freq=25, amp=10, threshold=10):
    # Sine wave with period `freq`, distorted by a sigmoid
    return 1 / (1 + np.exp(amp * np.sin(2 * np.pi * t / freq) + threshold))

def h(t, freqs=[11, 23, 31, 41, 53, 61, 71, 83, 97]):
    # Sum of sine waves with prime-number periods
    value = np.zeros_like(t)
    for freq in freqs:
        value += f(t, freq)
    return value

Function f

The function f is simply the sine function $ \sin $. Here, the period is set to $ 25 $.

%matplotlib inline
import matplotlib.pyplot as plt

total_time_length = 1000
times = np.linspace(0, total_time_length, total_time_length + 1)

plt.figure(figsize=(15, 6))
plt.plot(f(times))
plt.xticks(np.linspace(0, 1000, 11))
plt.grid()

output_2_0.png

Function g

The function g passes the sine wave through a sigmoid to distort its shape. Again, the period is set to $ 25 $.
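Written out as a formula (this is just the code above, shown for clarity), g applies a logistic sigmoid to the scaled and shifted sine wave:

$ g(t) = \dfrac{1}{1 + \exp\left(10 \sin\left(\frac{2 \pi t}{25}\right) + 10\right)} = \sigma\left(-10 \sin\left(\tfrac{2 \pi t}{25}\right) - 10\right) $

where $ \sigma(x) = 1 / (1 + e^{-x}) $ is the standard sigmoid, so the output stays between $ 0 $ and $ 0.5 $ and spikes up only near the troughs of the sine wave.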

%matplotlib inline
import matplotlib.pyplot as plt

total_time_length = 1000
times = np.linspace(0, total_time_length, total_time_length + 1)

plt.figure(figsize=(15, 6))
plt.plot(g(times))
plt.xticks(np.linspace(0, 1000, 11))
plt.grid()

output_3_0.png

Function h

The function h is a sum of sine waves whose periods are prime numbers.

%matplotlib inline
import matplotlib.pyplot as plt

total_time_length = 1000
times = np.linspace(0, total_time_length, total_time_length + 1)

plt.figure(figsize=(15, 6))
plt.plot(h(times))
plt.xticks(np.linspace(0, 1000, 11))
plt.grid()

output_4_0.png

Can a recurrent neural network reproduce these functions?

Fourier transform

Before turning to recurrent neural networks, let's Fourier-transform the three functions and take the reciprocal of the frequencies to inspect their periods.

plt.figure(figsize=(6,6))

sp = np.fft.fft(f(times))
freq = np.fft.fftfreq(times.shape[-1])

plt.subplot(311)
plt.plot(1/freq, abs(sp.real) + abs(sp.imag), label="f")
plt.plot(1/freq, abs(sp.real))
plt.plot(1/freq, abs(sp.imag), alpha=0.5)
plt.legend()
plt.xlim([0, 150])
plt.xticks(np.linspace(0, 150, 16))
plt.grid()

sp = np.fft.fft(g(times))
freq = np.fft.fftfreq(times.shape[-1])

plt.subplot(312)
plt.plot(1/freq, abs(sp.real) + abs(sp.imag), label="g")
plt.plot(1/freq, abs(sp.real))
plt.plot(1/freq, abs(sp.imag), alpha=0.5)
plt.legend()
plt.xlim([0, 150])
plt.xticks(np.linspace(0, 150, 16))
plt.grid()

sp = np.fft.fft(h(times))
freq = np.fft.fftfreq(times.shape[-1])

plt.subplot(313)
plt.plot(1/freq, abs(sp.real) + abs(sp.imag), label="h")
plt.plot(1/freq, abs(sp.real))
plt.plot(1/freq, abs(sp.imag), alpha=0.5)
plt.legend()
plt.xlim([0, 150])
plt.xticks(np.linspace(0, 150, 16))
plt.grid()

output_6_1.png

The function f consists of a single sine wave with a period of $ 25 $. The function h is clearly a combination of the specified prime-number periods. The function g, on the other hand, contains many components besides the specified period of $ 25 $.
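As a quick numerical counterpart to reading the plots (this helper is my addition, not part of the original notebook), one can extract the dominant period as the reciprocal of the positive frequency with the largest spectral magnitude:

import numpy as np

def dominant_period(signal):
    # Period of the strongest positive-frequency component of the signal
    sp = np.fft.fft(signal)
    freq = np.fft.fftfreq(signal.shape[-1])
    mask = freq > 0                      # drop the DC term and negative frequencies
    peak = np.argmax(np.abs(sp[mask]))
    return 1.0 / freq[mask][peak]

print(dominant_period(f(times)))  # should come out close to 25

For f this returns a value close to $ 25 $, while for h it picks out whichever of the prime periods has the largest spectral peak.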

Creating a time series dataset

Now let's create a time series dataset for training a recurrent neural network. The goal is a model that, given the outputs $ X_t, \dots, X_{t+w} $ from time $ t $ to $ t + w $ as input, predicts the output $ X_{t+w+1} $ at the next time step.

import numpy as np
from sklearn.model_selection import train_test_split

func = f  # You can switch the target function by rewriting this line
#func = g
#func = h

total_time_length = 10000   # Total time span to handle
pred_length = 1000          # Time span to predict
learning_time_length = 100  # Window length used for learning

time_series_T = np.linspace(0, total_time_length, total_time_length + 1)  # Time points T (horizontal axis of the graph)
time_series_X = func(time_series_T)                                       # Function output X (vertical axis of the graph)

X_learn = []  # Stores X from time t to t + learning_time_length - 1
Y_learn = []  # Stores X at time t + learning_time_length
for i in range(total_time_length - learning_time_length):
    X_learn.append(time_series_X[i:i+learning_time_length].reshape(1, learning_time_length).T)
    Y_learn.append([time_series_X[i+learning_time_length]])

# Split into training data and validation data
# For time series data, shuffle must be False
X_train, X_val, Y_train, Y_val = \
train_test_split(X_learn, Y_learn, test_size=0.2, shuffle=False)

# Convert to the data format expected by scikit-learn
X_train2sklearn = [list(x.reshape(1, len(x))[0]) for x in X_train]
Y_train2sklearn = [y[0] for y in Y_train]
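As a quick sanity check (my addition, based on the settings above), each input window is a column of learning_time_length values and each target is the single next value:

print(len(X_learn), X_learn[0].shape)   # 9900 windows, each of shape (100, 1)
print(len(Y_learn), Y_learn[0])         # 9900 targets, each a one-element list
print(len(X_train), len(X_val))         # 7920 / 1980 split, kept in time order (no shuffling)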

Deep learning method

MLP (Multi-Layer Perceptron)

For comparison, we first use the simplest deep learning model, the Multi-Layer Perceptron (MLP). I used scikit-learn's implementation because it is simple and fast.

%%time
from sklearn.neural_network import MLPRegressor
regressor = MLPRegressor(hidden_layer_sizes=(100, 100, 100), 
                         early_stopping=True, max_iter=10000) 
regressor.fit(X_train2sklearn, Y_train2sklearn) 
CPU times: user 2.84 s, sys: 1.41 s, total: 4.25 s
Wall time: 2.22 s

The learning curve was drawn as follows.

%matplotlib inline
import matplotlib.pyplot as plt
plt.subplot(211)
plt.plot(regressor.loss_curve_, label='train_loss')
plt.legend()
plt.grid()
plt.subplot(212)
plt.plot(regressor.loss_curve_, label='train_loss')
plt.yscale('log')
plt.legend()
plt.grid()

output_11_0.png

Give the trained model the first part of the series (learning_time_length points) as input and have it predict the value at the next time step. Append that predicted value to the input and predict the next step again, repeating for pred_length steps.

%%time
pred_length = 1000
X_pred_length = np.linspace(0, pred_length , pred_length + 1)
Y_observed = func(X_pred_length)
Y_pred = Y_observed[:learning_time_length+1]

for i in range(pred_length):
    X_ = [Y_pred[i:i+learning_time_length]]
    Y_ = regressor.predict(X_)
    Y_pred = np.append(Y_pred, Y_)
CPU times: user 383 ms, sys: 279 ms, total: 662 ms
Wall time: 351 ms

The predicted curve obtained this way is plotted together with the actual curve for comparison.

plt.figure(figsize=(36, 6))
times = np.linspace(0, Y_pred.shape[0] - 1, Y_pred.shape[0])
plt.plot(func(times), label="time series")
plt.plot(Y_pred, alpha=0.5, label="predicted")
plt.xticks(np.linspace(0, 1000, 11))
plt.xlim([0, 1000])
plt.grid()
plt.legend()

output_13_1.png

Up to time $ 100 $ the training data itself is fed in, so the curves naturally match, but after time $ 100 $ you can see the actual values (the values of the function f) and the predicted values gradually drifting apart.

Let's compare the results of the Fourier transform.

plt.figure(figsize=(6,4))

sp = np.fft.fft(func(times))
freq = np.fft.fftfreq(times.shape[-1])

plt.subplot(211)
plt.plot(1/freq, abs(sp.real) + abs(sp.imag), label="observed")
plt.plot(1/freq, abs(sp.real))
plt.plot(1/freq, abs(sp.imag), alpha=0.5)
plt.legend()
plt.xlim([0, 150])
plt.xticks(np.linspace(0, 150, 16))
plt.grid()

sp = np.fft.fft(Y_pred)
freq = np.fft.fftfreq(times.shape[-1])

plt.subplot(212)
plt.plot(1/freq, abs(sp.real) + abs(sp.imag), label="predicted")
plt.plot(1/freq, abs(sp.real))
plt.plot(1/freq, abs(sp.imag), alpha=0.5)
plt.legend()
plt.xlim([0, 150])
plt.xticks(np.linspace(0, 150, 16))
plt.grid()

output_14_1.png

The predicted period is close to the actual period, but appears to be slightly off.
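Using the dominant_period helper sketched earlier (again, my addition rather than part of the original notebook), the shift can also be checked numerically; the exact numbers depend on the training run:

print(dominant_period(func(times)))   # dominant period of the observed series
print(dominant_period(Y_pred))        # dominant period of the MLP prediction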

Recurrent neural network

Now let's build recurrent neural networks. Here we use PyTorch, a deep learning library.

First, set up the device: use the GPU if one is available, otherwise fall back to the CPU, as follows.

import torch
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

Prepare a function for early stopping. Early stopping terminates training when it is judged that further training will not improve the model.

In deep learning, the prediction error is called the loss. To prevent overfitting, training is stopped when the loss on the validation data, which was not used for training, is judged to no longer be decreasing (rather than the loss on the training data). For this judgment we introduce the notion of patience: with patience = 20, we compare the minimum of the most recent $ 20 $ losses with the minimum of all earlier losses, and if the former is larger, we decide that the model "will not get any better" and stop training.

def EarlyStopping(log, patience=20):
    # log: array of validation losses recorded so far
    if len(log) <= patience:
        return False
    min1 = log[:len(log)-patience].min()  # best loss before the patience window
    min2 = log[len(log)-patience:].min()  # best loss within the last `patience` epochs
    if min1 <= min2:
        return True   # no recent improvement: stop training
    else:
        return False
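As a minimal sanity check (an assumed example, not from the original post), the helper returns True only once the best loss within the last patience epochs fails to beat the best loss seen before that window:

import numpy as np

still_improving = np.array([1.0 / (i + 1) for i in range(50)])   # loss keeps setting new minima
plateaued = np.concatenate([still_improving, np.full(30, 0.5)])  # last 30 epochs stall at 0.5

print(EarlyStopping(still_improving))  # False: the recent window contains the new minimum
print(EarlyStopping(plateaued))        # True: the last 20 values never beat the earlier minimum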

RNN (Recurrent Neural Network)

It is a bit confusing, but "RNN" can refer to recurrent neural networks in a broad sense or to the simple (vanilla) RNN in a narrow sense. The narrow-sense RNN can be implemented in PyTorch as follows:

import torch
class RNN(torch.nn.Module):
    def __init__(self, hidden_dim):
        super().__init__()
        self.l1 = torch.nn.RNN(1, hidden_dim,
                         nonlinearity='tanh',
                         batch_first=True)
        self.l2 = torch.nn.Linear(hidden_dim, 1)
        torch.nn.init.xavier_normal_(self.l1.weight_ih_l0)
        torch.nn.init.orthogonal_(self.l1.weight_hh_l0)

    def forward(self, x):
        h, _ = self.l1(x)
        y = self.l2(h[:, -1])
        return y
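As a quick shape check (illustrative, not part of the original training code): with batch_first=True the model expects input of shape (batch, sequence length, 1) and returns one predicted value per sequence.

net = RNN(hidden_dim=50)
dummy = torch.randn(8, 100, 1)   # 8 windows of 100 time steps, 1 feature each
print(net(dummy).shape)          # torch.Size([8, 1])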

LSTM (Long Short-Term Memory)

LSTM (Long Short-Term Memory): so is it long, or is it short? The name invites the question, but it is a kind of "RNN in the broad sense", and it is said to handle long-term memory better than the "RNN in the narrow sense". You can implement it in PyTorch as follows:

import torch
class LSTM(torch.nn.Module):
    def __init__(self, hidden_dim):
        super().__init__()
        self.l1 = torch.nn.LSTM(1, hidden_dim, batch_first=True)
        self.l2 = torch.nn.Linear(hidden_dim, 1)
        torch.nn.init.xavier_normal_(self.l1.weight_ih_l0)
        torch.nn.init.orthogonal_(self.l1.weight_hh_l0)

    def forward(self, x):
        h, _ = self.l1(x)
        y = self.l2(h[:, -1])
        return y

GRU (Gated Recurrent Unit)

GRU stands for Gated Recurrent Unit, not Glavnoye Razvedyvatelnoye Upravleniye. It is said to achieve performance equal to or better than the LSTM with a shorter computation time.

import torch
class GRU(torch.nn.Module):
    def __init__(self, hidden_dim):
        super().__init__()
        self.l1 = torch.nn.GRU(1, hidden_dim, batch_first=True)
        self.l2 = torch.nn.Linear(hidden_dim, 1)
        torch.nn.init.xavier_normal_(self.l1.weight_ih_l0)
        torch.nn.init.orthogonal_(self.l1.weight_hh_l0)

    def forward(self, x):
        h, _ = self.l1(x)
        y = self.l2(h[:, -1])
        return y

Training with the RNN

Training was run as follows. The code below is for the RNN, but you can switch it to the LSTM or GRU by rewriting a single line.

%%time
from sklearn.utils import shuffle
model = RNN(50).to(device) #You can change the network by rewriting here
#model = LSTM(50).to(device)
#model = GRU(50).to(device)
criterion = torch.nn.MSELoss(reduction='mean')
optimizer = torch.optim.Adam(model.parameters(), lr=0.001, 
                            betas=(0.9, 0.999), amsgrad=True)

epochs = 1000
batch_size = 100
n_batches_train = len(X_train) // batch_size - 1
n_batches_test = len(X_val) // batch_size - 1
hist = {'train_loss':[], 'val_loss':[]}

for epoch in range(epochs):
    train_loss = 0.
    val_loss = 0.
    X_, Y_ = shuffle(X_train, Y_train)

    for batch in range(n_batches_train):
        start = batch * batch_size
        end = start + batch_size
        X = torch.Tensor(X_[start:end])
        Y = torch.Tensor(Y_[start:end])
        model.train()
        Y_pred = model(X)
        loss = criterion(Y, Y_pred)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        train_loss += loss.item()

    for batch in range(n_batches_test):
        start = batch * batch_size
        end = start + batch_size
        X = torch.Tensor(X_val[start:end])
        Y = torch.Tensor(Y_val[start:end])
        model.eval()
        Y_pred = model(X)
        loss = criterion(Y, Y_pred)
        val_loss += loss.item()
    
    train_loss /= n_batches_train
    val_loss /= n_batches_test
    hist['train_loss'].append(train_loss)
    hist['val_loss'].append(val_loss)
    print("Epoch:", epoch + 1, "Train loss:", train_loss, "Val loss:", val_loss)

    if EarlyStopping(np.array(hist['val_loss'])):
        print("Early stopping at epoch", epoch + 1)
        break
Epoch: 1 Train loss: 0.024917872337913975 Val loss: 6.828008190495893e-05
Epoch: 2 Train loss: 3.570798083110742e-05 Val loss: 3.464157634880394e-05
Epoch: 3 Train loss: 2.720728588638639e-05 Val loss: 1.954806430148892e-05
...
Epoch: 580 Train loss: 5.4337909078255436e-08 Val loss: 4.0113718569045886e-08
Epoch: 581 Train loss: 6.47745306281422e-08 Val loss: 5.6099906942108646e-08
Epoch: 582 Train loss: 5.797503896896836e-08 Val loss: 1.620698952820021e-07
Early stopping at epoch 582
CPU times: user 26min 9s, sys: 21.6 s, total: 26min 31s
Wall time: 26min 39s

Training is complete. Let's plot the learning curve.

%matplotlib inline
import matplotlib.pyplot as plt
plt.subplot(211)
plt.plot(hist['train_loss'], label='train_loss')
plt.plot(hist['val_loss'], label='val_loss')
plt.legend()
plt.grid()
plt.subplot(212)
plt.plot(hist['train_loss'], label='train_loss')
plt.plot(hist['val_loss'], label='val_loss')
plt.yscale('log')
plt.legend()
plt.grid()

output_27_0.png

Give the trained model the first part of the series (learning_time_length points) as input and have it predict the value at the next time step. Append that predicted value to the input and predict the next step again, repeating for pred_length steps.

%%time
total_time_length = 10000
pred_length = 1000
learning_time_length = 100

X_pred_length = np.linspace(0, pred_length , pred_length + 1)
Y_observed = func(X_pred_length)
Y_pred = Y_observed[:learning_time_length+1]

for i in range(pred_length):
    X_ = Y_pred[i:i+learning_time_length+1].reshape(1, learning_time_length + 1, 1)
    Y_ = model(torch.Tensor(X_)).detach().numpy()
    Y_pred = np.append(Y_pred, Y_)
CPU times: user 2.54 s, sys: 5.97 ms, total: 2.55 s
Wall time: 2.55 s

The predicted curve obtained this way is plotted together with the actual curve for comparison.

plt.figure(figsize=(36, 6))
times = np.linspace(0, Y_pred.shape[0] - 1, Y_pred.shape[0])
plt.plot(func(times), label="time series")
plt.plot(Y_pred, alpha=0.5, label="predicted")
plt.xticks(np.linspace(0, 1000, 11))
plt.xlim([0, 1000])
plt.grid()
plt.legend()

output_29_1.png

Up to time $ 100 $ the training data itself is fed in, so it is natural that the curves match, but you can see that they continue to match even after time $ 100 $.

Let's compare the results of the Fourier transform.

plt.figure(figsize=(6,4))

sp = np.fft.fft(func(times))
freq = np.fft.fftfreq(times.shape[-1])

plt.subplot(211)
plt.plot(1/freq, abs(sp.real) + abs(sp.imag), label="observed")
plt.plot(1/freq, abs(sp.real))
plt.plot(1/freq, abs(sp.imag), alpha=0.5)
plt.legend()
plt.xlim([0, 150])
plt.xticks(np.linspace(0, 150, 16))
plt.grid()

sp = np.fft.fft(Y_pred)
freq = np.fft.fftfreq(times.shape[-1])

plt.subplot(212)
plt.plot(1/freq, abs(sp.real) + abs(sp.imag), label="predicted")
plt.plot(1/freq, abs(sp.real))
plt.plot(1/freq, abs(sp.imag), alpha=0.5)
plt.legend()
plt.xlim([0, 150])
plt.xticks(np.linspace(0, 150, 16))
plt.grid()

output_30_1.png

The periods appear to match exactly.

Training with the LSTM

As mentioned above, to train the LSTM you simply replace model = RNN(50).to(device) in the code above with model = LSTM(50).to(device).

Epoch: 1 Train loss: 0.24947839844315192 Val loss: 0.0037629783619195223
Epoch: 2 Train loss: 0.0010665786028720248 Val loss: 0.0004544752591755241
Epoch: 3 Train loss: 0.000281030429528656 Val loss: 0.00014765093510504812
...
Epoch: 397 Train loss: 1.9865108783006072e-08 Val loss: 1.99065262052045e-08
Epoch: 398 Train loss: 1.840841412067617e-08 Val loss: 1.814414751777349e-08
Epoch: 399 Train loss: 1.7767042196444784e-08 Val loss: 1.9604467382805524e-08
Early stopping at epoch 399
CPU times: user 48min 40s, sys: 51.2 s, total: 49min 31s
Wall time: 49min 41s

output_33_0.png

CPU times: user 7.67 s, sys: 14 ms, total: 7.68 s
Wall time: 7.69 s

output_35_1.png

output_36_1.png

It seems that this was also a perfect prediction.

Training with the GRU

As mentioned above, to train the GRU you simply replace model = RNN(50).to(device) in the code above with model = GRU(50).to(device).

Epoch: 1 Train loss: 0.2067998453276232 Val loss: 0.0007729934877716005
Epoch: 2 Train loss: 0.0005770771786979495 Val loss: 0.00023205751494970173
Epoch: 3 Train loss: 0.00018625847849015816 Val loss: 0.00014329736586660147
...
Epoch: 315 Train loss: 5.816128262764026e-09 Val loss: 5.750611098420677e-09
Epoch: 316 Train loss: 5.757192062114896e-09 Val loss: 5.7092033323158375e-09
Epoch: 317 Train loss: 5.780735246610847e-09 Val loss: 5.6715170337895415e-09
Early stopping at epoch 317
CPU times: user 34min 51s, sys: 42.1 s, total: 35min 33s
Wall time: 35min 40s

output_39_0.png

CPU times: user 8.81 s, sys: 7.04 ms, total: 8.81 s
Wall time: 8.82 s

output_41_1.png

output_42_1.png

It seems that this was also a perfect prediction.

Experiments changing the shape and period of the functions

Now that we know how to run everything, let's finally experiment with changing the shape and period of the functions. The results are shown below.
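The post does not show the exact code used to change the period, but with the definitions above it presumably comes down to something like the following sketch (the freq keyword already exists on f and g; the value 50 here is just an example):

period = 50                            # also tried: 25 and 100
func = lambda t: f(t, freq=period)     # or: lambda t: g(t, freq=period)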

Increase the period of function f

MLP, function f

MLP, function f, period = 25

Learning curve

output_11_0.png

Prediction curve

output_13_1.png

Fourier transform

output_14_1.png

MLP, function f, period = 50

Learning curve

output_11_0.png

Prediction curve

output_13_1.png

Fourier transform

output_14_1.png

MLP, function f, period = 100

Learning curve

output_11_0.png

Prediction curve

output_13_1.png

Fourier transform

output_14_1.png

MLP, function f Summary

Even as the period gets longer, the number of epochs before early stopping does not change much. The shape of the curve does not collapse badly, but the period drifts. The output amplitude is preserved for the short period, but tends to shrink as the period gets longer.

RNN, function f

RNN, function f, period = 25

Learning curve

output_27_0.png

Prediction curve

output_29_1.png

Fourier transform

output_30_1.png

RNN, function f, period = 50

Learning curve

output_27_0.png

Prediction curve

output_29_1.png

Fourier transform

output_30_1.png

RNN, function f, period = 100

Learning curve

output_27_0.png

Prediction curve

output_29_1.png

Fourier transform

output_30_1.png

RNN, function f Summary

The number of epochs before early stopping tended to decrease as the period increased. The prediction was good for the short period (25) and the medium period (50), but the curve lost its shape badly for the long period (100), where a sharp peak appeared in an odd place.

LSTM, function f

LSTM, function f, period = 25

Learning curve

output_33_0.png

Prediction curve

output_35_1.png

Fourier transform

output_36_1.png

LSTM, function f, period = 50

Learning curve

output_33_0.png

Prediction curve

output_35_1.png

Fourier transform

output_36_1.png

LSTM, function f, period = 100

Learning curve

output_33_0.png

Prediction curve

output_35_1.png

Fourier transform

output_36_1.png

LSTM, function f Summary

The number of epochs before early stopping may decrease as the period gets longer, although the trend is not clear. Good predictions were obtained for all of the short (25), medium (50), and long (100) periods.

GRU, function f

GRU, function f, period = 25

Learning curve

output_39_0.png

Prediction curve

output_41_1.png

Fourier transform

output_42_1.png

GRU, function f, period = 50

Learning curve

output_39_0.png

Prediction curve

output_41_1.png

Fourier transform

output_42_1.png

GRU, function f, period = 100

Learning curve

output_39_0.png

Prediction curve

output_41_1.png

Fourier transform

output_42_1.png

GRU, function f Summary

I had heard that GRUs are faster than LSTMs, but the number of epochs before early stopping was actually longer (it ran the full 1000 epochs). Good predictions were obtained for all of the short (25), medium (50), and long (100) periods.

Increase the period of function g

MLP, function g

MLP, function g, period = 25

Learning curve

output_11_0.png

Prediction curve

output_13_1.png

Fourier transform

output_14_1.png

MLP, function g, period = 50

Learning curve

output_11_0.png

Prediction curve

output_13_1.png

Fourier transform

output_14_1.png

MLP, function g, period = 100

Learning curve

output_11_0.png

Prediction curve

output_13_1.png

Fourier transform

output_14_1.png

MLP, function g Summary

The number of epochs before early stopping barely changes as the period changes. The rough shape is preserved, but the period tends to drift where the tails of the waveform come out blunted and where the peak heights come out low. Looking at the Fourier transform, many peaks (periods) are picked up, although they are somewhat misaligned.

RNN, function g

RNN, function g, period = 25

Learning curve

output_27_0.png

Prediction curve

output_29_1.png

Fourier transform

output_30_1.png

RNN, function g, period = 50

Learning curve

output_27_0.png

Prediction curve

output_29_1.png

Fourier transform

output_30_1.png

RNN, function g, period = 100

Learning curve

output_27_0.png

Prediction curve

output_29_1.png

Fourier transform

output_30_1.png

RNN, function g Summary

The short period (25) is predicted well. For the medium period (50), the main peaks are picked up, but they are misaligned. For the long period (100), the RNN seems to have given up on the prediction early. Of the many periods present, only the relatively short ones seem to be picked up.

LSTM, function g

LSTM, function g, period = 25

Learning curve

output_33_0.png

Prediction curve

output_35_1.png

Fourier transform

output_36_1.png

LSTM, function g, period = 50

Learning curve

output_33_0.png

Prediction curve

output_35_1.png

Fourier transform

output_36_1.png

LSTM, function g, period = 100

Learning curve

output_33_0.png

Prediction curve

output_35_1.png

Fourier transform

output_36_1.png

LSTM, function g Summary

Surprisingly, it did not work well for any period. The short period, which the RNN handled well, failed with the LSTM, which picked up peaks that should not be there. The medium period gave the best result, but the period was still off. For the long period, the prediction looks as if the model simply gave up.

GRU, function g

GRU, function g, period = 25

Learning curve

output_39_0.png

Prediction curve

output_41_1.png

Fourier transform

output_42_1.png

GRU, function g, period = 50

Learning curve

output_39_0.png

Prediction curve

output_41_1.png

Fourier transform

output_42_1.png

GRU, function g, period = 100

Learning curve

output_39_0.png

Prediction curve

output_41_1.png

Fourier transform

output_42_1.png

GRU, function g Summary

This is also surprising. The LSTM was worse than the RNN, and the GRU was worse still. The short period is slightly better, but for the medium period the prediction looks like the model gave up.

Function h with various periods

At first I was planning to mix in a wider variety of periods, but after seeing the results above I decided it would be better to mix in slightly fewer.
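The exact frequency mix used for these experiments is not stated; with the definition of h above, reducing the mix could look something like this (the values below are purely hypothetical):

# Hypothetical example of a reduced mix; the actual frequencies are not given in the post
func = lambda t: h(t, freqs=[11, 23, 31])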

MLP, function h

Learning curve

output_11_0.png

Prediction curve

output_13_1.png

Fourier transform

output_14_1.png

The periods seem to be picked up relatively well. However, the amplitudes are not reproduced well, so the prediction ends up with a large error.

RNN, function h

Learning curve

output_27_0.png

Prediction curve

output_29_1.png

Fourier transform

output_30_1.png

The periods are mostly picked up, but something is out of sync, and the amplitude has increased for some reason.

LSTM, function h

Learning curve

output_33_0.png

Prediction curve

output_35_1.png

Fourier transform

output_36_1.png

Some of the periods are picked up, but quite a few periods that are not actually there are picked up as well. In that sense, perhaps the RNN is a little better?

GRU, function h

Learning curve

output_39_0.png

Prediction curve

output_41_1.png

Fourier transform

output_42_1.png

It feels even worse than the LSTM.

Summary

That is the summary, at least as far as the periodic functions handled here are concerned.

I haven't studied this area enough yet, so there may be some odd parts; I would appreciate it if you could point out anything you notice. (ヽ´ω`)
