This article is a follow-up to Implementing Keras LSTM feedforward with numpy. Here I implement an LSTM AutoEncoder with Keras and try binary classification on the features it learns. The data consists of two sine waves with different frequencies, and the task is to tell them apart.
I used the following articles as references for data generation. Thank you.
- Sine wave prediction using RNN in deep learning library Keras
- I made RNN learn the sine wave and predicted it
The data generation method has been changed slightly for binary classification: two sine waves with different frequencies are prepared so that they can be distinguished.
This is the code that generates the data used this time.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

def make_data(random_factor, number_of_cycles,
              timesteps, sampling_num_pair):
    """
    sampling_num_pair : tuple of 2 elements
        Number of points to sample during one cycle, one value per class
        Ex. (20,80)
    """
    def _load_data(data, n_prev = 100):
        """
        data should be pd.DataFrame()
        """
        docX = []
        for i in range(len(data)-n_prev):
            # .values replaces as_matrix(), which was removed in pandas 1.0
            docX.append(data.iloc[i:i+n_prev].values)
        alsX = np.array(docX)
        return alsX

    def _train_test_split(df, test_size=0.1, n_prev = 100):
        """
        This just splits data to training and testing parts
        """
        ntrn = round(len(df) * (1 - test_size))
        ntrn = int(ntrn)
        X_train = _load_data(df.iloc[0:ntrn], n_prev)
        X_test = _load_data(df.iloc[ntrn:], n_prev)
        return X_train, X_test

    np.random.seed(0)
    sampling_num1, sampling_num2 = sampling_num_pair

    # class0: sine wave with sampling_num1 points per cycle, plus random phase noise
    df1 = pd.DataFrame(np.arange(sampling_num1 * number_of_cycles + 1), columns=["t"])
    df1["sin_t"] = df1.t.apply(lambda x: np.sin(x * (2 * np.pi / sampling_num1)+ np.random.uniform(-1.0, +1.0) * random_factor))

    # class1: sine wave with sampling_num2 points per cycle, plus random phase noise
    df2 = pd.DataFrame(np.arange(sampling_num2 * number_of_cycles + 1), columns=["t"])
    df2["sin_t2"] = df2.t.apply(lambda x: np.sin(x * (2 * np.pi / sampling_num2)+ np.random.uniform(-1.0, +1.0) * random_factor))

    X_train1, X_test1 = _train_test_split(df1[["sin_t"]], n_prev=timesteps)
    X_train2, X_test2 = _train_test_split(df2[["sin_t2"]], n_prev=timesteps)

    # concatenate X and make y
    X_train = np.r_[X_train1, X_train2]
    y_train = np.r_[np.tile(0, X_train1.shape[0]), np.tile(1, X_train2.shape[0])]
    X_test = np.r_[X_test1, X_test2]
    y_test = np.r_[np.tile(0, X_test1.shape[0]), np.tile(1, X_test2.shape[0])]
    return X_train, y_train, X_test, y_test

# Noise coefficient (scales the random phase noise)
random_factor = 0.05
# Number of cycles to generate
number_of_cycles = 200
# Number of points sampled in one cycle, for each class
sampling_num_pair=(20,80)
# Window length, i.e. the length of one series
timesteps = 100
X_train, y_train, X_test, y_test = make_data(random_factor, number_of_cycles, timesteps, sampling_num_pair)
print("X_train.shape : ", X_train.shape) #X_train.shape : (17802, 100, 1)
print("y_train.shape : ", y_train.shape) #y_train.shape : (17802,)
print("X_test.shape : ", X_test.shape) #X_test.shape : (1800, 100, 1)
print("y_test.shape : ", y_test.shape) #y_test.shape : (1800,)
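To make the windowing step concrete, here is a minimal sketch of the same sliding-window idea on a toy series, written with NumPy index arithmetic instead of a Python loop. The helper name make_windows is hypothetical and is not part of the original code; it only mirrors what _load_data does.

```python
import numpy as np

def make_windows(series, n_prev):
    """Hypothetical helper: stack every length-n_prev window of a 1-D series.

    Like _load_data, it yields len(series) - n_prev windows,
    so the final possible window is skipped.
    """
    series = np.asarray(series)
    n_windows = len(series) - n_prev
    # Row i holds the indices i, i+1, ..., i+n_prev-1
    idx = np.arange(n_windows)[:, None] + np.arange(n_prev)[None, :]
    # Add a trailing axis so the result is (samples, timesteps, features)
    return series[idx][:, :, None]

toy = np.arange(10)
windows = make_windows(toy, n_prev=4)
print(windows.shape)        # (6, 4, 1)
print(windows[0, :, 0])     # [0 1 2 3]
print(windows[-1, :, 0])    # [5 6 7 8]
```

Neighbouring windows overlap almost entirely, which is why the samples are far from independent.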
Let's plot the generated sine waves.
# Take df1 and df2 out of the make_data function and plot them.
fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(df1["sin_t"][0:80], label="class0 sampling_num 20", color="red")
ax.plot(df2["sin_t2"][0:80], label="class1 sampling_num 80", color="blue")
ax.legend(loc="upper right")
fig.savefig("./sin_plot.png")
plt.close("all")
You can see that class0 has four times the frequency of class1.
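The factor of four can also be checked numerically. The following is a rough sketch, assuming noise-free waves and a hypothetical helper dominant_bin that returns the strongest non-DC FFT bin; neither is part of the original code.

```python
import numpy as np

def dominant_bin(samples_per_cycle, n_points=1600):
    """Hypothetical helper: index of the strongest non-DC FFT bin of a clean sine."""
    t = np.arange(n_points)
    wave = np.sin(t * (2 * np.pi / samples_per_cycle))
    spectrum = np.abs(np.fft.rfft(wave))
    spectrum[0] = 0.0  # ignore the DC component
    return int(np.argmax(spectrum))

bin_class0 = dominant_bin(20)  # class0: 20 samples per cycle
bin_class1 = dominant_bin(80)  # class1: 80 samples per cycle
print(bin_class0, bin_class1, bin_class0 / bin_class1)  # 80 20 4.0
```

The length 1600 is chosen as a multiple of both 20 and 80 so that each wave falls exactly on one FFT bin.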
I referred to The Keras Blog to build the model. The Sequence-to-sequence autoencoder chapter gave a hint, so I extended it as needed.
The original paper is Unsupervised Learning of Video Representations using LSTMs. It proposes a conditional decoder (which receives the input in reverse order at decoding time) and an unconditional decoder (which receives nothing); this time we implement the conditional one.
Define the LSTM AutoEncoder model, then train it and reconstruct the original X_train.
from keras.layers import Input, LSTM, RepeatVector, concatenate, Dense
from keras.models import Model
input_dim = 1
latent_dim = 10
# encode
inputs = Input(shape=(timesteps, input_dim))
encoded = LSTM(latent_dim, activation="tanh", recurrent_activation="sigmoid", return_sequences=False)(inputs)
#decode
hidden = RepeatVector(timesteps)(encoded)
reverse_input = Input(shape=(timesteps, input_dim))
hidden_revinput = concatenate([hidden, reverse_input])
decoded = LSTM(latent_dim, activation="tanh", recurrent_activation="sigmoid", return_sequences=True)(hidden_revinput)
decoded = Dense(latent_dim, activation="relu")(decoded)
decoded = Dense(input_dim, activation="tanh")(decoded)
# train
LSTM_AE = Model([inputs, reverse_input], decoded)
LSTM_AE.compile(optimizer='rmsprop', loss='mse')
X_train_rev = X_train[:,::-1,:]
LSTM_AE.fit([X_train, X_train_rev], X_train, epochs=30, batch_size=500, shuffle=True, validation_data=([X_train, X_train_rev], X_train))
X_hat = LSTM_AE.predict([X_train, X_train_rev])
This code adds fully connected layers to the decoder, but there is no particular reason for them. Keras is amazing: you can add layers so easily! They are simply the result of playing around.
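To see what the decoder LSTM actually receives, the RepeatVector and concatenate steps can be mimicked with NumPy. This is only a shape sketch on toy sizes: the encoded array is filled with placeholder values rather than real encoder output.

```python
import numpy as np

n, timesteps, input_dim, latent_dim = 3, 5, 1, 10

# Toy input: each series counts upward so the reversal is easy to see
X = np.arange(n * timesteps * input_dim, dtype=float).reshape(n, timesteps, input_dim)
X_rev = X[:, ::-1, :]  # reverse along the time axis only

# Stand-in for the encoder output (placeholder values, not a trained model)
encoded = np.ones((n, latent_dim))

# RepeatVector(timesteps): copy the encoded vector to every time step
hidden = np.repeat(encoded[:, None, :], timesteps, axis=1)

# concatenate([...]) joins along the last (feature) axis by default
decoder_in = np.concatenate([hidden, X_rev], axis=-1)

print(X[0, :, 0])        # [0. 1. 2. 3. 4.]
print(X_rev[0, :, 0])    # [4. 3. 2. 1. 0.]
print(hidden.shape)      # (3, 5, 10)
print(decoder_in.shape)  # (3, 5, 11)
```

With the article's settings, the decoder input would have shape (batch, 100, 11).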
Next, split the data into class0 and class1 to see how each is reconstructed by the AutoEncoder.
def split_X(X, y):
    # y is 0 or 1, so y_inv is 1 where the sample belongs to class0
    y_inv = np.abs(y - 1.)
    X_0 = X[y_inv.astype(bool),:,:]
    X_1 = X[y.astype(bool),:,:]
    return X_0, X_1

X_train_0, X_train_1 = split_X(X_train, y_train)
X_hat_0, X_hat_1 = split_X(X_hat, y_train)
print("X_train_0.shape : ", X_train_0.shape) #X_train_0.shape : (3501, 100, 1)
print("X_train_1.shape : ", X_train_1.shape) #X_train_1.shape : (14301, 100, 1)
print("X_hat_0.shape : ", X_hat_0.shape) #X_hat_0.shape : (3501, 100, 1)
print("X_hat_1.shape : ", X_hat_1.shape) #X_hat_1.shape : (14301, 100, 1)
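The same split can be written with boolean masks on the labels. This is a small sketch on toy arrays; split_by_label is a hypothetical name, not a function from the original code.

```python
import numpy as np

def split_by_label(X, y):
    """Hypothetical alternative to split_X using boolean masks."""
    y = np.asarray(y)
    return X[y == 0], X[y == 1]

X_toy = np.arange(4 * 3 * 1).reshape(4, 3, 1)
y_toy = np.array([0, 1, 1, 0])
X0, X1 = split_by_label(X_toy, y_toy)
print(X0.shape, X1.shape)  # (2, 3, 1) (2, 3, 1)
print(X0[:, 0, 0])         # [0 9]
```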
Plot the reconstructions produced by the AutoEncoder and check them.
# Let's see what the reconstructed X_train looks like
def plot_save(start_index, X_hat, X_train, X_class):
    fig = plt.figure()
    ax = fig.add_subplot(111)
    for i in np.arange(start_index,start_index+5):
        # Plot 5 series at a time.
        ax.plot(X_hat[i,:,0], label="X hat", color="red")
        ax.plot(X_train[i,:,0], label="X train", color="blue")
    savename = "./AE_reconst_30ep_start_" + str(start_index) + "_cls" + str(X_class) + ".png"
    fig.savefig(savename)
    plt.close("all")

start_list = np.arange(0,len(X_train_0), 1000)
for start_index in start_list:
    plot_save(start_index, X_hat_0, X_train_0, X_class=0)

start_list = np.arange(0,len(X_train_1), 1000)
for start_index in start_list:
    plot_save(start_index, X_hat_1, X_train_1, X_class=1)
class0
class1
Blue is the original data and red is the reconstruction. These are a few that I picked from the many plots; all of them looked like this.
Training does not seem to have converged yet, so more epochs would probably give a better approximation, but I decided this was enough given my machine's specs. For reference, 30 epochs took about 15 minutes. (I'll probably be told to try harder.)
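Besides looking at plots, the quality of the reconstruction could be summarized as a number. The following is a minimal sketch on toy arrays, assuming a hypothetical helper reconstruction_mse that is not part of the original code.

```python
import numpy as np

def reconstruction_mse(X, X_hat):
    """Hypothetical helper: mean squared error of each series, shape (n_samples,)."""
    return np.mean((X - X_hat) ** 2, axis=(1, 2))

# Toy data: the first series is reconstructed perfectly, the second is off by 0.5
X_toy = np.zeros((2, 4, 1))
X_hat_toy = np.zeros((2, 4, 1))
X_hat_toy[1] = 0.5
mse = reconstruction_mse(X_toy, X_hat_toy)
print(mse.tolist())  # [0.0, 0.25]
```

With the arrays from this article, reconstruction_mse(X_train_0, X_hat_0).mean() and reconstruction_mse(X_train_1, X_hat_1).mean() would give one error value per class, which could be compared across different numbers of epochs.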
This part is the same as in the previous article: extract the features obtained by the encoder from the model.
First, define the functions required to extract the features.
def split_params(W, U, b):
    # Keras stores the four gates side by side in the order i, f, c, o
    Wi = W[:,0:latent_dim]
    Wf = W[:,latent_dim:2*latent_dim]
    Wc = W[:,2*latent_dim:3*latent_dim]
    Wo = W[:,3*latent_dim:]
    print("Wi : ",Wi.shape)
    print("Wf : ",Wf.shape)
    print("Wc : ",Wc.shape)
    print("Wo : ",Wo.shape)

    Ui = U[:,0:latent_dim]
    Uf = U[:,latent_dim:2*latent_dim]
    Uc = U[:,2*latent_dim:3*latent_dim]
    Uo = U[:,3*latent_dim:]
    print("Ui : ",Ui.shape)
    print("Uf : ",Uf.shape)
    print("Uc : ",Uc.shape)
    print("Uo : ",Uo.shape)

    bi = b[0:latent_dim]
    bf = b[latent_dim:2*latent_dim]
    bc = b[2*latent_dim:3*latent_dim]
    bo = b[3*latent_dim:]
    print("bi : ",bi.shape)
    print("bf : ",bf.shape)
    print("bc : ",bc.shape)
    print("bo : ",bo.shape)
    return (Wi, Wf, Wc, Wo), (Ui, Uf, Uc, Uo), (bi, bf, bc, bo)

def calc_ht(params):
    x, latent_dim, W_, U_, b_ = params
    Wi, Wf, Wc, Wo = W_
    Ui, Uf, Uc, Uo = U_
    bi, bf, bc, bo = b_
    n = x.shape[0]
    ht_1 = np.zeros(n*latent_dim).reshape(n,latent_dim) # h_{t-1}
    Ct_1 = np.zeros(n*latent_dim).reshape(n,latent_dim) # C_{t-1}
    ht_list = []
    for t in np.arange(x.shape[1]):
        xt = np.array(x[:,t,:])
        it = sigmoid(np.dot(xt, Wi) + np.dot(ht_1, Ui) + bi)
        Ct_tilda = np.tanh(np.dot(xt, Wc) + np.dot(ht_1, Uc) + bc)
        ft = sigmoid(np.dot(xt, Wf) + np.dot(ht_1, Uf) + bf)
        Ct = it * Ct_tilda + ft * Ct_1
        ot = sigmoid( np.dot(xt, Wo) + np.dot(ht_1, Uo) + bo)
        ht = ot * np.tanh(Ct)
        ht_list.append(ht)
        ht_1 = ht
        Ct_1 = Ct
    # Only the hidden state of the last time step is returned
    ht = np.array(ht)
    return ht

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))
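For reference, the update that calc_ht performs at each time step can be written out as follows. This is transcribed from the code above, using its row-vector convention (inputs multiply the weights from the left); the symbols $\sigma$ for the sigmoid and $\odot$ for the element-wise product are notation introduced here.

```latex
\begin{aligned}
i_t &= \sigma\left(x_t W_i + h_{t-1} U_i + b_i\right) \\
f_t &= \sigma\left(x_t W_f + h_{t-1} U_f + b_f\right) \\
\tilde{C}_t &= \tanh\left(x_t W_c + h_{t-1} U_c + b_c\right) \\
C_t &= i_t \odot \tilde{C}_t + f_t \odot C_{t-1} \\
o_t &= \sigma\left(x_t W_o + h_{t-1} U_o + b_o\right) \\
h_t &= o_t \odot \tanh\left(C_t\right)
\end{aligned}
```

The initial states $h_0$ and $C_0$ are zero, and the feature returned for each series is $h_T$ at the final step $T = $ timesteps.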
Now actually extract them.
W, U, b = LSTM_AE.layers[1].get_weights()
Ws, Us, bs = split_params(W, U, b)
params = [X_train, latent_dim, Ws, Us, bs]
ht_train = calc_ht(params)
params = [X_test, latent_dim, Ws, Us, bs]
ht_test = calc_ht(params)
LSTM_AE.layers[1] is the encoder LSTM layer, and the weights are taken from it.
ht_train is a (17802, 10) matrix, i.e. a 10-dimensional feature for each series. Since the original dimension is 100, set by timesteps = 100, each series is compressed from 100 dimensions to 10.
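The slicing done in split_params can be expressed more compactly with np.split. The sketch below uses dummy weights rather than a trained model, with each gate's block filled with its index so that the i, f, c, o order is visible; the shapes follow the Keras LSTM weight layout for input_dim = 1 and latent_dim = 10.

```python
import numpy as np

input_dim, latent_dim = 1, 10

# Dummy weights with the shapes Keras uses for an LSTM layer;
# each gate's block is filled with its index 0..3 so the order is visible.
gate_ids = np.repeat(np.arange(4), latent_dim)        # shape (40,)
W = np.tile(gate_ids, (input_dim, 1)).astype(float)   # kernel: (1, 40)
U = np.tile(gate_ids, (latent_dim, 1)).astype(float)  # recurrent kernel: (10, 40)
b = gate_ids.astype(float)                            # bias: (40,)

# Four equal blocks along the last axis, in the order i, f, c, o
Wi, Wf, Wc, Wo = np.split(W, 4, axis=1)
Ui, Uf, Uc, Uo = np.split(U, 4, axis=1)
bi, bf, bc, bo = np.split(b, 4)

print(Wi.shape, Ui.shape, bi.shape)  # (1, 10) (10, 10) (10,)
print(Wc[0, 0], Uo[0, 0], bf[0])     # 2.0 3.0 1.0
```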
For now, let's plot this 10-dimensional feature.
class0
class1
I can't say much from this. Since the signal is periodic, I plotted enough to show the full movement of the features over one cycle. The overall behavior is similar between the classes, but individual features may differ on closer inspection. (It's hard, so I won't examine it in detail.)
Let's classify class0 and class1 using the features learned by this LSTM AutoEncoder. Setting aside the fact that the samples are not i.i.d., let's try logistic regression.
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
model = LogisticRegression(n_jobs=-1)
model.fit(ht_train, y_train)
y_hat_test = model.predict(ht_test)
accuracy_score(y_test, y_hat_test)
# 1.0
The accuracy on the test data is 1.0. That is suspiciously high, but the two signals do have very different waveforms.
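One point of comparison is the accuracy of always predicting the more frequent class. The sketch below assumes the class counts that follow from the settings above (test_size=0.1, timesteps=100), which give 300 test windows for class0 and 1500 for class1; the array name y_test_toy is hypothetical.

```python
import numpy as np

# Class counts in the test set, derived from the settings above:
# 300 windows of class0 and 1500 windows of class1 (1800 in total)
y_test_toy = np.r_[np.tile(0, 300), np.tile(1, 1500)]

# Accuracy of always predicting the more frequent class
baseline = max(np.mean(y_test_toy == 0), np.mean(y_test_toy == 1))
print(round(float(baseline), 4))  # 0.8333
```

Because the classes are imbalanced, an accuracy of about 0.83 is reachable without learning anything, so 1.0 is best read against that baseline.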
- This was my first time building a model with Keras, and because it is such a high-level layer, I'm not fully sure that what it does internally matches what I intended. The results are as expected, so I'd like to believe it works. (Please tell me if it doesn't.)
- As in the original paper, the proposed model is not written out in mathematical formulas, so how do people in the machine learning community understand and implement it? Is the idea of a computational graph so widespread that diagrams are enough? Or is it a problem that the model cannot be fully pictured from the paper alone, which hurts reproducibility? I don't really understand.
- Keras is a wrapper around TensorFlow, so this is natural, but people like me who have never touched TensorFlow run into difficulties, so it may be better to try TensorFlow itself first.