In this article, we build a model that automatically generates the body of a novel from an arbitrary input title, trained on pairs of novel titles and bodies.
Google Colaboratory is used as the analysis environment. Google Colaboratory is a cloud notebook environment provided by Google, and anyone with a Google account can use it for free. The libraries needed for data analysis come pre-installed, and a GPU is available, so it is highly recommended when you want to casually try something from your laptop. For details on setting up Google Colaboratory, see the first article in the references at the end.
After setup, open a notebook for this analysis and run the following commands to install the libraries that are not pre-installed.
!pip install PyDrive
!pip install janome
!pip install mojimoji
The novel data is obtained from the Aozora Bunko GitHub repository. First, clone the target repository into your own Google Drive. The following command can be executed from Google Colaboratory.
!git clone --branch master --depth 1 https://github.com/aozorabunko/aozorabunko.git "drive/My Drive/Arbitrary directory"
Next, extract the data used for model building from the copied files and format it. This time I use only novels whose body text is at most 3,000 characters. Note that the "id on drive" placeholder in the code refers to the string that appears after folders/ in the URL of the target directory.
#----------------------
#Get a list of target files
#----------------------
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
import pandas as pd

#Allow access to Google Drive
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

#Recursively collect the title and Drive id of every file
def get_list_file_recursively(parent_id, l=None):
    if l is None:
        l = []
    file_list = drive.ListFile({'q': '"{}" in parents and trashed = false'.format(parent_id)}).GetList()
    l += [f for f in file_list if f['mimeType'] != 'application/vnd.google-apps.folder']
    for f in file_list:
        if f['mimeType'] == 'application/vnd.google-apps.folder':
            get_list_file_recursively(f['id'], l)
    return l

listed = []
for f in get_list_file_recursively('The id on drive of the top-level directory of the copied repository'):
    print(f['title'])
    #Keep only the html files of the works themselves (not the card pages)
    if 'html' in f['title'] and 'card' not in f['title']:
        listed.append([f['title'], f['id']])
listed = pd.DataFrame(listed)
#----------------------
#Acquisition of title / text
#----------------------
from bs4 import BeautifulSoup

#Read the html files in the list
Stories = []
for i in range(0, len(listed)):
    if i % 100 == 0:
        print('{} / {}'.format(i, len(listed)))
    #Identify the file from the list (column 0: title, column 1: id)
    file_data = drive.CreateFile({'id': listed.iloc[i, 1]})
    file_data.GetContentFile(listed.iloc[i, 0])
    with open(listed.iloc[i, 0], 'rb') as html:
        soup = BeautifulSoup(html, 'lxml')
    #Get the title and body text
    title = soup.find("h1", class_='title')
    main_text = soup.find("div", class_='main_text')
    #Skip works whose title or body is missing
    if title is None or main_text is None:
        continue
    #Delete ruby (furigana) annotations
    for yomigana in main_text.find_all(["rp", "h4", "rt"]):
        yomigana.decompose()
    #Format into plain strings
    title = ''.join([line.strip() for line in title.text.strip().splitlines()])
    text = ''.join([line.strip() for line in main_text.text.strip().splitlines()])
    #Narrow down to works whose body text is within 3,000 characters
    if len(text) <= 3000:
        Stories.append([title, text])

#Save as csv
Stories = pd.DataFrame(Stories)
Stories.to_csv('drive/My Drive/Stories.csv', index=False, header=False)
Finally, save a random split of 80% of the data for training and 20% for testing.
#----------------------
#Data split
#----------------------
from sklearn.model_selection import KFold

#Read back the saved data
df = pd.read_csv('drive/My Drive/Stories.csv', header=None)

#Use the first fold of a shuffled 5-fold split as an 80/20 split
kf = KFold(n_splits=5, shuffle=True, random_state=12345)
tr_idx, te_idx = list(kf.split(df))[0]
train = df.iloc[tr_idx, :]
test = df.iloc[te_idx, :]
train.to_csv('drive/My Drive/train.csv', index=False, header=False)
test.to_csv('drive/My Drive/test.csv', index=False, header=False)
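Taking the first fold of a shuffled 5-fold KFold is just one way to get an 80/20 split. As a hypothetical alternative (not used in the rest of this article), sklearn's train_test_split gives the same ratio more directly:

#Alternative sketch (hypothetical, not used below): a direct 80/20 split
from sklearn.model_selection import train_test_split
train_alt, test_alt = train_test_split(df, test_size=0.2, shuffle=True, random_state=12345)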
Use torchtext to read the data. For a basic explanation of torchtext, see the second article in the references at the end.
First, define a preprocessing function to be applied when reading data with torchtext. Janome is used for morphological analysis.
#----------------------
#Definition of preprocessing
#----------------------
from torchtext import data
from janome.tokenizer import Tokenizer
import re
import mojimoji

#String preprocessing
def preprocessing(text):
    #Remove line breaks, half-width spaces, and full-width spaces
    text = re.sub('\r', '', text)
    text = re.sub('\n', '', text)
    text = re.sub(' ', '', text)
    text = re.sub('　', '', text)
    #Normalize all digits (half- and full-width) to "0"
    text = re.sub(r'[0-9０-９]', '0', text)
    #Convert half-width characters to full-width
    text = mojimoji.han_to_zen(text)
    return text

#Tokenizer definition (Janome, wakati mode)
j_t = Tokenizer()
def tokenizer(text):
    return [tok for tok in j_t.tokenize(text, wakati=True)]

#String preprocessing + tokenization
def tokenizer_with_preprocessing(text):
    text = preprocessing(text)
    text = tokenizer(text)
    return text
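To check what the preprocessing and tokenization actually do, you can run them on a short sample string (a minimal check; the exact token boundaries depend on Janome's dictionary):

#Example: apply the preprocessing and tokenizer to a sample sentence
sample = '吾輩は猫である。 名前はまだ無い。'
print(tokenizer_with_preprocessing(sample))
#Roughly: ['吾輩', 'は', '猫', 'で', 'ある', '。', '名前', 'は', 'まだ', '無い', '。']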
Next, set the reading method using torchtext.
#----------------------
#Field definition
#----------------------
TEXT = data.Field(
    sequential=True,
    init_token='<sos>',
    eos_token='<eos>',
    tokenize=tokenizer_with_preprocessing,
    lower=True,
    use_vocab=True,
    batch_first=True
)
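torchtext applies this tokenize function through Field.preprocess, so you can verify the Field configuration directly (a quick check, assuming the legacy torchtext.data API used in this article):

#Example: confirm that the Field tokenizes (and lowercases) as expected
print(TEXT.preprocess('メロスは激怒した。'))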
Read the split CSV files for training and testing, and build the vocabulary.
#----------------------
#Data reading
#----------------------
import torch

train_ds, test_ds = data.TabularDataset.splits(
    path='drive/My Drive',
    train='train.csv',
    test='test.csv',
    format='csv',
    skip_header=False,
    fields=[('title', TEXT), ('text', TEXT)]
)

#Verification
train_ds[0].__dict__.keys()
test_ds[0].__dict__.keys()
for i in range(0, 10):
    print(vars(train_ds[i]))
    print(vars(test_ds[i]))

#Dictionary creation
TEXT.build_vocab(train_ds, test_ds, min_freq=2)

#Word counts
print(TEXT.vocab.freqs)
print('Vocabulary size: {}'.format(len(TEXT.vocab)))

#Creating iterators
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
# --> Select GPU from "Change runtime type" in advance.
train_iter = data.Iterator(train_ds, batch_size=16, shuffle=True, device=device)
test_iter = data.Iterator(test_ds, batch_size=16, shuffle=False, device=device)

#Verification
batch = next(iter(train_iter))
print(batch.title)
print(batch.text)
batch = next(iter(test_iter))
print(batch.title)
print(batch.text)
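Because the iterator yields ID tensors, it can help to map a batch back to surface tokens when debugging (a small sketch using the vocabulary built above):

#Example: restore the first title in the batch to tokens for inspection
print([TEXT.vocab.itos[i] for i in batch.title[0].tolist()])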
Implement the Transformer. This article does not explain how the Transformer works; for that, the Japanese article in the references and, in English, The Illustrated Transformer (jalammar.github.io/illustrated-transformer/) are very easy to understand. The implementation below follows a reference tutorial that explains the meaning of each step in detail, so I recommend reading it before trying this yourself.
First, define an Encoder that vectorizes the title of the novel.
import torch
from torch import nn

class Encoder(nn.Module):
    def __init__(self,
                 input_dim,
                 hid_dim,
                 n_layers,
                 n_heads,
                 pf_dim,
                 dropout,
                 device,
                 max_length=100):
        super().__init__()
        self.device = device
        self.tok_embedding = nn.Embedding(input_dim, hid_dim)
        self.pos_embedding = nn.Embedding(max_length, hid_dim)
        self.layers = nn.ModuleList([EncoderLayer(hid_dim,
                                                  n_heads,
                                                  pf_dim,
                                                  dropout,
                                                  device)
                                     for _ in range(n_layers)])
        self.dropout = nn.Dropout(dropout)
        self.scale = torch.sqrt(torch.FloatTensor([hid_dim])).to(device)

    def forward(self, src, src_mask):
        #src = [batch size, src len]
        #src_mask = [batch size, src len]
        batch_size = src.shape[0]
        src_len = src.shape[1]
        pos = torch.arange(0, src_len).unsqueeze(0).repeat(batch_size, 1).to(self.device)
        #pos = [batch size, src len]
        src = self.dropout((self.tok_embedding(src) * self.scale) + self.pos_embedding(pos))
        #src = [batch size, src len, hid dim]
        for layer in self.layers:
            src = layer(src, src_mask)
        #src = [batch size, src len, hid dim]
        return src

class EncoderLayer(nn.Module):
    def __init__(self,
                 hid_dim,
                 n_heads,
                 pf_dim,
                 dropout,
                 device):
        super().__init__()
        self.layer_norm = nn.LayerNorm(hid_dim)
        self.self_attention = MultiHeadAttentionLayer(hid_dim, n_heads, dropout, device)
        self.positionwise_feedforward = PositionwiseFeedforwardLayer(hid_dim, pf_dim, dropout)
        self.dropout = nn.Dropout(dropout)

    def forward(self, src, src_mask):
        #src = [batch size, src len, hid dim]
        #src_mask = [batch size, src len]
        #self attention
        _src, _ = self.self_attention(src, src, src, src_mask)
        #dropout, residual connection and layer norm
        src = self.layer_norm(src + self.dropout(_src))
        #src = [batch size, src len, hid dim]
        #positionwise feedforward
        _src = self.positionwise_feedforward(src)
        #dropout, residual and layer norm
        src = self.layer_norm(src + self.dropout(_src))
        #src = [batch size, src len, hid dim]
        return src

class MultiHeadAttentionLayer(nn.Module):
    def __init__(self, hid_dim, n_heads, dropout, device):
        super().__init__()
        assert hid_dim % n_heads == 0
        self.hid_dim = hid_dim
        self.n_heads = n_heads
        self.head_dim = hid_dim // n_heads
        self.fc_q = nn.Linear(hid_dim, hid_dim)
        self.fc_k = nn.Linear(hid_dim, hid_dim)
        self.fc_v = nn.Linear(hid_dim, hid_dim)
        self.fc_o = nn.Linear(hid_dim, hid_dim)
        self.dropout = nn.Dropout(dropout)
        self.scale = torch.sqrt(torch.FloatTensor([self.head_dim])).to(device)

    def forward(self, query, key, value, mask=None):
        batch_size = query.shape[0]
        #query = [batch size, query len, hid dim]
        #key = [batch size, key len, hid dim]
        #value = [batch size, value len, hid dim]
        Q = self.fc_q(query)
        K = self.fc_k(key)
        V = self.fc_v(value)
        #Q = [batch size, query len, hid dim]
        #K = [batch size, key len, hid dim]
        #V = [batch size, value len, hid dim]
        Q = Q.view(batch_size, -1, self.n_heads, self.head_dim).permute(0, 2, 1, 3)
        K = K.view(batch_size, -1, self.n_heads, self.head_dim).permute(0, 2, 1, 3)
        V = V.view(batch_size, -1, self.n_heads, self.head_dim).permute(0, 2, 1, 3)
        #Q = [batch size, n heads, query len, head dim]
        #K = [batch size, n heads, key len, head dim]
        #V = [batch size, n heads, value len, head dim]
        energy = torch.matmul(Q, K.permute(0, 1, 3, 2)) / self.scale
        #energy = [batch size, n heads, seq len, seq len]
        if mask is not None:
            energy = energy.masked_fill(mask == 0, -1e10)
        attention = torch.softmax(energy, dim=-1)
        #attention = [batch size, n heads, query len, key len]
        x = torch.matmul(self.dropout(attention), V)
        #x = [batch size, n heads, seq len, head dim]
        x = x.permute(0, 2, 1, 3).contiguous()
        #x = [batch size, seq len, n heads, head dim]
        x = x.view(batch_size, -1, self.hid_dim)
        #x = [batch size, seq len, hid dim]
        x = self.fc_o(x)
        #x = [batch size, seq len, hid dim]
        return x, attention

class PositionwiseFeedforwardLayer(nn.Module):
    def __init__(self, hid_dim, pf_dim, dropout):
        super().__init__()
        self.fc_1 = nn.Linear(hid_dim, pf_dim)
        self.fc_2 = nn.Linear(pf_dim, hid_dim)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        #x = [batch size, seq len, hid dim]
        x = self.dropout(torch.relu(self.fc_1(x)))
        #x = [batch size, seq len, pf dim]
        x = self.fc_2(x)
        #x = [batch size, seq len, hid dim]
        return x
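Before moving on, a quick shape check on random tensors can catch dimension mistakes early. This is a minimal sketch with toy parameters (hid_dim=32, n_heads=4), assuming `device` is already defined as above:

#Sanity check: attention layer output shapes on toy inputs
_mha = MultiHeadAttentionLayer(32, 4, 0.1, device).to(device)
_x = torch.rand(2, 5, 32).to(device)  #[batch size, seq len, hid dim]
_out, _attn = _mha(_x, _x, _x)
print(_out.shape)   #torch.Size([2, 5, 32])
print(_attn.shape)  #torch.Size([2, 4, 5, 5])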
Next, we define a Decoder that receives the title vector and generates the body of the novel.
class Decoder(nn.Module):
    def __init__(self,
                 output_dim,
                 hid_dim,
                 n_layers,
                 n_heads,
                 pf_dim,
                 dropout,
                 device,
                 max_length=1000):
        super().__init__()
        self.device = device
        self.tok_embedding = nn.Embedding(output_dim, hid_dim)
        self.pos_embedding = nn.Embedding(max_length, hid_dim)
        self.layers = nn.ModuleList([DecoderLayer(hid_dim,
                                                  n_heads,
                                                  pf_dim,
                                                  dropout,
                                                  device)
                                     for _ in range(n_layers)])
        self.fc_out = nn.Linear(hid_dim, output_dim)
        self.dropout = nn.Dropout(dropout)
        self.scale = torch.sqrt(torch.FloatTensor([hid_dim])).to(device)

    def forward(self, trg, enc_src, trg_mask, src_mask):
        #trg = [batch size, trg len]
        #enc_src = [batch size, src len, hid dim]
        #trg_mask = [batch size, trg len]
        #src_mask = [batch size, src len]
        batch_size = trg.shape[0]
        trg_len = trg.shape[1]
        pos = torch.arange(0, trg_len).unsqueeze(0).repeat(batch_size, 1).to(self.device)
        #pos = [batch size, trg len]
        trg = self.dropout((self.tok_embedding(trg) * self.scale) + self.pos_embedding(pos))
        #trg = [batch size, trg len, hid dim]
        for layer in self.layers:
            trg, attention = layer(trg, enc_src, trg_mask, src_mask)
        #trg = [batch size, trg len, hid dim]
        #attention = [batch size, n heads, trg len, src len]
        output = self.fc_out(trg)
        #output = [batch size, trg len, output dim]
        return output, attention

class DecoderLayer(nn.Module):
    def __init__(self,
                 hid_dim,
                 n_heads,
                 pf_dim,
                 dropout,
                 device):
        super().__init__()
        self.layer_norm = nn.LayerNorm(hid_dim)
        self.self_attention = MultiHeadAttentionLayer(hid_dim, n_heads, dropout, device)
        self.encoder_attention = MultiHeadAttentionLayer(hid_dim, n_heads, dropout, device)
        self.positionwise_feedforward = PositionwiseFeedforwardLayer(hid_dim, pf_dim, dropout)
        self.dropout = nn.Dropout(dropout)

    def forward(self, trg, enc_src, trg_mask, src_mask):
        #trg = [batch size, trg len, hid dim]
        #enc_src = [batch size, src len, hid dim]
        #trg_mask = [batch size, trg len]
        #src_mask = [batch size, src len]
        #self attention
        _trg, _ = self.self_attention(trg, trg, trg, trg_mask)
        #dropout, residual connection and layer norm
        trg = self.layer_norm(trg + self.dropout(_trg))
        #trg = [batch size, trg len, hid dim]
        #encoder attention
        _trg, attention = self.encoder_attention(trg, enc_src, enc_src, src_mask)
        #dropout, residual connection and layer norm
        trg = self.layer_norm(trg + self.dropout(_trg))
        #trg = [batch size, trg len, hid dim]
        #positionwise feedforward
        _trg = self.positionwise_feedforward(trg)
        #dropout, residual and layer norm
        trg = self.layer_norm(trg + self.dropout(_trg))
        #trg = [batch size, trg len, hid dim]
        #attention = [batch size, n heads, trg len, src len]
        return trg, attention
Finally, connect the Encoder and Decoder to complete the Transformer.
class Seq2Seq(nn.Module):
    def __init__(self,
                 encoder,
                 decoder,
                 src_pad_idx,
                 trg_pad_idx,
                 device):
        super().__init__()
        self.encoder = encoder
        self.decoder = decoder
        self.src_pad_idx = src_pad_idx
        self.trg_pad_idx = trg_pad_idx
        self.device = device

    def make_src_mask(self, src):
        #src = [batch size, src len]
        src_mask = (src != self.src_pad_idx).unsqueeze(1).unsqueeze(2)
        #src_mask = [batch size, 1, 1, src len]
        return src_mask

    def make_trg_mask(self, trg):
        #trg = [batch size, trg len]
        trg_pad_mask = (trg != self.trg_pad_idx).unsqueeze(1).unsqueeze(3)
        #trg_pad_mask = [batch size, 1, trg len, 1]
        trg_len = trg.shape[1]
        trg_sub_mask = torch.tril(torch.ones((trg_len, trg_len), device=self.device)).bool()
        #trg_sub_mask = [trg len, trg len]
        trg_mask = trg_pad_mask & trg_sub_mask
        #trg_mask = [batch size, 1, trg len, trg len]
        return trg_mask

    def forward(self, src, trg):
        #src = [batch size, src len]
        #trg = [batch size, trg len]
        src_mask = self.make_src_mask(src)
        trg_mask = self.make_trg_mask(trg)
        #src_mask = [batch size, 1, 1, src len]
        #trg_mask = [batch size, 1, trg len, trg len]
        enc_src = self.encoder(src, src_mask)
        #enc_src = [batch size, src len, hid dim]
        output, attention = self.decoder(trg, enc_src, trg_mask, src_mask)
        #output = [batch size, trg len, output dim]
        #attention = [batch size, n heads, trg len, src len]
        return output, attention
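As with the attention layer, a forward pass on dummy ID tensors confirms that the masks and shapes line up. This is a toy-sized sketch; the vocabulary size of 100 and padding index of 1 are placeholder assumptions:

#Sanity check: forward pass of a toy-sized model on dummy data
_enc = Encoder(100, 32, 1, 4, 64, 0.1, device)
_dec = Decoder(100, 32, 1, 4, 64, 0.1, device)
_model = Seq2Seq(_enc, _dec, 1, 1, device).to(device)
_src = torch.randint(0, 100, (2, 7)).to(device)   #[batch size, src len]
_trg = torch.randint(0, 100, (2, 11)).to(device)  #[batch size, trg len]
_out, _attn = _model(_src, _trg)
print(_out.shape)   #torch.Size([2, 11, 100])
print(_attn.shape)  #torch.Size([2, 4, 11, 7])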
Although not done in this article, the attention weights returned by the model can be visualized after training.
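For example, the encoder-decoder attention of one head could be drawn as a heatmap roughly like this (a hypothetical sketch with matplotlib, not part of the original analysis; Japanese tick labels also require a Japanese-capable font):

#Hypothetical sketch: heatmap of head 0 attention for the first sample
import matplotlib.pyplot as plt
def show_attention(attention, src_tokens, trg_tokens):
    #attention = [batch size, n heads, trg len, src len]
    weights = attention[0, 0].cpu().detach().numpy()
    fig, ax = plt.subplots()
    ax.matshow(weights, cmap='bone')
    ax.set_xticks(range(len(src_tokens)))
    ax.set_xticklabels(src_tokens, rotation=90)
    ax.set_yticks(range(len(trg_tokens)))
    ax.set_yticklabels(trg_tokens)
    plt.show()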
Now train the model. The reference script stops training early based on the validation loss, but in this trial the validation performance worsened from the first epoch onward, so I ignored overfitting and adopted the model from the final epoch.
#----------------------
#Preparation for learning
#----------------------
#Parameter settings
INPUT_DIM = len(TEXT.vocab)
OUTPUT_DIM = len(TEXT.vocab)
HID_DIM = 256
ENC_LAYERS = 3
DEC_LAYERS = 3
ENC_HEADS = 8
DEC_HEADS = 8
ENC_PF_DIM = 512
DEC_PF_DIM = 512
ENC_DROPOUT = 0.1
DEC_DROPOUT = 0.1

#Encoder initialization
enc = Encoder(INPUT_DIM,
              HID_DIM,
              ENC_LAYERS,
              ENC_HEADS,
              ENC_PF_DIM,
              ENC_DROPOUT,
              device)

#Decoder initialization
dec = Decoder(OUTPUT_DIM,
              HID_DIM,
              DEC_LAYERS,
              DEC_HEADS,
              DEC_PF_DIM,
              DEC_DROPOUT,
              device)

#ID used for padding
PAD_IDX = TEXT.vocab.stoi[TEXT.pad_token]

#Model initialization
model = Seq2Seq(enc, dec, PAD_IDX, PAD_IDX, device).to(device)

#Weight initialization
def initialize_weights(m):
    if hasattr(m, 'weight') and m.weight.dim() > 1:
        nn.init.xavier_uniform_(m.weight.data)
model.apply(initialize_weights)

#Optimizer settings
LEARNING_RATE = 0.0005
optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)

#Loss function (padding positions are ignored)
criterion = nn.CrossEntropyLoss(ignore_index=PAD_IDX)
#Definition of the training function
def train(model, iterator, optimizer, criterion, clip):
    model.train()
    epoch_loss = 0
    for i, batch in enumerate(iterator):
        src = batch.title
        trg = batch.text
        optimizer.zero_grad()
        #Feed the target shifted by one position (teacher forcing)
        output, _ = model(src, trg[:,:-1])
        #output = [batch size, trg len - 1, output dim]
        #trg = [batch size, trg len]
        output_dim = output.shape[-1]
        output = output.contiguous().view(-1, output_dim)
        trg = trg[:,1:].contiguous().view(-1)
        #output = [batch size * (trg len - 1), output dim]
        #trg = [batch size * (trg len - 1)]
        loss = criterion(output, trg)
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), clip)
        optimizer.step()
        epoch_loss += loss.item()
    return epoch_loss / len(iterator)
#Definition of the evaluation function
def evaluate(model, iterator, criterion):
    model.eval()
    epoch_loss = 0
    with torch.no_grad():
        for i, batch in enumerate(iterator):
            src = batch.title
            trg = batch.text
            output, _ = model(src, trg[:,:-1])
            #output = [batch size, trg len - 1, output dim]
            #trg = [batch size, trg len]
            output_dim = output.shape[-1]
            output = output.contiguous().view(-1, output_dim)
            trg = trg[:,1:].contiguous().view(-1)
            #output = [batch size * (trg len - 1), output dim]
            #trg = [batch size * (trg len - 1)]
            loss = criterion(output, trg)
            epoch_loss += loss.item()
    return epoch_loss / len(iterator)

#Helper for measuring processing time
def epoch_time(start_time, end_time):
    elapsed_time = end_time - start_time
    elapsed_mins = int(elapsed_time / 60)
    elapsed_secs = int(elapsed_time - (elapsed_mins * 60))
    return elapsed_mins, elapsed_secs
#Definition of the sentence generation function
def translate_sentence(sentence, src_field, trg_field, model, device, max_len=1000):
    model.eval()
    tokens = [token.lower() for token in sentence]
    tokens = [src_field.init_token] + tokens + [src_field.eos_token]
    src_indexes = [src_field.vocab.stoi[token] for token in tokens]
    src_tensor = torch.LongTensor(src_indexes).unsqueeze(0).to(device)
    src_mask = model.make_src_mask(src_tensor)
    with torch.no_grad():
        enc_src = model.encoder(src_tensor, src_mask)
    trg_indexes = [trg_field.vocab.stoi[trg_field.init_token]]
    #Generate one token at a time until <eos> or max_len
    for i in range(max_len):
        trg_tensor = torch.LongTensor(trg_indexes).unsqueeze(0).to(device)
        trg_mask = model.make_trg_mask(trg_tensor)
        with torch.no_grad():
            output, attention = model.decoder(trg_tensor, enc_src, trg_mask, src_mask)
        pred_token = output.argmax(2)[:,-1].item()
        trg_indexes.append(pred_token)
        if pred_token == trg_field.vocab.stoi[trg_field.eos_token]:
            break
    trg_tokens = [trg_field.vocab.itos[i] for i in trg_indexes]
    return trg_tokens[1:], attention
#----------------------
#Model training
#----------------------
import time
import math

N_EPOCHS = 100
CLIP = 1

#Take one sample work for monitoring
example_idx = 8
src_sample = vars(train_ds.examples[example_idx])['title']
trg_sample = vars(train_ds.examples[example_idx])['text']

#Show its title and body
print(f'src = {src_sample}')
print(f'trg = {trg_sample}')

best_valid_loss = float('inf')
for epoch in range(N_EPOCHS):
    start_time = time.time()
    train_loss = train(model, train_iter, optimizer, criterion, CLIP)
    valid_loss = evaluate(model, test_iter, criterion)
    end_time = time.time()
    epoch_mins, epoch_secs = epoch_time(start_time, end_time)
    #if valid_loss < best_valid_loss:
    #    best_valid_loss = valid_loss
    #    torch.save(model.state_dict(), 'drive/My Drive/trained_model.pt')
    #The reference script keeps the model with the best validation loss, but here the
    #validation loss worsens as training progresses, so the model is simply saved after
    #every epoch and the final one is adopted
    torch.save(model.state_dict(), 'drive/My Drive/trained_model.pt')
    #Display training / validation loss for each epoch
    print(f'Epoch: {epoch+1:02} | Time: {epoch_mins}m {epoch_secs}s')
    print(f'\tTrain Loss: {train_loss:.3f} | Train PPL: {math.exp(train_loss):7.3f}')
    print(f'\t Val. Loss: {valid_loss:.3f} | Val. PPL: {math.exp(valid_loss):7.3f}')
    #Every 10 epochs, generate text from the sample title to monitor progress
    if epoch % 10 == 0:
        translation, attention = translate_sentence(src_sample, TEXT, TEXT, model, device)
        print(f'predicted trg = {translation}')
You can now generate a novel by entering any title you like:
translation, attention = translate_sentence(['Any title'], TEXT, TEXT, model, device)
print(f'predicted trg = {translation}')
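Note that translate_sentence expects a list of tokens, so for multi-word titles it is safer to first run the same tokenizer used at training time ('Any title' is a placeholder):

#Example: tokenize the input title the same way as the training data
tokens = tokenizer_with_preprocessing('Any title')
translation, attention = translate_sentence(tokens, TEXT, TEXT, model, device)
print(f'predicted trg = {translation}')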
I tried a few titles. The top two are titles that appear in the training data; the bottom two are not (more precisely, they are combinations of words that are in the vocabulary, i.e. that appear at least twice in the titles or bodies of the training or test data, but that are not titles in the training data).
Input title | Generated text |
---|---|
memories | When I was in my twenties, I met Ogai-sensei five or six times. After that I saw him a couple of times when I delivered proofs from the Ministry of the Army to the (unk) station, but they were all brief. (unk) I don't have enough material to tell the story. At that time, Hiroshi Yosano, Nagae Ikuta, Kafu Nagai and others were students of Professor Ogai, and I was like a grandson of the teacher. Because of that, I have no direct relationship with the teacher, but I respect him for his literary work. It seems that the teacher, who suffered from a reputation as a difficult person, was always careful not to make the other person feel cramped, but it was rather cramped on this side. Speaking of memories, one day, to celebrate the publication of a magazine called "We" ... (omitted below) |
Footprints | A long, long time ago a fox ran across the clay layer of a riverbank. Tens of thousands of years later that clay layer turned to fossil and the footprints remained. Looking at those footprints, you can tell what the fox was thinking as it ran long ago. |
hair | Ah, you are emotional, but I am melancholy; the fireflies of Segawa, (unk) if you look at the light of the pufferfish you can see the pufferfish in the eyes of the pufferfish (unk), Kuchi (unk) |
musics | (unk) is (unk) is (unk) the sun fell and the world of the sun began [# "始つた" is "始まつた" in the source text] (unk) is shutter box (unk) is (unk) the sun has risen and the night world has begun (unk) is yokai diarrhea is (unk) Higurashi drew a diameter and the world of Dada began (it (unk) look at it and Christ is impressed by it |
For titles in the training data, the model reproduces the body text from the title with fairly high fidelity. For titles not in the training data, on the other hand, the output is meaningless.
It was confirmed that a Transformer can generate the body text of a novel from its title. However, the model could not generate decent text for titles that are not in the training data, so next time I would like to make use of the validation data I abandoned this time and aim for a more general-purpose model.
References:
- Introduction to Python without the need to build an environment! Easy-to-understand explanation of how to use Google Colaboratory
- Easy and deep natural language processing with torchtext
- Make and understand Transformer / Attention