In this article, we build a model that automatically generates the body of a novel from an arbitrary input title, trained on pairs of novel titles and bodies.
Google Colaboratory is used as the analysis environment. Google Colaboratory is a cloud notebook environment provided by Google, and anyone with a Google account can use it for free. The libraries needed for data analysis come pre-installed, and a GPU is available, so it is highly recommended when you want to casually try something from your laptop. For details on setting up Google Colaboratory, see the first article in the references at the end.
After setup, open a notebook for this analysis and run the following commands to install the libraries that are not pre-installed.
!pip install PyDrive
!pip install janome
!pip install mojimoji
The novel data is obtained from the Aozora Bunko GitHub repository. First, clone the target repository into your own Google Drive. The following command can be executed from Google Colaboratory.
!git clone --branch master --depth 1 https://github.com/aozorabunko/aozorabunko.git "drive/My Drive/Arbitrary directory"
Next, extract the data used for model building from the copied files and format it. This time I use only novels whose body text is at most 3,000 characters. Note that the "id on drive" placeholder in the code refers to the string that appears after folders/ in the URL of the target directory.
#----------------------
#Get a list of target files
#----------------------
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
import pandas as pd

#Allow access to Google Drive
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

#Recursively collect the title and Drive id of every file
def get_list_file_recursively(parent_id, l=None):
    if l is None:
        l = []
    file_list = drive.ListFile({'q': '"{}" in parents and trashed = false'.format(parent_id)}).GetList()
    l += [f for f in file_list if f['mimeType'] != 'application/vnd.google-apps.folder']
    for f in file_list:
        if f['mimeType'] == 'application/vnd.google-apps.folder':
            get_list_file_recursively(f['id'], l)
    return l

listed = []
for f in get_list_file_recursively('The id on drive of the top-level directory of the copied repository'):
    print(f['title'])
    #Keep only the html files of the works themselves (not the card pages)
    if 'html' in f['title'] and 'card' not in f['title']:
        listed.append([f['title'], f['id']])
listed = pd.DataFrame(listed)
#----------------------
#Acquisition of title / text
#----------------------
from bs4 import BeautifulSoup

#Read the html files in the list
Stories = []
for i in range(0, len(listed)):
    if i % 100 == 0:
        print('{} / {}'.format(i, len(listed)))
    #Identify the file from the list (column 0: title, column 1: id)
    file_data = drive.CreateFile({'id': listed.iloc[i, 1]})
    file_data.GetContentFile(listed.iloc[i, 0])
    with open(listed.iloc[i, 0], 'rb') as html:
        soup = BeautifulSoup(html, 'lxml')
    #Get the title and body text
    title = soup.find("h1", class_='title')
    main_text = soup.find("div", class_='main_text')
    #Skip works whose title or body is missing
    if title is None or main_text is None:
        continue
    #Delete ruby (furigana) annotations
    for yomigana in main_text.find_all(["rp", "h4", "rt"]):
        yomigana.decompose()
    #Format into plain strings
    title = ''.join([line.strip() for line in title.text.strip().splitlines()])
    text = ''.join([line.strip() for line in main_text.text.strip().splitlines()])
    #Narrow down to works whose body text is within 3,000 characters
    if len(text) <= 3000:
        Stories.append([title, text])

#Save as csv
Stories = pd.DataFrame(Stories)
Stories.to_csv('drive/My Drive/Stories.csv', index=False, header=False)
Finally, save a random split of 80% of the data for training and 20% for testing.
#----------------------
#Data split
#----------------------
from sklearn.model_selection import KFold

#Read back the saved data
df = pd.read_csv('drive/My Drive/Stories.csv', header=None)

#Use the first fold of a shuffled 5-fold split as an 80/20 split
kf = KFold(n_splits=5, shuffle=True, random_state=12345)
tr_idx, te_idx = list(kf.split(df))[0]
train = df.iloc[tr_idx, :]
test = df.iloc[te_idx, :]
train.to_csv('drive/My Drive/train.csv', index=False, header=False)
test.to_csv('drive/My Drive/test.csv', index=False, header=False)
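Taking the first fold of a shuffled 5-fold KFold is just one way to get an 80/20 split. As a hypothetical alternative (not used in the rest of this article), sklearn's train_test_split gives the same ratio more directly:

#Alternative sketch (hypothetical, not used below): a direct 80/20 split
from sklearn.model_selection import train_test_split
train_alt, test_alt = train_test_split(df, test_size=0.2, shuffle=True, random_state=12345)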
Use torchtext to read the data. For a basic explanation of torchtext, see the second article in the references at the end.
First, define a preprocessing function to be applied when reading data with torchtext. Janome is used for morphological analysis.
#----------------------
#Definition of preprocessing
#----------------------
from torchtext import data
from janome.tokenizer import Tokenizer
import re
import mojimoji

#String preprocessing
def preprocessing(text):
    #Remove line breaks, half-width spaces, and full-width spaces
    text = re.sub('\r', '', text)
    text = re.sub('\n', '', text)
    text = re.sub(' ', '', text)
    text = re.sub('　', '', text)
    #Normalize all digits (half- and full-width) to "0"
    text = re.sub(r'[0-9０-９]', '0', text)
    #Convert half-width characters to full-width
    text = mojimoji.han_to_zen(text)
    return text

#Tokenizer definition (Janome, wakati mode)
j_t = Tokenizer()
def tokenizer(text):
    return [tok for tok in j_t.tokenize(text, wakati=True)]

#String preprocessing + tokenization
def tokenizer_with_preprocessing(text):
    text = preprocessing(text)
    text = tokenizer(text)
    return text
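To check what the preprocessing and tokenization actually do, you can run them on a short sample string (a minimal check; the exact token boundaries depend on Janome's dictionary):

#Example: apply the preprocessing and tokenizer to a sample sentence
sample = '吾輩は猫である。 名前はまだ無い。'
print(tokenizer_with_preprocessing(sample))
#Roughly: ['吾輩', 'は', '猫', 'で', 'ある', '。', '名前', 'は', 'まだ', '無い', '。']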
Next, set the reading method using torchtext.
#----------------------
#Field definition
#----------------------
TEXT = data.Field(
    sequential=True,
    init_token='<sos>',
    eos_token='<eos>',
    tokenize=tokenizer_with_preprocessing,
    lower=True,
    use_vocab=True,
    batch_first=True
)
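torchtext applies this tokenize function through Field.preprocess, so you can verify the Field configuration directly (a quick check, assuming the legacy torchtext.data API used in this article):

#Example: confirm that the Field tokenizes (and lowercases) as expected
print(TEXT.preprocess('メロスは激怒した。'))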
Read the split CSV files for training and testing, and build the vocabulary.
#----------------------
#Data reading
#----------------------
import torch

train_ds, test_ds = data.TabularDataset.splits(
    path='drive/My Drive',
    train='train.csv',
    test='test.csv',
    format='csv',
    skip_header=False,
    fields=[('title', TEXT), ('text', TEXT)]
)

#Verification
train_ds[0].__dict__.keys()
test_ds[0].__dict__.keys()
for i in range(0, 10):
    print(vars(train_ds[i]))
    print(vars(test_ds[i]))

#Dictionary creation
TEXT.build_vocab(train_ds, test_ds, min_freq=2)

#Word counts
print(TEXT.vocab.freqs)
print('Vocabulary size: {}'.format(len(TEXT.vocab)))

#Creating iterators
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
# --> Select GPU from "Change runtime type" in advance.
train_iter = data.Iterator(train_ds, batch_size=16, shuffle=True, device=device)
test_iter = data.Iterator(test_ds, batch_size=16, shuffle=False, device=device)

#Verification
batch = next(iter(train_iter))
print(batch.title)
print(batch.text)
batch = next(iter(test_iter))
print(batch.title)
print(batch.text)
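Because the iterator yields ID tensors, it can help to map a batch back to surface tokens when debugging (a small sketch using the vocabulary built above):

#Example: restore the first title in the batch to tokens for inspection
print([TEXT.vocab.itos[i] for i in batch.title[0].tolist()])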
Implement the Transformer. This article does not explain how the Transformer works; for that, the Japanese article in the references and, in English, The Illustrated Transformer (jalammar.github.io/illustrated-transformer/) are very easy to understand. The implementation below follows a reference tutorial that explains the meaning of each step in detail, so I recommend reading it before trying this yourself.
First, define an Encoder that vectorizes the title of the novel.
import torch
from torch import nn

class Encoder(nn.Module):
    def __init__(self,
                 input_dim,
                 hid_dim,
                 n_layers,
                 n_heads,
                 pf_dim,
                 dropout,
                 device,
                 max_length=100):
        super().__init__()
        self.device = device
        self.tok_embedding = nn.Embedding(input_dim, hid_dim)
        self.pos_embedding = nn.Embedding(max_length, hid_dim)
        self.layers = nn.ModuleList([EncoderLayer(hid_dim,
                                                  n_heads,
                                                  pf_dim,
                                                  dropout,
                                                  device)
                                     for _ in range(n_layers)])
        self.dropout = nn.Dropout(dropout)
        self.scale = torch.sqrt(torch.FloatTensor([hid_dim])).to(device)

    def forward(self, src, src_mask):
        #src = [batch size, src len]
        #src_mask = [batch size, src len]
        batch_size = src.shape[0]
        src_len = src.shape[1]
        pos = torch.arange(0, src_len).unsqueeze(0).repeat(batch_size, 1).to(self.device)
        #pos = [batch size, src len]
        src = self.dropout((self.tok_embedding(src) * self.scale) + self.pos_embedding(pos))
        #src = [batch size, src len, hid dim]
        for layer in self.layers:
            src = layer(src, src_mask)
        #src = [batch size, src len, hid dim]
        return src

class EncoderLayer(nn.Module):
    def __init__(self,
                 hid_dim,
                 n_heads,
                 pf_dim,
                 dropout,
                 device):
        super().__init__()
        self.layer_norm = nn.LayerNorm(hid_dim)
        self.self_attention = MultiHeadAttentionLayer(hid_dim, n_heads, dropout, device)
        self.positionwise_feedforward = PositionwiseFeedforwardLayer(hid_dim, pf_dim, dropout)
        self.dropout = nn.Dropout(dropout)

    def forward(self, src, src_mask):
        #src = [batch size, src len, hid dim]
        #src_mask = [batch size, src len]
        #self attention
        _src, _ = self.self_attention(src, src, src, src_mask)
        #dropout, residual connection and layer norm
        src = self.layer_norm(src + self.dropout(_src))
        #src = [batch size, src len, hid dim]
        #positionwise feedforward
        _src = self.positionwise_feedforward(src)
        #dropout, residual and layer norm
        src = self.layer_norm(src + self.dropout(_src))
        #src = [batch size, src len, hid dim]
        return src

class MultiHeadAttentionLayer(nn.Module):
    def __init__(self, hid_dim, n_heads, dropout, device):
        super().__init__()
        assert hid_dim % n_heads == 0
        self.hid_dim = hid_dim
        self.n_heads = n_heads
        self.head_dim = hid_dim // n_heads
        self.fc_q = nn.Linear(hid_dim, hid_dim)
        self.fc_k = nn.Linear(hid_dim, hid_dim)
        self.fc_v = nn.Linear(hid_dim, hid_dim)
        self.fc_o = nn.Linear(hid_dim, hid_dim)
        self.dropout = nn.Dropout(dropout)
        self.scale = torch.sqrt(torch.FloatTensor([self.head_dim])).to(device)

    def forward(self, query, key, value, mask=None):
        batch_size = query.shape[0]
        #query = [batch size, query len, hid dim]
        #key = [batch size, key len, hid dim]
        #value = [batch size, value len, hid dim]
        Q = self.fc_q(query)
        K = self.fc_k(key)
        V = self.fc_v(value)
        #Q = [batch size, query len, hid dim]
        #K = [batch size, key len, hid dim]
        #V = [batch size, value len, hid dim]
        Q = Q.view(batch_size, -1, self.n_heads, self.head_dim).permute(0, 2, 1, 3)
        K = K.view(batch_size, -1, self.n_heads, self.head_dim).permute(0, 2, 1, 3)
        V = V.view(batch_size, -1, self.n_heads, self.head_dim).permute(0, 2, 1, 3)
        #Q = [batch size, n heads, query len, head dim]
        #K = [batch size, n heads, key len, head dim]
        #V = [batch size, n heads, value len, head dim]
        energy = torch.matmul(Q, K.permute(0, 1, 3, 2)) / self.scale
        #energy = [batch size, n heads, seq len, seq len]
        if mask is not None:
            energy = energy.masked_fill(mask == 0, -1e10)
        attention = torch.softmax(energy, dim=-1)
        #attention = [batch size, n heads, query len, key len]
        x = torch.matmul(self.dropout(attention), V)
        #x = [batch size, n heads, seq len, head dim]
        x = x.permute(0, 2, 1, 3).contiguous()
        #x = [batch size, seq len, n heads, head dim]
        x = x.view(batch_size, -1, self.hid_dim)
        #x = [batch size, seq len, hid dim]
        x = self.fc_o(x)
        #x = [batch size, seq len, hid dim]
        return x, attention

class PositionwiseFeedforwardLayer(nn.Module):
    def __init__(self, hid_dim, pf_dim, dropout):
        super().__init__()
        self.fc_1 = nn.Linear(hid_dim, pf_dim)
        self.fc_2 = nn.Linear(pf_dim, hid_dim)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        #x = [batch size, seq len, hid dim]
        x = self.dropout(torch.relu(self.fc_1(x)))
        #x = [batch size, seq len, pf dim]
        x = self.fc_2(x)
        #x = [batch size, seq len, hid dim]
        return x
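Before moving on, a quick shape check on random tensors can catch dimension mistakes early. This is a minimal sketch with toy parameters (hid_dim=32, n_heads=4), assuming `device` is already defined as above:

#Sanity check: attention layer output shapes on toy inputs
_mha = MultiHeadAttentionLayer(32, 4, 0.1, device).to(device)
_x = torch.rand(2, 5, 32).to(device)  #[batch size, seq len, hid dim]
_out, _attn = _mha(_x, _x, _x)
print(_out.shape)   #torch.Size([2, 5, 32])
print(_attn.shape)  #torch.Size([2, 4, 5, 5])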
Next, we define a Decoder that receives the title vector and generates the body of the novel.
class Decoder(nn.Module):
    def __init__(self,
                 output_dim,
                 hid_dim,
                 n_layers,
                 n_heads,
                 pf_dim,
                 dropout,
                 device,
                 max_length=1000):
        super().__init__()
        self.device = device
        self.tok_embedding = nn.Embedding(output_dim, hid_dim)
        self.pos_embedding = nn.Embedding(max_length, hid_dim)
        self.layers = nn.ModuleList([DecoderLayer(hid_dim,
                                                  n_heads,
                                                  pf_dim,
                                                  dropout,
                                                  device)
                                     for _ in range(n_layers)])
        self.fc_out = nn.Linear(hid_dim, output_dim)
        self.dropout = nn.Dropout(dropout)
        self.scale = torch.sqrt(torch.FloatTensor([hid_dim])).to(device)

    def forward(self, trg, enc_src, trg_mask, src_mask):
        #trg = [batch size, trg len]
        #enc_src = [batch size, src len, hid dim]
        #trg_mask = [batch size, trg len]
        #src_mask = [batch size, src len]
        batch_size = trg.shape[0]
        trg_len = trg.shape[1]
        pos = torch.arange(0, trg_len).unsqueeze(0).repeat(batch_size, 1).to(self.device)
        #pos = [batch size, trg len]
        trg = self.dropout((self.tok_embedding(trg) * self.scale) + self.pos_embedding(pos))
        #trg = [batch size, trg len, hid dim]
        for layer in self.layers:
            trg, attention = layer(trg, enc_src, trg_mask, src_mask)
        #trg = [batch size, trg len, hid dim]
        #attention = [batch size, n heads, trg len, src len]
        output = self.fc_out(trg)
        #output = [batch size, trg len, output dim]
        return output, attention

class DecoderLayer(nn.Module):
    def __init__(self,
                 hid_dim,
                 n_heads,
                 pf_dim,
                 dropout,
                 device):
        super().__init__()
        self.layer_norm = nn.LayerNorm(hid_dim)
        self.self_attention = MultiHeadAttentionLayer(hid_dim, n_heads, dropout, device)
        self.encoder_attention = MultiHeadAttentionLayer(hid_dim, n_heads, dropout, device)
        self.positionwise_feedforward = PositionwiseFeedforwardLayer(hid_dim, pf_dim, dropout)
        self.dropout = nn.Dropout(dropout)

    def forward(self, trg, enc_src, trg_mask, src_mask):
        #trg = [batch size, trg len, hid dim]
        #enc_src = [batch size, src len, hid dim]
        #trg_mask = [batch size, trg len]
        #src_mask = [batch size, src len]
        #self attention
        _trg, _ = self.self_attention(trg, trg, trg, trg_mask)
        #dropout, residual connection and layer norm
        trg = self.layer_norm(trg + self.dropout(_trg))
        #trg = [batch size, trg len, hid dim]
        #encoder attention
        _trg, attention = self.encoder_attention(trg, enc_src, enc_src, src_mask)
        #dropout, residual connection and layer norm
        trg = self.layer_norm(trg + self.dropout(_trg))
        #trg = [batch size, trg len, hid dim]
        #positionwise feedforward
        _trg = self.positionwise_feedforward(trg)
        #dropout, residual and layer norm
        trg = self.layer_norm(trg + self.dropout(_trg))
        #trg = [batch size, trg len, hid dim]
        #attention = [batch size, n heads, trg len, src len]
        return trg, attention
Finally, connect the Encoder and Decoder to complete the Transformer.
class Seq2Seq(nn.Module):
    def __init__(self,
                 encoder,
                 decoder,
                 src_pad_idx,
                 trg_pad_idx,
                 device):
        super().__init__()
        self.encoder = encoder
        self.decoder = decoder
        self.src_pad_idx = src_pad_idx
        self.trg_pad_idx = trg_pad_idx
        self.device = device

    def make_src_mask(self, src):
        #src = [batch size, src len]
        src_mask = (src != self.src_pad_idx).unsqueeze(1).unsqueeze(2)
        #src_mask = [batch size, 1, 1, src len]
        return src_mask

    def make_trg_mask(self, trg):
        #trg = [batch size, trg len]
        trg_pad_mask = (trg != self.trg_pad_idx).unsqueeze(1).unsqueeze(3)
        #trg_pad_mask = [batch size, 1, trg len, 1]
        trg_len = trg.shape[1]
        trg_sub_mask = torch.tril(torch.ones((trg_len, trg_len), device=self.device)).bool()
        #trg_sub_mask = [trg len, trg len]
        trg_mask = trg_pad_mask & trg_sub_mask
        #trg_mask = [batch size, 1, trg len, trg len]
        return trg_mask

    def forward(self, src, trg):
        #src = [batch size, src len]
        #trg = [batch size, trg len]
        src_mask = self.make_src_mask(src)
        trg_mask = self.make_trg_mask(trg)
        #src_mask = [batch size, 1, 1, src len]
        #trg_mask = [batch size, 1, trg len, trg len]
        enc_src = self.encoder(src, src_mask)
        #enc_src = [batch size, src len, hid dim]
        output, attention = self.decoder(trg, enc_src, trg_mask, src_mask)
        #output = [batch size, trg len, output dim]
        #attention = [batch size, n heads, trg len, src len]
        return output, attention
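As with the attention layer, a forward pass on dummy ID tensors confirms that the masks and shapes line up. This is a toy-sized sketch; the vocabulary size of 100 and padding index of 1 are placeholder assumptions:

#Sanity check: forward pass of a toy-sized model on dummy data
_enc = Encoder(100, 32, 1, 4, 64, 0.1, device)
_dec = Decoder(100, 32, 1, 4, 64, 0.1, device)
_model = Seq2Seq(_enc, _dec, 1, 1, device).to(device)
_src = torch.randint(0, 100, (2, 7)).to(device)   #[batch size, src len]
_trg = torch.randint(0, 100, (2, 11)).to(device)  #[batch size, trg len]
_out, _attn = _model(_src, _trg)
print(_out.shape)   #torch.Size([2, 11, 100])
print(_attn.shape)  #torch.Size([2, 4, 11, 7])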
Although not done in this article, the attention weights returned by the model can be visualized after training.
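For example, the encoder-decoder attention of one head could be drawn as a heatmap roughly like this (a hypothetical sketch with matplotlib, not part of the original analysis; Japanese tick labels also require a Japanese-capable font):

#Hypothetical sketch: heatmap of head 0 attention for the first sample
import matplotlib.pyplot as plt
def show_attention(attention, src_tokens, trg_tokens):
    #attention = [batch size, n heads, trg len, src len]
    weights = attention[0, 0].cpu().detach().numpy()
    fig, ax = plt.subplots()
    ax.matshow(weights, cmap='bone')
    ax.set_xticks(range(len(src_tokens)))
    ax.set_xticklabels(src_tokens, rotation=90)
    ax.set_yticks(range(len(trg_tokens)))
    ax.set_yticklabels(trg_tokens)
    plt.show()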
Now train the model. The reference script stops training early based on the validation loss, but in this trial the validation performance worsened from the first epoch onward, so I ignored overfitting and adopted the model from the final epoch.
#----------------------
#Preparation for learning
#----------------------
#Parameter settings
INPUT_DIM = len(TEXT.vocab)
OUTPUT_DIM = len(TEXT.vocab)
HID_DIM = 256
ENC_LAYERS = 3
DEC_LAYERS = 3
ENC_HEADS = 8
DEC_HEADS = 8
ENC_PF_DIM = 512
DEC_PF_DIM = 512
ENC_DROPOUT = 0.1
DEC_DROPOUT = 0.1

#Encoder initialization
enc = Encoder(INPUT_DIM,
              HID_DIM,
              ENC_LAYERS,
              ENC_HEADS,
              ENC_PF_DIM,
              ENC_DROPOUT,
              device)

#Decoder initialization
dec = Decoder(OUTPUT_DIM,
              HID_DIM,
              DEC_LAYERS,
              DEC_HEADS,
              DEC_PF_DIM,
              DEC_DROPOUT,
              device)

#ID used for padding
PAD_IDX = TEXT.vocab.stoi[TEXT.pad_token]

#Model initialization
model = Seq2Seq(enc, dec, PAD_IDX, PAD_IDX, device).to(device)

#Weight initialization
def initialize_weights(m):
    if hasattr(m, 'weight') and m.weight.dim() > 1:
        nn.init.xavier_uniform_(m.weight.data)
model.apply(initialize_weights)

#Optimizer settings
LEARNING_RATE = 0.0005
optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)

#Loss function (padding positions are ignored)
criterion = nn.CrossEntropyLoss(ignore_index=PAD_IDX)
#Definition of the training function
def train(model, iterator, optimizer, criterion, clip):
    model.train()
    epoch_loss = 0
    for i, batch in enumerate(iterator):
        src = batch.title
        trg = batch.text
        optimizer.zero_grad()
        #Feed the target shifted by one position (teacher forcing)
        output, _ = model(src, trg[:,:-1])
        #output = [batch size, trg len - 1, output dim]
        #trg = [batch size, trg len]
        output_dim = output.shape[-1]
        output = output.contiguous().view(-1, output_dim)
        trg = trg[:,1:].contiguous().view(-1)
        #output = [batch size * (trg len - 1), output dim]
        #trg = [batch size * (trg len - 1)]
        loss = criterion(output, trg)
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), clip)
        optimizer.step()
        epoch_loss += loss.item()
    return epoch_loss / len(iterator)
#Definition of the evaluation function
def evaluate(model, iterator, criterion):
    model.eval()
    epoch_loss = 0
    with torch.no_grad():
        for i, batch in enumerate(iterator):
            src = batch.title
            trg = batch.text
            output, _ = model(src, trg[:,:-1])
            #output = [batch size, trg len - 1, output dim]
            #trg = [batch size, trg len]
            output_dim = output.shape[-1]
            output = output.contiguous().view(-1, output_dim)
            trg = trg[:,1:].contiguous().view(-1)
            #output = [batch size * (trg len - 1), output dim]
            #trg = [batch size * (trg len - 1)]
            loss = criterion(output, trg)
            epoch_loss += loss.item()
    return epoch_loss / len(iterator)

#Helper for measuring processing time
def epoch_time(start_time, end_time):
    elapsed_time = end_time - start_time
    elapsed_mins = int(elapsed_time / 60)
    elapsed_secs = int(elapsed_time - (elapsed_mins * 60))
    return elapsed_mins, elapsed_secs
#Definition of the sentence generation function
def translate_sentence(sentence, src_field, trg_field, model, device, max_len=1000):
    model.eval()
    tokens = [token.lower() for token in sentence]
    tokens = [src_field.init_token] + tokens + [src_field.eos_token]
    src_indexes = [src_field.vocab.stoi[token] for token in tokens]
    src_tensor = torch.LongTensor(src_indexes).unsqueeze(0).to(device)
    src_mask = model.make_src_mask(src_tensor)
    with torch.no_grad():
        enc_src = model.encoder(src_tensor, src_mask)
    trg_indexes = [trg_field.vocab.stoi[trg_field.init_token]]
    #Generate one token at a time until <eos> or max_len
    for i in range(max_len):
        trg_tensor = torch.LongTensor(trg_indexes).unsqueeze(0).to(device)
        trg_mask = model.make_trg_mask(trg_tensor)
        with torch.no_grad():
            output, attention = model.decoder(trg_tensor, enc_src, trg_mask, src_mask)
        pred_token = output.argmax(2)[:,-1].item()
        trg_indexes.append(pred_token)
        if pred_token == trg_field.vocab.stoi[trg_field.eos_token]:
            break
    trg_tokens = [trg_field.vocab.itos[i] for i in trg_indexes]
    return trg_tokens[1:], attention
#----------------------
#Model training
#----------------------
import time
import math

N_EPOCHS = 100
CLIP = 1

#Take one sample work for monitoring
example_idx = 8
src_sample = vars(train_ds.examples[example_idx])['title']
trg_sample = vars(train_ds.examples[example_idx])['text']

#Show its title and body
print(f'src = {src_sample}')
print(f'trg = {trg_sample}')

best_valid_loss = float('inf')
for epoch in range(N_EPOCHS):
    start_time = time.time()
    train_loss = train(model, train_iter, optimizer, criterion, CLIP)
    valid_loss = evaluate(model, test_iter, criterion)
    end_time = time.time()
    epoch_mins, epoch_secs = epoch_time(start_time, end_time)
    #if valid_loss < best_valid_loss:
    #    best_valid_loss = valid_loss
    #    torch.save(model.state_dict(), 'drive/My Drive/trained_model.pt')
    #The reference script keeps the model with the best validation loss, but here the
    #validation loss worsens as training progresses, so the model is simply saved after
    #every epoch and the final one is adopted
    torch.save(model.state_dict(), 'drive/My Drive/trained_model.pt')
    #Display training / validation loss for each epoch
    print(f'Epoch: {epoch+1:02} | Time: {epoch_mins}m {epoch_secs}s')
    print(f'\tTrain Loss: {train_loss:.3f} | Train PPL: {math.exp(train_loss):7.3f}')
    print(f'\t Val. Loss: {valid_loss:.3f} | Val. PPL: {math.exp(valid_loss):7.3f}')
    #Every 10 epochs, generate text from the sample title to monitor progress
    if epoch % 10 == 0:
        translation, attention = translate_sentence(src_sample, TEXT, TEXT, model, device)
        print(f'predicted trg = {translation}')
You can now generate a novel by entering any title you like:
translation, attention = translate_sentence(['Any title'], TEXT, TEXT, model, device)
print(f'predicted trg = {translation}')
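Note that translate_sentence expects a list of tokens, so for multi-word titles it is safer to first run the same tokenizer used at training time ('Any title' is a placeholder):

#Example: tokenize the input title the same way as the training data
tokens = tokenizer_with_preprocessing('Any title')
translation, attention = translate_sentence(tokens, TEXT, TEXT, model, device)
print(f'predicted trg = {translation}')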
I tried a few titles. The top two are titles that appear in the training data; the bottom two are not (more precisely, they are combinations of words that are in the vocabulary, i.e. that appear at least twice in the titles or bodies of the training or test data, but that are not titles in the training data).
Input title | Generated text |
---|---|
memories | When I was in my twenties, I met Ogai-sensei five or six times. After that I saw him a couple of times when I delivered proofs from the Ministry of the Army to the (unk) station, but they were all brief. (unk) I don't have enough material to tell the story. At that time, Hiroshi Yosano, Nagae Ikuta, Kafu Nagai and others were students of Professor Ogai, and I was like a grandson of the teacher. Because of that, I have no direct relationship with the teacher, but I respect him for his literary work. It seems that the teacher, who suffered from a reputation as a difficult person, was always careful not to make the other person feel cramped, but it was rather cramped on this side. Speaking of memories, one day, to celebrate the publication of a magazine called "We" ... (omitted below) |
Footprints | A long, long time ago a fox ran across the clay layer of a riverbank. Tens of thousands of years later that clay layer turned to fossil and the footprints remained. Looking at those footprints, you can tell what the fox was thinking as it ran long ago. |
hair | Ah, you are emotional, but I am melancholy; the fireflies of Segawa, (unk) if you look at the light of the pufferfish you can see the pufferfish in the eyes of the pufferfish (unk), Kuchi (unk) |
musics | (unk) is (unk) is (unk) the sun fell and the world of the sun began [# "始つた" is "始まつた" in the source text] (unk) is shutter box (unk) is (unk) the sun has risen and the night world has begun (unk) is yokai diarrhea is (unk) Higurashi drew a diameter and the world of Dada began (it (unk) look at it and Christ is impressed by it |
For titles in the training data, the model reproduces the body text from the title with fairly high fidelity. For titles not in the training data, on the other hand, the output is meaningless.
It was confirmed that a Transformer can generate the body text of a novel from its title. However, the model could not generate decent text for titles that are not in the training data, so next time I would like to make use of the validation data I abandoned this time and aim for a more general-purpose model.
References:
- Introduction to Python without the need to build an environment! Easy-to-understand explanation of how to use Google Colaboratory
- Easy and deep natural language processing with torchtext
- Make and understand Transformer / Attention