[Python] Implementation of a dialogue system using Chainer [seq2seq]

Introduction

Recently, the progress of DNNs (Deep Neural Networks) has been remarkable, and they have been successful in a wide range of fields. We often hear about image classification and speech recognition, and dialogue systems are no exception. Now that the Python library ecosystem has matured, I would like to briefly introduce how to build a dialogue system using DNNs.

DNNs model for dialogue systems

There are two main DNN-based approaches to building a dialogue system:

- Ranking learning over a large set of response candidates -> one of the candidate sentences is selected as-is as the response to the input
- Learning an Encoder-Decoder model from utterance-response pairs -> the response to the input is generated word by word

This article deals with the latter, the Encoder-Decoder model. Thanks to rich libraries such as Chainer, anyone can implement it as long as they have data consisting of utterance-response pairs.

Implementation environment

The dependent packages are summarized below. You can build the environment in one shot with the pip install or conda install commands.
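Judging from the scripts in this article, the concrete dependencies are Chainer, NumPy, and MeCab (the command-line tool plus its Python bindings) for word segmentation. A rough sketch of the setup, assuming pip is available:

$ pip install numpy
$ pip install chainer

MeCab itself is installed separately (for example with your OS package manager), together with a dictionary such as mecab-ipadic.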

seq2seq

The model described in this article is called seq2seq (Sequence to Sequence). It consists of two types of networks: an encoder RNN (Recurrent Neural Network) for the input and a decoder RNN for the output (here the RNN parts are implemented with LSTMs).

Original paper: Sequence to Sequence Learning with Neural Networks

(Figure: the seq2seq encoder-decoder architecture)

When applied to a dialogue system, the input utterance is passed through the Encoder, and the response to it is learned word by word with the Decoder.
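To make the training target concrete, here is a minimal sketch (plain Python with toy English tokens; the real scripts below operate on word IDs of Japanese text) of how one utterance-response pair is presented to the encoder and decoder, mirroring what learning.py does:

# Toy example only: shows the input/target alignment used in learning.py below.
utterance = ["good", "morning", "!"]                # user utterance (encoder side)
response  = ["morning", "!", "how", "are", "you"]   # system response (decoder side)

encoder_input   = utterance[::-1]                   # the input utterance is fed in reverse order
decoder_inputs  = ["<eos>"] + response              # decoding starts from an <eos> marker
decoder_targets = response + ["<eos>"]              # at every step the next word is the target

for x, t in zip(decoder_inputs, decoder_targets):
    print x, "->", t    # <eos> -> morning, morning -> !, ..., you -> <eos>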

Training data

This time, we will train the model on the data from the Dialogue Breakdown Detection Challenge 2. The dialogue breakdown detection corpus can be used by anyone for any purpose, so you can use it with confidence.

Dialogue Breakdown Detection Corpus:

URL: https://sites.google.com/site/dialoguebreakdowndetection2/downloads

The corpus consists of JSON files. A script that prints the dialogue stored in a JSON file in an easy-to-read form is also included. Execute it as follows.

show_dial.py


$ python show_dial.py 1470622453.log.json 

Execution result:

dialogue-id : 1470622453
speaker-id : DBD-01
group-id :
S: Hello. Watch out for heat stroke.
O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O
U: Yes. Thank you. You should be careful too.
S: Don't you be careful about heat stroke?
T O T X X T X X X T O O T T T X X O X X X X X X T T O O T O
... (Omitted)
S: Exercise is a change of pace, isn't it? I feel okay
T T X O T O O O T T T O X O O X O O T O O T X T T T T T T T

(The O/T/X symbols following each system utterance are the breakdown labels assigned by the annotators.)
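For reference, each log file is a single JSON object. The fields actually used by the json2text.py script later in this article are the turns list and each turn's speaker and utterance entries; the sketch below is simplified, and field names other than those three may differ from the real corpus:

{
  "dialogue-id": "1470622453",
  "turns": [
    {"speaker": "S", "utterance": "Hello. Watch out for heat stroke.", "annotations": ["..."]},
    {"speaker": "U", "utterance": "Yes. Thank you. You should be careful too.", "annotations": []}
  ]
}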

Construction of training data

Using the dialogue breakdown detection corpus above, seq2seq is trained with the user utterances (U) as the input utterances and the system utterances that respond to them as the response utterances.

(Figure: one-to-one pairs of Utterance and Response used as training data)

As shown in the figure above, files pairing each utterance (Utterance) one-to-one with its response (Response) are created and used as training data. Use the script below to convert the JSON files into the text files that become the training data.

json2text.py


#!/usr/bin/env python
# -*- coding: utf-8 -*-

import sys
import os
import json


def loadingJson(dirpath, f):
    # Load a single JSON dialogue log file
    fpath = dirpath + '/' + f
    fj = open(fpath, 'r')
    json_data = json.load(fj)
    fj.close()

    return json_data

def output(data, mod):
    # Print either the user ("U") or the system ("S") utterances, one per line.
    # The first turn (i == 0) is a system utterance with no preceding user utterance, so it is skipped.
    for i in range(len(data['turns'])):
        if mod == "U" and data['turns'][i]['speaker'] == mod:
            print data['turns'][i]['utterance'].encode('utf-8')
        elif mod == "S" and data['turns'][i]['speaker'] == mod and i != 0:
            print data['turns'][i]['utterance'].encode('utf-8')
        else:
            continue


if __name__ == "__main__":

    argvs = sys.argv

    _usage = """--
Usage:
    python json2text.py [json] [speaker]
Args:
    [json]: The argument is the input directory containing the json files to convert to text.
    [speaker]: The argument is "U" or "S", the speaker in the dialogue.
""".rstrip()

    if len(argvs) < 3:
        print _usage
        sys.exit(0)

    # one file ver
    '''
    mod = argvs[2]
    fj = open(argvs[1], 'r')
    json_data = json.load(fj)
    fj.close()

    output(json_data, mod)
    '''

    # more than two files ver
    branch = os.walk(argvs[1])
    mod = argvs[2]

    for dirpath, dirs, files in branch:
        for f in files:
            json_data = loadingJson(dirpath, f)
            output(json_data, mod)

Execute as follows.

json2text.py


$ python json2text.py [json] [speaker]

- [json]: the directory where the input JSON files are stored (to process a single JSON file instead, enable the commented-out "one file ver" block in the script and comment out the "more than two files ver" block below it)
- [speaker]: enter "U" to create the Utterance training data, or "S" to create the Response training data
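For example, assuming the JSON logs are stored in a directory named json_dir (the name is arbitrary), the two training files can be produced by redirecting the output:

$ python json2text.py json_dir U > Utterance.txt
$ python json2text.py json_dir S > Response.txt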

With this processing, we are just one step away from the training data (Utterance, Response). All that remains is morphological analysis, i.e. segmenting each line into words:

$ mecab -Owakati Utterance.txt > Utterance_wakati.txt
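The response side is segmented in the same way:

$ mecab -Owakati Response.txt > Response_wakati.txt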

Learning the dialogue model

The training data (Utterance, Response) has now been created by the steps above. Next, let's train the model.

learning.py


#!/usr/bin/env python                                                                                                                                                    
# -*- coding: utf-8 -*-                                                                                                                                                  

import sys
import numpy as np
import chainer
from chainer import cuda, Function, gradient_check, Variable, optimizers, serializers, utils
from chainer import Link, Chain, ChainList
import chainer.functions as F
import chainer.links as L


class seq2seq(chainer.Chain):
    def __init__(self, jv, ev, k, jvocab, evocab):
        super(seq2seq, self).__init__(
            embedx = L.EmbedID(jv, k),   # word embedding for the utterance (encoder) side
            embedy = L.EmbedID(ev, k),   # word embedding for the response (decoder) side
            H = L.LSTM(k, k),            # shared LSTM used as encoder and decoder
            W = L.Linear(k, ev),         # projection from hidden state to response vocabulary
            )

    def __call__(self, jline, eline, jvocab, evocab):
        # Encode the (reversed) input utterance word by word
        for i in range(len(jline)):
            wid = jvocab[jline[i]]
            x_k = self.embedx(Variable(np.array([wid], dtype=np.int32)))
            h = self.H(x_k)
        # Feed <eos> to switch to decoding; the first target is the first response word
        x_k = self.embedx(Variable(np.array([jvocab['<eos>']], dtype=np.int32)))
        tx = Variable(np.array([evocab[eline[0]]], dtype=np.int32))
        h = self.H(x_k)
        accum_loss = F.softmax_cross_entropy(self.W(h), tx)
        # Teacher forcing: feed each response word and predict the next one
        for i in range(len(eline)):
            wid = evocab[eline[i]]
            x_k = self.embedy(Variable(np.array([wid], dtype=np.int32)))
            next_wid = evocab['<eos>'] if (i == len(eline) - 1) else evocab[eline[i+1]]
            tx = Variable(np.array([next_wid], dtype=np.int32))
            h = self.H(x_k)
            loss = F.softmax_cross_entropy(self.W(h), tx)
            accum_loss += loss

        return accum_loss

def main(epochs, utt_file, res_file, out_path):

    # Build the vocabulary (word -> ID) for the utterance (encoder) side
    jvocab = {}
    jlines = open(utt_file).read().split('\n')
    for i in range(len(jlines)):
        lt = jlines[i].split()
        for w in lt:
            if w not in jvocab:
                jvocab[w] = len(jvocab)

    jvocab['<eos>'] = len(jvocab)
    jv = len(jvocab)

    # Build the vocabulary for the response (decoder) side
    evocab = {}
    elines = open(res_file).read().split('\n')
    for i in range(len(elines)):
        lt = elines[i].split()
        for w in lt:
            if w not in evocab:
                evocab[w] = len(evocab)

    evocab['<eos>'] = len(evocab)
    ev = len(evocab)

    demb = 100
    model = seq2seq(jv, ev, demb, jvocab, evocab)
    optimizer = optimizers.Adam()
    optimizer.setup(model)

    for epoch in range(epochs):
        for i in range(len(jlines)-1):
            jln = jlines[i].split()
            jlnr = jln[::-1]          # feed the input utterance in reverse order
            eln = elines[i].split()
            model.H.reset_state()     # clear the LSTM state for each pair
            model.zerograds()
            loss = model(jlnr, eln, jvocab, evocab)
            loss.backward()
            loss.unchain_backward()
            optimizer.update()
            print i, " finished"

        # Save the model after every epoch
        outfile = out_path + "/seq2seq-" + str(epoch) + ".model"
        serializers.save_npz(outfile, model)



if __name__ == "__main__":

    argvs = sys.argv

    _usage = """--                                                                                                                                                       
Usage:                                                                                                                                                                   
    python learning.py [epoch] [utteranceDB] [responseDB] [save_link]                                                                                                    
Args:                                                                                                                                                                    
    [epoch]: The argument is the number of max epochs to train models.                                                                                                   
    [utteranceDB]: The argument is input file to train model that is to convert as pre-utterance.                                                                        
    [responseDB]: The argument is input file to train model that is to convert as response to utterance.                                                                 
    [save_link]: The argument is output directory to save trained models.                                                                                                
""".rstrip()

    if len(argvs) < 5:
        print _usage
        sys.exit(0)


    epochs = int(argvs[1])
    utt_file = argvs[2]
    res_file = argvs[3]
    out_path = argvs[4]

    main(epochs, utt_file, res_file, out_path)

The execution is as follows.

learning.py


$ python learning.py [epoch] [utteranceDB] [responseDB] [save_link]

- [epoch]: number of training epochs
- [utteranceDB]: training data (Utterance)
- [responseDB]: training data (Response)
- [save_link]: directory in which trained models are saved
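For example, training for 30 epochs on the word-segmented files created above (the output directory name is arbitrary and must exist beforehand) might look like this:

$ mkdir models
$ python learning.py 30 Utterance_wakati.txt Response_wakati.txt models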

Let's Conversation!

Training is finally finished. Now let's have a conversation!

generating.py


#!/usr/bin/env python                                                                                                                                                    
# -*- coding: utf-8 -*-                                                                                                                                                  

import sys
import numpy as np
import mecab as mcb  # MeCab wrapper module providing construct_BoW (not a standard package; assumed to be the author's own helper)
import chainer
from chainer import cuda, Function, gradient_check, Variable, optimizers, serializers, utils
from chainer import Link, Chain, ChainList
import chainer.functions as F
import chainer.links as L



class seq2seq(chainer.Chain):
    def __init__(self, jv, ev, k, jvocab, evocab):
        super(seq2seq, self).__init__(
            embedx = L.EmbedID(jv, k),
            embedy = L.EmbedID(ev, k),
            H = L.LSTM(k, k),
            W = L.Linear(k, ev),
            )

    def __call__(self, jline, eline, jvocab, evocab):
        for i in range(len(jline)):
            wid = jvocab[jline[i]]
            x_k = self.embedx(Variable(np.array([wid], dtype=np.int32)))
            h = self.H(x_k)
        x_k = self.embedx(Variable(np.array([jvocab['<eos>']], dtype=np.int32)))
        tx = Variable(np.array([evocab[eline[0]]], dtype=np.int32))
        h = self.H(x_k)
        accum_loss = F.softmax_cross_entropy(self.W(h), tx)
        for i in range(1,len(eline)):
            wid = evocab[eline[i]]
            x_k = self.embedy(Variable(np.array([wid], dtype=np.int32)))
            next_wid = evocab['<eos>'] if (i == len(eline) - 1) else evocab[eline[i+1]]
            tx = Variable(np.array([next_wid], dtype=np.int32))
            h = self.H(x_k)
            loss = F.softmax_cross_entropy(self.W(h), tx)
            accum_loss = loss if accum_loss is None else accum_loss + loss
            
        return accum_loss
        
def mt(model, jline, id2wd, jvocab, evocab):
    # Encode the (reversed) input utterance
    for i in range(len(jline)):
        wid = jvocab[jline[i]]
        x_k = model.embedx(Variable(np.array([wid], dtype=np.int32), volatile='on'))
        h = model.H(x_k)
    # Feed <eos> and greedily emit the first response word
    x_k = model.embedx(Variable(np.array([jvocab['<eos>']], dtype=np.int32), volatile='on'))
    h = model.H(x_k)
    wid = np.argmax(F.softmax(model.W(h)).data[0])
    if wid in id2wd:
        print id2wd[wid],
    else:
        print wid,
    # Keep generating words until <eos> is produced or 30 words have been emitted
    loop = 0
    while (wid != evocab['<eos>']) and (loop <= 30):
        x_k = model.embedy(Variable(np.array([wid], dtype=np.int32), volatile='on'))
        h = model.H(x_k)
        wid = np.argmax(F.softmax(model.W(h)).data[0])
        if wid in id2wd:
            print id2wd[wid],
        else:
            print wid,
        loop += 1
    print

def constructVocabs(corpus, mod):
    # Rebuild the word -> ID dictionary from the training corpus;
    # for the response side ("R"), also build the inverse ID -> word mapping.
    vocab = {}
    id2wd = {}
    lines = open(corpus).read().split('\n')
    for i in range(len(lines)):
        lt = lines[i].split()
        for w in lt:
            if w not in vocab:
                if mod == "U":
                    vocab[w] = len(vocab)
                elif mod == "R":
                    id2wd[len(vocab)] = w
                    vocab[w] = len(vocab)

    if mod == "U":
        vocab['<eos>'] = len(vocab)
        v = len(vocab)
        return vocab, v
    elif mod == "R":
        id2wd[len(vocab)] = '<eos>'
        vocab['<eos>'] = len(vocab)
        v = len(vocab)
        return vocab, v, id2wd
         
def main(mpath, utt_file, res_file):

    jvocab, jv = constructVocabs(utt_file, mod="U")
    evocab, ev, id2wd = constructVocabs(res_file, mod="R")

    demb = 100
    model = seq2seq(jv, ev, demb, jvocab, evocab)
    serializers.load_npz(mpath, model)

    while True:
        utterance = raw_input()
        if utterance == "exit":
            print "Bye!!"
            sys.exit(0)

        jln = mcb.construct_BoW(utterance)   # segment the input utterance into a word list
        jlnr = jln[::-1]                     # reverse it, as was done during training
        mt(model, jlnr, id2wd, jvocab, evocab)

                                        
if __name__ == "__main__":

    argvs = sys.argv

    _usage = """--                                                                                                                                                       
Usage:                                                                                                                                                                   
    python generating.py [model] [uttranceDB] [responseDB]                                                                                                               
Args:                                                                                                                                                                    
    [model]: The argument is seq2seq model to be trained using dialog corpus.                                                                                            
    [utteranceDB]: The argument is utterance corpus to gain the distributed representation of words.                                                                     
    [responseDB]: The argument is response corpus to gain the distributed representation of words.                                                                       
""".rstrip()

    if len(argvs) < 4:
        print _usage
        sys.exit(0)


    model = argvs[1]
    utt_file = argvs[2]
    res_file = argvs[3]

    main(model, utt_file, res_file)                                                                                                                 

generating.py


$ python generating.py [model] [utteranceDB] [responseDB]

- [model]: trained model (seq2seq-**.model)
- [utteranceDB]: training data (Utterance)
- [responseDB]: training data (Response)
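For example, using the model saved after the last of 30 epochs (epochs are numbered from 0 in learning.py, so it is seq2seq-29.model) together with the same word-segmented training files:

$ python generating.py models/seq2seq-29.model Utterance_wakati.txt Response_wakati.txt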

Here is the result of the actual dialogue!

Epoch=1:

U: Hello
S: I've been there recently, recently.
U: What is your gender?
S: Well, which one are you going to?
U: Do you have a lover?
S: Well, which one are you going to?

Epoch=10:

U: Hello
S: Hello
U: What is your gender?
S: Is it fortune-telling! I'm not good at it... Winter clothes are heavy and expensive.
U: Do you have a lover?
S: Is there anything recent?

Epoch=30:

U: Hello
S: Hello, thank you
U: What is your gender?
S: Breech birth may not know gender
U: Do you have a lover?
S: Maybe you want a lover

At Epoch 1 the model seems to have learned nothing, but as the number of epochs increases, the responses get better and better! Note, however, that this was a closed test (the input utterances come from the training corpus), so results in a real, open dialogue will be worse. For a real improvement, the amount of training data probably needs to grow by around three orders of magnitude.

Summary

In this article, we implemented a dialogue system that generates responses word by word using the seq2seq model. The drawback is that a large amount of training data must be prepared, but conversely, if you have the data, you can build a system that holds a reasonably natural dialogue. The responses will vary greatly depending on the distribution of the data used.

If you find it interesting, please give it a try! Let's Conversation!
