[PYTHON] A one-year history of quietly running a chatbot

This article is the 7th day of Mynavi Advent Calendar 2019!

I usually do natural language processing in a department that handles big data and AI. This time, I'll ramble on about unleashing a chatbot on our in-house Slack and playing with it.

Overview

A bot with a real talent for provoking people is running on AWS. It sometimes takes injection attacks from my seniors at work, but it's doing fine. My one regret is that a single cloud would have been better than going multi-cloud.

Below is a screenshot of me actually playing with the chatbot.

[Screenshot of a conversation with the chatbot]

Hmm, I'd like to see the face of whoever raised this one...

Introduction

Why I wanted to make one

All human beings live with the desire to make a chatbot. At least I do, so I'm sure others do too. Meanwhile, a paper on Microsoft's high-school-girl AI "Rinna" was published (2016).

cute

I have to make this ... (2 years have passed)

About Slack

We use Slack as our main communication tool. There were 677 public channels alone. I'm not sure whether that's a lot or a little :thinking:

Since the "times" (personal channel) culture is well established here, the goal for this chatbot was to run it in my own times channel and "make everyone smile." That wish is baked into it: I hoped it would become that kind of dialogue bot. By the way, it is named "Natsu-chan" because it was released in the summer. Every time the version goes up, I rename it after the current season.

The Slack API used is the Events API (https://api.slack.com/events-api). I made sure to narrow down the permissions so that the endpoint is called only when the bot is mentioned.

Deliverables

Chatbot configuration diagram

I made it quickly.

The configuration diagram of the chatbot is shown below.

[Chatbot configuration diagram: 春ちゃん構成図_2 (2).png]

Believe it or not, I put a fair amount of effort into this.

At first, I gave a single EC2 instance an Elastic IP, kept the session alive with the screen command, and linked it with Slack. However, a revered colleague pointed out that "there is no merit in using AWS that way," and since I agreed, I ended up tinkering with the configuration quite a bit.

System configuration issues

I will list the problems with the current configuration, as I see them.

About chatbot algorithms

This time, I implemented it with Seq2Seq + Attention + SentencePiece. The specific set of technologies is shown below.

Item | Contents
Learning algorithm | Seq2Seq (4-layer LSTM) + Global Attention
Tokenizer | SentencePiece
Pre-training | Word2Vec
Optimizer | Adam
Training data | Dialogue breakdown corpus + Nagoya University (Meidai) Conversation Corpus + conversation logs from Slack
Library used | Chainer

For pre-training, Word2Vec is used to learn word vectors from the training corpus tokenized by SentencePiece. The learned word vectors were used as the initial values of the word embeddings in the Encoder and Decoder.
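The article doesn't show this step, but here is a minimal sketch of what it could look like, assuming the sentencepiece and gensim packages, a hypothetical corpus.txt (one sentence per line), and an embedding size of 300:

# A sketch of the pre-training step (not the article's actual code).
# Assumes sentencepiece and gensim are installed, corpus.txt exists,
# and the embedding size is 300.
import numpy as np
import sentencepiece as spm
from gensim.models import Word2Vec

# Train a SentencePiece model on the raw corpus.
spm.SentencePieceTrainer.train(input="corpus.txt", model_prefix="sp", vocab_size=8000)
sp = spm.SentencePieceProcessor(model_file="sp.model")

# Tokenize the corpus into subwords and train Word2Vec on the result.
sentences = [sp.encode(line.strip(), out_type=str) for line in open("corpus.txt", encoding="utf-8")]
w2v = Word2Vec(sentences, vector_size=300, min_count=1)  # "size=" in older gensim

# Build the matrix passed to Seq2Seq as w (initialW of L.EmbedID).
w = np.zeros((sp.get_piece_size(), 300), dtype=np.float32)
for i in range(sp.get_piece_size()):
    piece = sp.id_to_piece(i)
    if piece in w2v.wv:
        w[i] = w2v.wv[piece]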

Below is the __init__ part.

def __init__(self, vocab_size, embed_size, hidden_size, eos, w=None, ignore_label=-1):
    super(Seq2Seq, self).__init__()

    self.unk = ignore_label
    self.eos = eos

    with self.init_scope():

        # Embedding Layer
        self.x_embed = L.EmbedID(vocab_size, embed_size, initialW=w, ignore_label=ignore_label)
        self.y_embed = L.EmbedID(vocab_size, embed_size, initialW=w, ignore_label=ignore_label)
        # 4-Layer LSTM
        self.encoder = L.NStepLSTM(n_layers=4, in_size=embed_size, out_size=hidden_size, dropout=0.1)
        self.decoder = L.NStepLSTM(n_layers=4, in_size=embed_size, out_size=hidden_size, dropout=0.1)

        # Attention Layer
        self.attention = L.Linear(2*hidden_size, hidden_size)

        # Output Layer
        self.y = L.Linear(hidden_size, vocab_size)

Seq2Seq is a sequence-to-sequence conversion model built on an Encoder-Decoder. The original paper presented it for machine translation (an English-French translation task), but since it converts an input sequence into an output sequence, it can also be used for dialogue! That is why it is also used in dialogue bots.

The training data is a set of input-sentence/output-sentence pairs like the following.

Input sentence: It's really fun to talk with you. Won't you come up to the living room and chat?
Output sentence: I have something to do today, so I'll take my leave.

The responses obtained from the trained model are purely question-and-answer style and do not take the flow of the conversation into account at all. Sometimes a conversation appears to continue, but that is pure coincidence; the model does not consider context in any way.
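As a side note, here is a rough sketch of how one such pair could be turned into the int32 ID arrays the model consumes, using the hypothetical sp.model from the pre-training sketch above (the EOS token is added inside __call__, shown next):

import numpy as np
import sentencepiece as spm

sp = spm.SentencePieceProcessor(model_file="sp.model")  # hypothetical file name

input_text = "It's really fun to talk with you. Won't you come up to the living room and chat?"
output_text = "I have something to do today, so I'll take my leave."

x = np.array(sp.encode(input_text), dtype=np.int32)   # encoder input IDs
y = np.array(sp.encode(output_text), dtype=np.int32)  # decoder target IDs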

The following is the __call__ part.

def __call__(self, x, y):

    """

    :param x:Mini-batch input data
    :param y:Mini-batch output corresponding to input data
    :return:Error and accuracy
    """

    batch_size = len(x)
    eos = self.xp.array([self.eos], dtype='int32')

    # Prepend EOS to the decoder input and append EOS to the decoder target
    y_in = [F.concat((eos, tmp), axis=0) for tmp in y]
    y_out = [F.concat((tmp, eos), axis=0) for tmp in y]

    # Embedding Layer
    emb_x = sequence_embed(self.x_embed, x)
    emb_y = sequence_embed(self.y_embed, y_in)

    # Run the Encoder, then feed its final states to the Decoder
    h, c, a = self.encoder(None, None, emb_x)  # h => hidden, c => cell, a => per-token encoder outputs (used for attention)
    _, _, dec_hs = self.decoder(h, c, emb_y)  # dec_hs => per-token decoder outputs

    # Concatenate the per-sequence decoder outputs over the mini-batch
    dec_h = chainer.functions.concat(dec_hs, axis=0)
    
    # Global attention over the encoder outputs
    attention = chainer.functions.concat(a, axis=0)
    o = self.global_attention_layer(dec_h, attention)

    t = chainer.functions.concat(y_out, axis=0)

    loss = F.softmax_cross_entropy(o, t)  # Loss calculation
    accuracy = F.accuracy(o, t)  # Accuracy calculation

    return loss, accuracy
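Two helpers used above, sequence_embed and global_attention_layer, are not shown in the article. Below is a minimal sketch of plausible implementations that are consistent with the layers defined in __init__; the real code (which follows @nojima's blog) may well differ.

import numpy as np
import chainer.functions as F

def sequence_embed(embed, xs):
    # Embed a list of variable-length ID sequences in a single call,
    # then split the result back into per-sequence chunks.
    x_len = [len(x) for x in xs]
    x_section = np.cumsum(x_len[:-1])
    ex = embed(F.concat(xs, axis=0))
    return F.split_axis(ex, x_section, axis=0)

def global_attention_layer(self, dec_h, enc_hs):
    # Would be a method of Seq2Seq. Luong-style global attention using
    # self.attention (2*hidden -> hidden) and self.y (hidden -> vocab).
    # Note: with the concat done in __call__, this attends over every
    # encoder token in the mini-batch at once.
    score = F.matmul(dec_h, enc_hs, transb=True)   # (n_dec, n_enc)
    align = F.softmax(score, axis=1)               # attention weights
    context = F.matmul(align, enc_hs)              # (n_dec, hidden)
    concat = F.concat((context, dec_h), axis=1)    # (n_dec, 2*hidden)
    attentional = F.tanh(self.attention(concat))   # (n_dec, hidden)
    return self.y(attentional)                     # (n_dec, vocab) logits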

Beam search is used for inference, with a beam width of 3 and a maximum output length of 50.

For these implementations, I referred to @nojima's blog below. It was a great help. Thank you! https://nojima.hatenablog.com/entry/2017/10/10/023147
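The decoding code itself isn't shown either, so here is a generic, model-agnostic beam-search sketch with a beam width of 3 and a maximum length of 50. The step function, which should return per-token log-probabilities and an updated decoder state, is a hypothetical stand-in for the actual Decoder:

import heapq
import numpy as np

def beam_search(step, init_state, bos_id, eos_id, beam_width=3, max_len=50):
    # Each hypothesis is (cumulative log-probability, token list, decoder state).
    beams = [(0.0, [bos_id], init_state)]
    finished = []
    for _ in range(max_len):
        candidates = []
        for score, tokens, state in beams:
            if tokens[-1] == eos_id:
                finished.append((score, tokens))
                continue
            log_p, new_state = step(state, tokens[-1])
            # Extend this hypothesis with its top-k next tokens.
            for tok in np.argsort(log_p)[::-1][:beam_width]:
                candidates.append((score + float(log_p[tok]), tokens + [int(tok)], new_state))
        if not candidates:
            break
        beams = heapq.nlargest(beam_width, candidates, key=lambda c: c[0])
    # Keep any hypotheses that never produced EOS as well.
    finished.extend((s, t) for s, t, _ in beams if t[-1] != eos_id)
    return max(finished, key=lambda c: c[0])[1] if finished else []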

Training was done on Google Colaboratory. https://colab.research.google.com/notebooks/welcome.ipynb?hl=ja

Difficulties

API side implementation

The API side is implemented with Flask + uWSGI. When you hook up the Events API, Slack sends a challenge (URL verification). https://api.slack.com/events-api#subscriptions

Specifically, the following POST is sent to the endpoint. If you return the value of this "challenge" field as-is, verification succeeds, and after that the endpoint is linked with Slack according to your Slack API settings. Note that if the challenge fails, Slack will not POST events to the endpoint.


{
    "token": "Slack API token",
    "challenge": "3eZbrw1aBm2rZgRNFdxV2595E9CY3gmdALWMmHkvFXO7tYXAYM8P",
    "type": "url_verification"
}

Permissions can be configured under "Subscribe to bot events" in "Event Subscriptions". I added `app_mention` as an event. This event POSTs the message data to the configured endpoint whenever the bot is mentioned in a channel it has been added to.
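Putting the two together, a minimal Flask endpoint could look like the sketch below. The path, environment variable, and generate_reply function are hypothetical, and request-signature verification is omitted:

import os
import requests
from flask import Flask, jsonify, request

app = Flask(__name__)
SLACK_BOT_TOKEN = os.environ["SLACK_BOT_TOKEN"]  # bot token (hypothetical env var)

def generate_reply(text):
    # Placeholder for inference with the trained Seq2Seq model.
    return "..."

@app.route("/slack/events", methods=["POST"])
def slack_events():
    payload = request.get_json()

    # URL verification: return the challenge value as-is.
    if payload.get("type") == "url_verification":
        return jsonify({"challenge": payload["challenge"]})

    # app_mention: generate a reply and post it back to the channel.
    event = payload.get("event", {})
    if event.get("type") == "app_mention":
        reply = generate_reply(event.get("text", ""))
        requests.post(
            "https://slack.com/api/chat.postMessage",
            headers={"Authorization": "Bearer " + SLACK_BOT_TOKEN},
            json={"channel": event["channel"], "text": reply},
        )
    return "", 200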

[Screenshot of the Event Subscriptions settings on api.slack.com]

Under "Subscribe to bot events" there is "Subscribe to workspace events", which is a setting that applies as an event for the entire workspace. Please note that it will be difficult.

API challenges

The issues on the API side are as follows.

Post-release reaction

I was glad that it was received favorably in the times channels. For some reason, someone talks to the bot at least once a week (myself included).

It sometimes gets complaints, sometimes injection attacks, and sometimes "strong" words. Now that I think about the engineers behind chatbots, I can no longer be harsh with them myself. I'm sorry, Siri. I'm sorry, Cortana. I'm sorry, Kyle.

It was a year in which I realized there are many things you only come to understand once you have actually built something.

At the end

[Screenshot]

:innocent:
