[PYTHON] How to build a dialogue system, for beginners

Introduction

Hello, this is Hironsan. The number of bots has exploded since LINE, Facebook, Microsoft, and others announced bot development platforms between March and April. On Facebook Messenger alone, more than 11,000 bots were in operation as of July 2016.

However, most of these bots are simple question-and-answer systems rather than dialogue systems. They cannot learn a user's preferences step by step over the course of a conversation.

Therefore, this time I would like to build a restaurant search dialogue system that takes the dialogue history into account, and finally turn it into a bot. The finished bot is shown in the demo animation (bot2.mov.gif).

Specifically, we will build it in the following steps.

Besides searching for restaurants, the bot can also chat. The goal of building these bots is to understand how a basic dialogue system is put together.

Target audience

Preparation

Building a Python environment

The dialogue system in this article is built with Python 3. Download Python 3 from the official website and install it.

Download Python

Cloning the hands-on repository

Next, prepare the hands-on repository with the following steps.

  1. Open https://github.com/Hironsan/HotPepperGourmetDialogue
  2. Fork (that is, copy) the repository with the "Fork" button at the upper right.
  3. Use git clone to fetch your fork to your machine, then switch branches. Now you are ready to go.
$ git clone YOUR_FORK_REPOSITORY.git
$ cd HotPepperGourmetDialogue
$ git checkout -b tutorial origin/tutorial

In addition, so that modules can be imported, set PYTHONPATH to the HotPepperGourmetDialogue directory. Run the command from inside that directory, and do not wrap the value in double quotation marks ("").

$ export PYTHONPATH=`pwd`

In a Command Prompt environment, run the following instead.

$ set PYTHONPATH=%CD%

Building a virtual environment

For Virtualenv

For Virtualenv, execute the following command to build and enable the virtual environment.

$ pip install virtualenv
$ virtualenv env
$ source env/bin/activate

For Conda

For Conda, run the following command to build and enable the virtual environment.

$ conda create -n env python
$ source activate env   # In Command Prompt: activate env

Get API Key

Obtain the following two API Keys for use in the restaurant search interaction system.

The docomo chat dialogue API is used for chatting with the bot, and the HotPepper Gourmet Search API is used for restaurant searches. As a rough guide, the HotPepper API key takes about 5 minutes to obtain and the docomo chat dialogue API key takes about 1 day.

While you wait for the keys, go ahead and work through the sections up to the Echolalia Bot.

Create a Slack account

This time we will create a bot on Slack. If you don't have a Slack account, create one at https://slack.com/

Set the team name and other details however you like.

Create a Slack Bot account

First, let's create a Slack Bot account.

https://my.slack.com/services/new/bot

This assumes your team's Slack is already open and you are logged in as a user with the necessary permissions. Opening the link above in that state brings up the bot account creation screen. Enter a username for your bot account in the form and press "Add bot integration".

After you press "Add bot integration", a new form is displayed. Make a note of the "API Token" shown there; the bot we create later will use it.

You can change the bot's name and icon with "Customize Name" and "Customize Icon". Save your changes with "Save Integration".

Next, create a private channel for testing and add the bot to it. Once you have filled in the settings, save them with "Create Channel".

This completes the Slack settings. Now let's start creating a bot.

First bot

Now it's time to create a bot. We will use the following Python library: lins05/slackbot

Installation is done with pip.

$ pip install slackbot

Try launching Slack Bot

First, change to the directory for the bot.

$ cd application

After moving to the application directory, write the bot settings in "slackbot_settings.py" there.

# -*- coding: utf-8 -*-
import os


API_TOKEN = os.environ.get('SLACK_API_KEY', '')
 
default_reply = "Sorry, I don't understand that word."

Here, API_TOKEN is read from an environment variable. Run the following command to store the API token you noted earlier in that variable.

$ export SLACK_API_KEY=YOUR_API_KEY  #For command prompt: set SLACK_API_KEY=YOUR_API_KEY
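
If the variable is not set, `os.environ.get('SLACK_API_KEY', '')` quietly returns an empty string, so a missing token is easy to overlook. As a minimal sketch, you could check for it up front. The helper name `read_token` is hypothetical and is not part of the slackbot library.

```python
import os


def read_token(env=None, name='SLACK_API_KEY'):
    """Return the token stored under `name`, or raise if it is missing or empty."""
    env = os.environ if env is None else env
    token = env.get(name, '').strip()
    if not token:
        raise RuntimeError('{0} is not set; export it before starting the bot'.format(name))
    return token


# Demonstration with a plain dict standing in for os.environ.
print(read_token({'SLACK_API_KEY': 'xoxb-dummy-token'}))
```

In "slackbot_settings.py" you would call such a helper without arguments so that it reads the real environment.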

Next, write the code to start the bot in “slack_bot.py”.

# -*- coding: utf-8 -*-
from slackbot.bot import Bot


def main():
    bot = Bot()
    bot.run()
 
if __name__ == "__main__":
    main()

Now it's time to start the bot.

$ python slack_bot.py

The bot now responds to specific words in mentions and in channel posts.

These responses come from the default plugin bundled with the slackbot library. That alone is not very useful, so next we will write our own plugin to extend the bot.

First plugin

Next, let's create a bot that replies "Hello" when the user says "Hello". The Python slackbot library used in this article can be extended with plugins. By writing your own plugin, you can make the bot respond to specific words in direct messages and in posts on the channels it has joined.

First, move to the plugins directory inside the bot directory.

$ cd plugins

This plugins directory contains "__init__.py". The file is needed because a directory loaded as a slackbot plugin must be a Python package, and "__init__.py" is what marks the directory as a package. The file itself can be empty.

Now that we are in the plugins directory, let's write a plugin script. Put the following code in "slack.py".

from slackbot.bot import respond_to


# Each decorator registers one keyword; the bot replies when a mention matches either.
@respond_to('Hello')
@respond_to('Hi')
def hello(message):
    message.reply('Hello!')

The "hello" function is wrapped with the "respond_to" decorator.

When the plugin is loaded, the keyword passed to "respond_to" registers the function as a handler for mentions of the bot that match that keyword. Keywords can also be regular expressions, and, as in this sample, you can stack several decorators to handle several keywords.

Finally, add the following to “slackbot_settings.py” so that the plugin will be loaded.

PLUGINS = [
    'plugins',
]

Launch Slackbot and send it a mention. Be sure to include the @.

$ python slack_bot.py

You can see that the bot created this time is reacting as intended.

Echolalia Bot

Next, let's use regular expressions to get what the user says. Try modifying "slack.py" as follows.

# -*- coding: utf-8 -*-
from slackbot.bot import respond_to, listen_to


# (.*) captures the rest of the post and is passed in as `something`.
@listen_to('I am (.*)')
@listen_to("I'm (.*)")
def hello(message, something):
    message.reply('Hello, {0}!'.format(something))

Start Slackbot and post a message. This time, do not include the @.

$ python slack_bot.py

You can see that the bot picked up what the user said.

There are two major differences between the previous hello function and this one. First, this time we use the "listen_to" decorator. A function decorated with "listen_to" reacts to posts that match the pattern on the channels the bot has joined, with no mention needed.

The other difference is the "(.*)" in the decorator. This is a regular expression, and "(.*)" matches any string. The matched text is passed to the function's second argument, something, which is how the bot was able to send back what the user said.
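
To see what the capture group does without starting the bot, you can try the same kind of pattern with Python's standard `re` module. This is only an illustration: the pattern and sample sentences are made up, and it assumes the decorators search the message text for the pattern, which is how they appear to behave.

```python
import re

# Illustrative pattern: the parentheses capture whatever follows "I am ".
pattern = re.compile(r'I am (.*)')


def extract_name(text):
    """Return the captured group, or None when the pattern does not match."""
    match = pattern.search(text)
    return match.group(1) if match else None


print(extract_name('I am Hironsan'))   # Hironsan
print(extract_name('Good morning'))    # None
```

The captured group plays the role of the `something` argument in the plugin function.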

A bot that simply echoes whatever the user says is even easier. Just write the following:

@respond_to('(.*)')
def reflection(message, something):
    message.reply(something)

In addition to matching arbitrary strings, regular expressions can match only numbers or only uppercase letters. See below for details.

Regular expressions in Python
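
As a small sketch of such patterns, the following uses the standard `re` module to accept only digits or only uppercase letters. The helper names and sample strings are made up for illustration.

```python
import re

DIGITS_ONLY = re.compile(r'^[0-9]+$')      # one or more digits, nothing else
UPPERCASE_ONLY = re.compile(r'^[A-Z]+$')   # one or more ASCII uppercase letters


def is_digits(text):
    return DIGITS_ONLY.match(text) is not None


def is_uppercase(text):
    return UPPERCASE_ONLY.match(text) is not None


print(is_digits('3000'))       # True
print(is_digits('3000 yen'))   # False
print(is_uppercase('OK'))      # True
print(is_uppercase('Ok'))      # False
```

A pattern like `[0-9]+` without the anchors could be passed to a decorator to pick a number, such as a budget, out of a longer post.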

Also, when writing regular expressions, an online editor that checks them in real time shows immediately what a pattern matches, which speeds up the work.

https://regex101.com/


Restaurant search system

So far, we have seen how to build a Slackbot with the slackbot library. From here on, we will build a dialogue system in Python that searches for restaurants, and then plug it into Slack as a Slackbot.

The finished bot (bot2.mov.gif) can search for restaurants through dialogue. Because that alone would be rather dull, it can also chat. Now that we know what we are aiming for, let's build it, starting with the preparation.

Preparation

Install the libraries used by the restaurant search bot with the following commands.

$ pip install requests
$ pip install pyyaml
$ pip install doco
  • If you get a UnicodeDecodeError when installing doco on Windows:
  1. Download the repository from https://github.com/heavenshell/py-doco
  2. Edit line 18 of setup.py as follows, then install:
open(rst, 'r', encoding='utf-8')
$ python setup.py install

Also, set the API keys you obtained for the docomo chat dialogue API and the HotPepper Gourmet Search API as environment variables.

$ export DOCOMO_DIALOGUE_API_KEY=YOUR_DOCOMO_API_KEY
$ export HOTPEPPER_API_KEY=YOUR_HOTPEPPER_API_KEY

In a Command Prompt environment, set them as follows.

$ set DOCOMO_DIALOGUE_API_KEY=YOUR_DOCOMO_API_KEY
$ set HOTPEPPER_API_KEY=YOUR_HOTPEPPER_API_KEY

System configuration

The system follows the standard configuration of a basic dialogue system: language understanding, dialogue management, and language generation, plus a back end for external services.

Directory structure

The directory structure is as follows; it mirrors the system configuration.

.
├── application                       # Slackbot code
└── dialogue_system                   # The dialogue system as a whole
    ├── language_generation             # Language generation component
    ├── language_understanding          # Language understanding component
    │   ├── attribute_extraction          # Attribute extraction
    │   ├── dialogue_act_type             # Dialogue act type estimation
    │   ├── language_understanding.py
    │   └── utils
    ├── dialogue_management             # Dialogue management component
    ├── backend                         # Integration with external services
    │   └── apis
    ├── knowledge                       # Domain knowledge
    └── bot.py

Now let's create the system components one by one.

Language understanding

Let's start with the language understanding component. We will implement the following two processes.

  • Attribute extraction
  • Dialogue act type estimation

There are two main ways to make these:

  • Rule-based method
  • Machine learning method

Machine learning can produce a robust system, but preparing the data is quite hard. So we will start with the rule-based method, which needs no training data and is relatively easy to build.

The directory structure of the language understanding component is as follows.

.
└── language_understanding          # Language understanding component
    ├── attribute_extraction          # Attribute extraction
    ├── dialogue_act_type             # Dialogue act type estimation
    ├── language_understanding.py   # Integrates attribute extraction and dialogue act type estimation
    └── utils                       # Utility functions

Attribute extraction

When the language understanding component receives the text entered by the user, it extracts attributes. This time we extract the following three attributes, which are used as search conditions when looking for restaurants.

  • Cuisine genre
  • Location
  • Budget limit

One rule-based way to extract attributes is keyword extraction: prepare dictionaries of locations and cuisine genres in advance, then pull out the parts of the user's utterance that match an entry. For example, if the user says "Shinjuku is a good place" and the keyword "Shinjuku" is in the location dictionary, "Shinjuku" is extracted from the utterance as the location.
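
As a rough sketch of keyword extraction, the following class looks up dictionary entries in the utterance. It is not the repository's `rule_based_extractor.py`: the class name, the tiny in-memory dictionaries, and the "number followed by yen" budget pattern are all assumptions made for illustration.

```python
import re


class KeywordAttributeExtractor(object):
    """Hypothetical dictionary-based extractor; not the repository's implementation."""

    def __init__(self, locations, genres):
        self.locations = locations
        self.genres = genres

    def extract(self, text):
        lowered = text.lower()
        # Take the first dictionary entry that appears in the utterance.
        attribute = {
            'LOCATION': self._first_match(lowered, self.locations),
            'GENRE': self._first_match(lowered, self.genres),
            'MAXIMUM_AMOUNT': '',
        }
        # Assumed budget format: a number followed by "yen", e.g. "3000 yen".
        budget = re.search(r'(\d+)\s*yen', lowered)
        if budget:
            attribute['MAXIMUM_AMOUNT'] = budget.group(1)
        return attribute

    @staticmethod
    def _first_match(text, keywords):
        for keyword in keywords:
            if keyword.lower() in text:
                return keyword
        return ''


extractor = KeywordAttributeExtractor(locations=['Shinjuku', 'Shibuya'],
                                      genres=['ramen', 'sushi'])
print(extractor.extract('Shinjuku is a good place'))
print(extractor.extract('I want to eat ramen for under 3000 yen'))
```

Plain substring matching keeps the sketch short, but it also shows the method's limit: a word that is not in a dictionary is never extracted.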

The directory structure of the attribute extraction part is as follows.

.
└── attribute_extraction          #Attribute extraction
    ├── __init__.py
    └── rule_based_extractor.py

The code for rule_based_extractor.py is here: attribute_extraction/rule_based_extractor.py

You can extract attributes as follows.

from rule_based_extractor import RuleBasedAttributeExtractor

extractor = RuleBasedAttributeExtractor()
attribute = extractor.extract(text='I want to eat ramen')
print(attribute)
# => {'LOCATION': '', 'GENRE': 'ramen', 'MAXIMUM_AMOUNT': ''}

Dialogue act type estimation

When the language understanding component receives the text entered by the user, it also estimates the dialogue act type. This time we estimate the following four dialogue act types.

  • Genre specified (INFORM_GENRE)
  • Location specified (INFORM_LOC)
  • Maximum amount specified (INFORM_MONEY)
  • Other (OTHER)

We estimate the dialogue act type from the result of attribute extraction. If a cuisine genre was extracted, the dialogue act type is estimated as genre specified (INFORM_GENRE); if a location was extracted, as location specified (INFORM_LOC). If no attribute was extracted, it is estimated as Other (OTHER).
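
These rules can be sketched as a single function. The function name is hypothetical, and the priority used when several attributes are present at once is an assumption; the text above does not specify it.

```python
def estimate_act_type(attribute):
    """Map extracted attributes to a dialogue act type using fixed rules."""
    # The priority order below is an assumption for when several attributes are present.
    if attribute.get('GENRE'):
        return 'INFORM_GENRE'
    if attribute.get('LOCATION'):
        return 'INFORM_LOC'
    if attribute.get('MAXIMUM_AMOUNT'):
        return 'INFORM_MONEY'
    return 'OTHER'


print(estimate_act_type({'LOCATION': '', 'GENRE': 'ramen', 'MAXIMUM_AMOUNT': ''}))  # INFORM_GENRE
print(estimate_act_type({'LOCATION': '', 'GENRE': '', 'MAXIMUM_AMOUNT': ''}))       # OTHER
```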

The directory structure of the dialogue act type estimation part is as follows.

.
└── dialogue_act_type             # Dialogue act type estimation
    ├── __init__.py
    └── rule_based_estimator.py

rule_based_estimator.py is a short piece of code: dialogue_act_type/rule_based_estimator.py

You can check that it works as follows.

from rule_based_extractor import RuleBasedAttributeExtractor
from rule_based_estimator import RuleBasedDialogueActTypeEstimator

extractor = RuleBasedAttributeExtractor()
estimator = RuleBasedDialogueActTypeEstimator()
attribute = extractor.extract(text='I want to eat ramen')
act_type = estimator.estimate(attribute)
print(act_type)
# => INFORM_GENRE

Integrating attribute extraction and dialogue act type estimation

So far, we have extracted attributes and estimated the dialogue act type. Next, we integrate the two and write the language understanding code in "language_understanding.py".

The code is here: language_understanding/language_understanding.py
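
To make the integration concrete, here is a minimal sketch that runs extraction first and then estimation. It is not the repository's code: the class name, the `execute` method, and the layout of the returned dialogue act (`user_act_type`, `utt`, plus the non-empty attributes) are assumptions, and the two components are passed in as plain functions so the sketch runs on its own.

```python
class LanguageUnderstanding(object):
    """Hypothetical wrapper that runs extraction, then act type estimation."""

    def __init__(self, extract, estimate):
        self._extract = extract      # callable: text -> attribute dict
        self._estimate = estimate    # callable: attribute dict -> act type string

    def execute(self, text):
        attribute = self._extract(text)
        act_type = self._estimate(attribute)
        # Assumed dialogue act layout: the act type plus the non-empty attributes.
        dialogue_act = {'user_act_type': act_type, 'utt': text}
        dialogue_act.update((k, v) for k, v in attribute.items() if v)
        return dialogue_act


# Stand-in components so the sketch runs on its own.
def toy_extract(text):
    genre = 'ramen' if 'ramen' in text else ''
    return {'LOCATION': '', 'GENRE': genre, 'MAXIMUM_AMOUNT': ''}


def toy_estimate(attribute):
    return 'INFORM_GENRE' if attribute['GENRE'] else 'OTHER'


understanding = LanguageUnderstanding(toy_extract, toy_estimate)
print(understanding.execute('I want to eat ramen'))
```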

Dialogue management

The dialogue management component performs the following two processes based on the result of language understanding (the dialogue act).

  • Internal state update
  • Action selection

The directory structure of the dialogue management component is as follows.

.
└── dialogue_management             # Dialogue management component
    ├── __init__.py
    ├── manager.py
    └── state.py

Let's look at each.

Internal state update

The internal state update revises the dialogue system's internal state based on the result of language understanding, using rules. The internal state could hold all kinds of information, but to keep things simple, this time it holds only the user's intention, meaning the attributes and attribute values obtained so far. Specifically, it holds the following information.

  • Cuisine genre (GENRE)
  • Location (LOCATION)
  • Budget limit (MAXIMUM_AMOUNT)

First, write a class for the state to be kept. The state update method goes in this class. dialogue_management/state.py

Next, write a class that manages the dialogue. The dialogue management class delegates state updates to the state class. dialogue_management/manager.py
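
As a minimal sketch of such a state class (not the repository's `state.py`), the following keeps one value per attribute and overwrites it whenever the user mentions a new one. The class name, method names, and the overwrite rule are assumptions for illustration.

```python
class DialogueState(object):
    """Hypothetical internal state: the attribute values collected so far."""

    SLOTS = ('GENRE', 'LOCATION', 'MAXIMUM_AMOUNT')

    def __init__(self):
        self._values = {slot: '' for slot in self.SLOTS}

    def update(self, dialogue_act):
        # Rule: a newly mentioned, non-empty value overwrites the stored one.
        for slot in self.SLOTS:
            value = dialogue_act.get(slot, '')
            if value:
                self._values[slot] = value

    def get(self, slot):
        return self._values[slot]

    def missing_slots(self):
        return [slot for slot in self.SLOTS if not self._values[slot]]

    def clear(self):
        self._values = {slot: '' for slot in self.SLOTS}


state = DialogueState()
state.update({'user_act_type': 'INFORM_GENRE', 'GENRE': 'ramen'})
state.update({'user_act_type': 'INFORM_LOC', 'LOCATION': 'Shinjuku'})
print(state.missing_slots())   # ['MAXIMUM_AMOUNT']
```

Because the state survives between turns, the bot can ask for one attribute at a time instead of requiring everything in a single utterance.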

Action selection

Action selection decides the next action based on the internal state and rules. Concretely, it outputs a dialogue act type and passes it on to language generation, which generates text based on it. It also calls the external back end when necessary.

We write the action selection algorithm in the dialogue management class created for the internal state update. Actions are selected according to the following policy.

Condition                              Action
User dialogue act type is OTHER        Output the chat dialogue act type (CHAT)
Some attribute values are not filled   Output a dialogue act type that asks for an unfilled attribute
All attribute values are filled        Output the dialogue act type that presents a restaurant (INFORM_RESTAURANT)

The actual code is here: dialogue_management/manager.py
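
The policy above can be sketched as one function. This is an illustration rather than the repository's `manager.py`: the function name is hypothetical, the state is simplified to a plain dict, and the order in which unfilled attributes are requested is an assumption. The REQUEST_* names are the ones used in the language generation section.

```python
# Which request to issue for each unfilled attribute; the asking order is an assumption.
REQUESTS = (
    ('LOCATION', 'REQUEST_LOCATION'),
    ('GENRE', 'REQUEST_GENRE'),
    ('MAXIMUM_AMOUNT', 'REQUEST_BUDGET'),
)


def select_action(user_act_type, state):
    """Return the system dialogue act type for the current turn.

    `state` is a dict mapping attribute names to the values collected so far.
    """
    if user_act_type == 'OTHER':
        return 'CHAT'
    for slot, request in REQUESTS:
        if not state.get(slot):
            return request
    return 'INFORM_RESTAURANT'


print(select_action('OTHER', {}))                                   # CHAT
print(select_action('INFORM_GENRE', {'GENRE': 'ramen'}))            # REQUEST_LOCATION
print(select_action('INFORM_MONEY', {'GENRE': 'ramen', 'LOCATION': 'Shinjuku',
                                     'MAXIMUM_AMOUNT': '3000'}))    # INFORM_RESTAURANT
```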

Language generation

The language generation component generates text from rules and the dialogue act received from the dialogue management component.

The directory structure of the language generation component is as follows.

.
└── language_generation             # Language generation component
    ├── __init__.py
    └── generator.py

Language generation is performed according to the following policy.

Condition                               Action
Dialogue act type is REQUEST_LOCATION   Ask for the location
Dialogue act type is REQUEST_GENRE      Ask for the genre
Dialogue act type is REQUEST_BUDGET     Ask for the budget
Dialogue act type is CHAT               Chat
Dialogue act type is INFORM_RESTAURANT  Suggest a restaurant

Here, the docomo chat dialogue API is called when chatting, and the HotPepper Gourmet Search API is called when suggesting a restaurant.

The actual code is here: language_generation/generator.py
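
A template-based generator along these lines might look like the sketch below. It is not the repository's `generator.py`: the function name, the sentence templates, and the `chat` and `search` parameters are assumptions. The two APIs are represented by functions passed in from outside, so the sketch makes no network calls and does not guess at the real API signatures.

```python
TEMPLATES = {
    'REQUEST_LOCATION': 'Which area are you looking in?',
    'REQUEST_GENRE': 'What kind of food would you like?',
    'REQUEST_BUDGET': 'What is your maximum budget?',
}


def generate(sys_act_type, utterance, state, chat, search):
    """Turn a system dialogue act type into a reply sentence.

    `chat` and `search` are placeholders for the chat API and the restaurant
    search API; they are passed in so this sketch does not touch the network.
    """
    if sys_act_type in TEMPLATES:
        return TEMPLATES[sys_act_type]
    if sys_act_type == 'CHAT':
        return chat(utterance)
    if sys_act_type == 'INFORM_RESTAURANT':
        restaurant = search(state)
        if restaurant is None:
            return 'Sorry, I could not find a matching restaurant.'
        return 'How about {0}?'.format(restaurant)
    raise ValueError('unknown dialogue act type: {0}'.format(sys_act_type))


# Fake back ends for demonstration only.
def fake_chat(text):
    return 'I see.'


def fake_search(state):
    return 'Ramen Example' if state.get('GENRE') == 'ramen' else None


print(generate('REQUEST_GENRE', '', {}, fake_chat, fake_search))
print(generate('INFORM_RESTAURANT', '', {'GENRE': 'ramen'}, fake_chat, fake_search))
```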

Creating a bot class

This completes the components of the dialogue system. Now we create a Bot class that combines them. Inside this class, the components work together to generate a response to the user. dialogue_system/bot.py

Embedding in Slackbot

Now we plug the Bot class into Slackbot. Create a new plugin and write the following code. application/plugins/slack.py
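
The way the components fit together can be sketched as follows. This is not the repository's `bot.py`: the class name `DialogueBot`, its `reply` method, and the three stand-in functions are assumptions, kept deliberately small so that only the flow from understanding through management to generation is visible.

```python
class DialogueBot(object):
    """Hypothetical glue class: understanding -> management -> generation."""

    def __init__(self, understand, select_action, generate):
        self._understand = understand        # text -> dialogue act dict
        self._select_action = select_action  # (act type, state) -> system act type
        self._generate = generate            # (system act type, state) -> reply text
        self._state = {}                     # attribute values kept across turns

    def reply(self, text):
        dialogue_act = self._understand(text)
        # Internal state update: remember every attribute mentioned so far.
        for key, value in dialogue_act.items():
            if key != 'user_act_type' and value:
                self._state[key] = value
        sys_act_type = self._select_action(dialogue_act['user_act_type'], self._state)
        return self._generate(sys_act_type, self._state)


# Minimal stand-ins so the wiring can be run without the real components.
def understand(text):
    if 'ramen' in text:
        return {'user_act_type': 'INFORM_GENRE', 'GENRE': 'ramen'}
    if 'Shinjuku' in text:
        return {'user_act_type': 'INFORM_LOC', 'LOCATION': 'Shinjuku'}
    return {'user_act_type': 'OTHER'}


def select_action(user_act_type, state):
    if user_act_type == 'OTHER':
        return 'CHAT'
    if 'LOCATION' not in state:
        return 'REQUEST_LOCATION'
    if 'GENRE' not in state:
        return 'REQUEST_GENRE'
    return 'INFORM_RESTAURANT'


def generate(sys_act_type, state):
    return '{0} {1}'.format(sys_act_type, sorted(state.items()))


bot = DialogueBot(understand, select_action, generate)
print(bot.reply('I want to eat ramen'))
print(bot.reply('Somewhere in Shinjuku'))
```

The second reply depends on the genre remembered from the first turn, which is the dialogue history at work.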

After embedding, run slackbot.

$ python slack_bot.py

When you run it, the bot comes online and you can talk to it (bot2.mov.gif).

Making it smarter

This time, the language understanding component was built with keyword extraction and rules, but keyword extraction cannot pick up words that are not in the dictionary. To address this, you can extract attributes with machine learning. Dialogue act type estimation can also be treated as a sentence classification problem, so it too can be done with machine learning. Machine learning APIs have become more capable recently, so try using them to improve your language understanding component.

At the risk of self-promotion, the following article covers language understanding using machine learning.

Rule-based dialogue management is simple and easy to follow, but as the number of rules grows it becomes hard to maintain and change. One option for dealing with this is reinforcement learning. In addition, in spoken dialogue systems the input may contain errors, and research has been done on handling such errors with a reinforcement learning framework called POMDP. If you are interested, try managing the dialogue with reinforcement learning.

In conclusion

This time, we built a restaurant search dialogue system on Slack. That is the end of the hands-on. How did you find it? It was a simple rule-based, task-oriented dialogue system, but I hope that turning it into a Slackbot gave you a sense of the possibilities. Try thinking of ways to make it smarter by combining it with other Web APIs and machine learning APIs!

References

  • Making Slackbot with Python (1)
  • Making Slackbot with Python (2)
  • Dialogue system created using machine learning
  • [Dialogue system (natural language processing series)](https://www.amazon.co.jp/%E5%AF%BE%E8%A9%B1%E3%82%B7%E3%82%B9%E3%83%86%E3%83%A0-%E8%87%AA%E7%84%B6%E8%A8%80%E8%AA%9E%E5%87%A6%E7%90%86%E3%82%B7%E3%83%AA%E3%83%BC%E3%82%BA-%E4%B8%AD%E9%87%8E-%E5%B9%B9%E7%94%9F/dp/433902757X/ref=sr_1_1?ie=UTF8&qid=1471848601&sr=8-1&keywords=%E5%AF%BE%E8%A9%B1%E3%82%B7%E3%82%B9%E3%83%86%E3%83%A0)
