Easily enter Azure LUIS learning texts with Python scripts

Microsoft Azure's natural language processing service LUIS (Language Understanding Intelligent Service) is very convenient.

It extracts the intent and the entities of an utterance at the same time (an entity is a word you want to pull out, such as a place name), so you can easily build a simple conversational response program or chatbot.

https://www.luis.ai/

However, entering the training data for entities can be a daunting task. For example, suppose we want to build a chatbot (or rather, the place-name extraction API behind it) that takes a user's "current place of residence" and "desired place to move to" and suggests the best moving plan.

[Image: entering example sentences and labeling entities on the LUIS browser screen]

On the browser screen, as shown in the image above, you have to enter each example sentence and then mark "this entity runs from here to here" one by one, which takes a great deal of patience.
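Marking "from here to here" boils down to a pair of character offsets into the sentence. A minimal sketch of that computation; `label_span` and the example sentence are hypothetical, and the inclusive end offset follows the `start_pos + len(entity_value) - 1` used in the script later in this article:

```python
def label_span(text: str, value: str) -> tuple:
    """Return (startPos, endPos) of the first occurrence of value in text.

    endPos is inclusive: it is the index of the value's last character.
    """
    start = text.find(value)
    if start == -1:
        raise ValueError('{!r} not found in {!r}'.format(value, text))
    return start, start + len(value) - 1


# Hypothetical sentence: "I want to move from Tokyo to Osaka"
sentence = '東京から大阪に引っ越したい'
print(label_span(sentence, '東京'))  # (0, 1)
print(label_span(sentence, '大阪'))  # (4, 5)
```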

Strategy

LUIS has JSON export and import capabilities.

[Image: LUIS application top page with the `{}` export button and Import App]

On the application top page shown above, you can export each application as JSON with its `{}` button, and upload a JSON file again with Import App.
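Since both export and import use plain JSON, one quick way to see what an exported app contains is to load it and list its sections. A minimal sketch, assuming the export uses the same top-level `intents`, `entities`, and `utterances` keys that the script below generates; `summarize_app`, the file name, and the `Move` intent are hypothetical:

```python
import json


def summarize_app(app: dict) -> dict:
    """List intent and entity names and count utterances in a LUIS app dict."""
    return {
        'intents': sorted(i['name'] for i in app.get('intents', [])),
        'entities': sorted(e['name'] for e in app.get('entities', [])),
        'utterances': len(app.get('utterances', [])),
    }


# With a real export (hypothetical file name):
# with open('exported_app.json', encoding='utf8') as f:
#     print(summarize_app(json.load(f)))
sample = json.loads('{"intents": [{"name": "None"}, {"name": "Move"}],'
                    ' "entities": [{"name": "Area"}], "utterances": []}')
print(summarize_app(sample))
```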

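So the plan is to write the example sentences and their entities in a CSV file, convert it to LUIS's JSON format with a Python script, and upload the result with Import App.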
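The script reads its training data from `./input.csv` with `csv.DictReader` and treats the columns other than `text` and `intent` as entities. A hypothetical input file in that layout (the intent name and sentences are made up), parsed the same way:

```python
import csv
import io

# Hypothetical input.csv: "text" and "intent" columns plus one column per
# entity; a "Parent::Child" header marks a child of a hierarchical entity.
# Row 1: "I want to move from Tokyo to Osaka"; row 2: "I want to move to Fukuoka".
SAMPLE_CSV = (
    'text,intent,Area::FromArea,Area::ToArea\n'
    '東京から大阪に引っ越したい,Move,東京,大阪\n'
    '福岡に引っ越したい,Move,,福岡\n'
)

reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
rows = list(reader)

# fieldnames keeps the header order; an empty cell means that entity
# does not appear in the sentence.
entity_columns = [c for c in reader.fieldnames if c not in ('text', 'intent')]
filled = [[c for c in entity_columns if row[c]] for row in rows]
print(entity_columns)  # ['Area::FromArea', 'Area::ToArea']
print(filled)          # [['Area::FromArea', 'Area::ToArea'], ['Area::ToArea']]
```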

Python script

Prepare the following script to generate the JSON file. The version of Python I'm using is 3.5.

Settings and file names are hard-coded, but this is throwaway code and making them configurable would only take a small change, so please bear with me.

This time, I wanted to express the "moving source" (`Area::FromArea`) and the "moving destination" (`Area::ToArea`) as children of a single parent-child (Hierarchical) entity, `Area`, so a CSV column named `Parent::Child` is treated as a child of a hierarchical entity. Of course, entities without a parent-child relationship are also supported.
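To make the `::` convention concrete, here is a small standalone sketch, separate from the script that follows, of how column names map to entity definitions. `group_entity_columns` and the `Budget` column are hypothetical; unlike the script, which collects children in a set, it keeps the column order:

```python
import json
from collections import OrderedDict


def group_entity_columns(columns):
    """Group 'Parent::Child' column names into LUIS-style entity definitions.

    Names without '::' become simple entities; names that share a parent
    become one entity with a 'children' list.
    """
    groups = OrderedDict()
    for col in columns:
        parent, _, child = col.partition('::')
        children = groups.setdefault(parent, [])
        if child and child not in children:
            children.append(child)
    result = []
    for name, children in groups.items():
        entity = OrderedDict([('name', name)])
        if children:
            entity['children'] = children
        result.append(entity)
    return result


print(json.dumps(group_entity_columns(
    ['Area::FromArea', 'Area::ToArea', 'Budget'])))
# [{"name": "Area", "children": ["FromArea", "ToArea"]}, {"name": "Budget"}]
```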

I also used OrderedDict because dictionaries in Python 3.5 don't preserve insertion order. The order probably doesn't matter, but unordered output is too tedious to read.
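A tiny standalone illustration of why OrderedDict helps here: `json.dumps` writes keys in the order the mapping yields them, and OrderedDict yields them in insertion order on every Python 3 version (a plain dict only guarantees this from Python 3.7). The utterance contents are made up:

```python
import json
from collections import OrderedDict

# Keys are inserted in the order we want them to appear in the JSON file.
utterance = OrderedDict()
utterance['text'] = 'I want to move from Tokyo to Osaka'
utterance['intent'] = 'Move'
utterance['entities'] = []

print(json.dumps(utterance))
# {"text": "I want to move from Tokyo to Osaka", "intent": "Move", "entities": []}
```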

#!/usr/bin/env python3

import json
import csv
from collections import defaultdict, OrderedDict


def main():
    df = _read_csv('./input.csv')
    output = _create_output(df)
    print(json.dumps(output))


class DataFrame(list):
    def __getitem__(self, key: str) -> list:
        return [x[key] for x in self]

    def keys(self) -> set:
        res = set()
        for x in self:
            for y in x.keys():
                res.add(y)
        return res


def _create_output(df: DataFrame) -> OrderedDict:
    entity_keys = df.keys() - {'text', 'Entities', 'intent'}
    entities = _create_entities(entity_keys)
    intents = _create_intents(df['intent'])
    utterrances = _create_utterances(df, entity_keys)
    return _create_luis_schema(intents, entities, utterrances)


def _read_csv(path: str) -> DataFrame:
    with open(path, 'r', encoding='utf8') as f:
        df = DataFrame(l for l in csv.DictReader(f) if l)
    return df


def _create_entities(entities: list) -> list:
    res = []
    for name, children in _parse_entities(entities).items():
        cs = [c for c in children if c is not None]
        res.append(_create_entity(name, cs))
    return res


def _create_intents(intents: list) -> list:
    res = set(intents)
    res.add('None')
    return [{'name': n} for n in res]


def _parse_entities(entities: set) -> defaultdict:
    res = defaultdict(set)
    for entity in entities:
        name, child = _parse_entity(entity)
        res[name].add(child)
    return res


def _parse_entity(entity: str) -> tuple:
    # 'Area::FromArea' -> ('Area', 'FromArea'); a name without '::' -> (name, None)
    if '::' not in entity:
        return (entity, None)
    return tuple(entity.split('::', 1))


def _create_entity(name: str, children: set) -> OrderedDict:
    res = OrderedDict([('name', name)])
    if len(children) >= 1:
        res['children'] = list(children)
    return res


def _create_utterances(rows: DataFrame, entity_keys) -> list:
    return [_create_utterrance(x, entity_keys) for x in rows]


def _create_utterrance(row: dict, entity_keys: set) -> OrderedDict:
    return OrderedDict([
        ('text', row['text']),
        ('intent', row['intent']),
        ('entities', _create_utterrance_entities(
            row['text'], [(k, row[k]) for k in entity_keys]))
    ])


def _create_utterrance_entities(text: str, entitity_items: list) -> list:
    return [_create_utterrance_entity(text, k, v)
            for k, v in entitity_items if v]  # not ''


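def _create_utterrance_entity(
        text: str, entity_key: str, entity_value: str) -> OrderedDict:
    # Only the first occurrence of entity_value in the sentence is labeled.
    start_pos = text.find(entity_value)
    if start_pos == -1:
        # Fail loudly instead of writing startPos -1 into the JSON.
        raise ValueError('{!r} ({}) not found in {!r}'.format(
            entity_value, entity_key, text))
    return OrderedDict([
        ('entity', entity_key),
        ('startPos', start_pos),
        # endPos is inclusive: the index of the entity's last character.
        ('endPos', start_pos + len(entity_value) - 1)
    ])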


def _create_luis_schema(
        intents: list, entities: list, utterrances: list) -> OrderedDict:
    return OrderedDict([
        ('luis_schema_version', '2.1.0'),
        ('versionId', '0.1'),
        ('name', 'TestOperator'),
        ('desc', 'forTestOperator'),
        ('culture', 'ja-jp'),
        ('intents', intents),
        ('entities', entities),
        ('composites', []),
        ('closedLists', []),
        ('bing_entities', []),
        ('actions', []),
        ('model_features', []),
        ('regex_features', []),
        ('utterances', utterrances)
    ])


if __name__ == '__main__':
    main()
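Before importing, it can help to check what each label actually covers. A hypothetical checker (`check_labels` is not part of the script) that reads the generated JSON back and slices out each labeled substring, assuming `endPos` is inclusive as in the script above:

```python
def check_labels(app: dict) -> list:
    """Return (text, entity, labeled_substring) for every label in app.

    A negative startPos (value not found) is reported as None.
    """
    report = []
    for utt in app['utterances']:
        for ent in utt['entities']:
            start, end = ent['startPos'], ent['endPos']
            labeled = utt['text'][start:end + 1] if start >= 0 else None
            report.append((utt['text'], ent['entity'], labeled))
    return report


# Usage with the generated file:
# import json
# with open('output.json', encoding='utf8') as f:
#     for row in check_labels(json.load(f)):
#         print(row)
```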

However, if the string you want to extract appears more than once in a sentence, only its first occurrence can be labeled, because the script uses `text.find`. Keep this in mind when writing the CSV.
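One way around this would be to let the CSV say which occurrence to use. That extra information is not part of the script and would be your own addition; a hypothetical helper `find_nth` that locates the n-th occurrence could then stand in for `text.find(entity_value)`:

```python
def find_nth(text: str, value: str, n: int) -> int:
    """Return the index of the n-th (1-based) occurrence of value, or -1."""
    pos = -1
    for _ in range(n):
        pos = text.find(value, pos + 1)
        if pos == -1:
            break
    return pos


# Hypothetical sentence: "I want to move from Kita ward, Osaka to Minami ward, Osaka"
text = '大阪の北区から大阪の南区へ引っ越したい'
print(find_nth(text, '大阪', 1))  # 0
print(find_nth(text, '大阪', 2))  # 7
print(find_nth(text, '大阪', 3))  # -1
```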

Run this script, redirect its output to a JSON file, and import that file from the browser screen shown earlier; you should get an app containing the example sentences you want it to learn.

python3 export_schema.py > output.json
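If you would rather not rely on shell redirection, `main()` could write the file itself. A sketch, assuming you replace `print(json.dumps(output))` with a call like `write_schema(output, 'output.json')` (`write_schema` is a hypothetical name). It writes readable UTF-8 instead of `\u` escapes; the script's default `json.dumps` output is pure ASCII, so check that Import App accepts the UTF-8 version before switching:

```python
import json
import os
import tempfile


def write_schema(output: dict, path: str) -> None:
    """Write the LUIS JSON to path as UTF-8 with indentation."""
    with open(path, 'w', encoding='utf8') as f:
        # ensure_ascii=False keeps Japanese text as-is instead of \uXXXX escapes.
        json.dump(output, f, ensure_ascii=False, indent=2)


# Demo with a throwaway temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'output.json')
write_schema({'utterances': [{'text': '東京から大阪に引っ越したい'}]}, path)
```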

Future outlook

The Azure CLI is also available, so the import/export steps could probably be automated as well.
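As a swapped-in alternative to the Azure CLI, here is a sketch of the same upload over HTTP using only the standard library. It assumes the LUIS authoring REST API v2.0 import endpoint (`POST /luis/api/v2.0/apps/import?appName=...` with an `Ocp-Apim-Subscription-Key` header); the region URL, key, and app name are placeholders, so verify the endpoint against the current documentation before relying on it. The function only builds the request and does not send it:

```python
import json
import urllib.parse
import urllib.request

# Assumption: LUIS v2.0 authoring "import application" path.
IMPORT_PATH = '/luis/api/v2.0/apps/import'


def build_import_request(endpoint: str, key: str, app_name: str,
                         schema: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request that uploads an app JSON."""
    url = endpoint.rstrip('/') + IMPORT_PATH + '?' + urllib.parse.urlencode(
        {'appName': app_name})
    body = json.dumps(schema).encode('utf8')
    return urllib.request.Request(
        url, data=body, method='POST',
        headers={'Ocp-Apim-Subscription-Key': key,
                 'Content-Type': 'application/json'})


# Placeholder values; sending would be urllib.request.urlopen(req).
req = build_import_request('https://westus.api.cognitive.microsoft.com',
                           'YOUR_AUTHORING_KEY', 'TestOperator',
                           {'utterances': []})
print(req.get_method(), req.full_url)
```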

If this gets as far as real operation, I'll try that to save even more effort.
