Microsoft Azure's natural language processing service LUIS (Language Understanding Intelligent Service) is very convenient.
It extracts the **intent** and the **entities** of an utterance at the same time, so you can easily build a simple conversational response program or chatbot. (An entity is a word or phrase you want to extract.)
https://www.luis.ai/
However, labeling entities can be a daunting task. As an example, suppose we want to build a chatbot (or rather, the place-name extraction API behind it) that suggests the best moving plan when given a "current place of residence" and a "desired destination".
As shown in the screenshot above, in the browser UI you have to enter the example sentences and mark, one by one, "extract the entity from here to here", which takes real perseverance.
LUIS has JSON export and import capabilities.
You can export each application from the `{}` button on its top page in the screen above, and upload the JSON again with **Import App**.
So I prepared the following script to generate that JSON file. The Python version I'm using is 3.5.
Settings and file names are hard-coded, but this is throwaway code, and adding options would only require small changes, so please bear with me.
This time, the move-from entity (`Area::FromArea`) and the move-to entity (`Area::ToArea`) are in a parent-child (`Hierarchical`) relationship, and I worked out a way to express that. Of course, entities without a parent-child relationship are also supported.
I also used `OrderedDict`, because Python's dict does not preserve insertion order (as of 3.5). You probably don't actually need to keep the order, but unordered output is just too painful to read...
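The script reads `./input.csv`, where `text` and `intent` are fixed column names and every other column (apart from a reserved `Entities` column) is treated as an entity column. A minimal sketch of such a file; the intent name `MovePlan` and the example sentences are just illustrations:

```csv
text,intent,Area::FromArea,Area::ToArea
東京から大阪に引っ越したい,MovePlan,東京,大阪
引っ越し先は北海道がいいです,MovePlan,,北海道
```

Here is the script: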
```python
#!/usr/bin/env python3
import json
import csv
from collections import defaultdict, OrderedDict


def main():
    df = _read_csv('./input.csv')
    output = _create_output(df)
    print(json.dumps(output))
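
# A list of CSV rows (dicts): indexing with a column name returns that whole
# column, and keys() returns the union of all column names in the file.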
class DataFrame(list):
    def __getitem__(self, key: str) -> list:
        return [x[key] for x in self]

    def keys(self) -> set:
        res = set()
        for x in self:
            for y in x.keys():
                res.add(y)
        return res
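
# Every CSV column except 'text', 'Entities' and 'intent' is treated as an
# entity column.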
def _create_output(df: DataFrame) -> OrderedDict:
    entity_keys = df.keys() - {'text', 'Entities', 'intent'}
    entities = _create_entities(entity_keys)
    intents = _create_intents(df['intent'])
    utterances = _create_utterances(df, entity_keys)
    return _create_luis_schema(intents, entities, utterances)


def _read_csv(path: str) -> DataFrame:
    with open(path, 'r', encoding='utf8') as f:
        df = DataFrame(l for l in csv.DictReader(f) if l)
    return df
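
# Build the entity definitions; the None placeholder used for entities
# without children is filtered out.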
def _create_entities(entities: list) -> list:
    res = []
    for name, children in _parse_entities(entities).items():
        cs = [c for c in children if c is not None]
        res.append(_create_entity(name, cs))
    return res


def _create_intents(intents: list) -> list:
    res = set(intents)
    res.add('None')
    return [{'name': n} for n in res]
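
# Group 'Parent::Child' column names by parent; a plain column name gets
# None as a placeholder child.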
def _parse_entities(entities: list) -> defaultdict:
    res = defaultdict(set)
    for entity in entities:
        name, child = _parse_entity(entity)
        res[name].add(child)
    return res


def _parse_entity(entity: str) -> tuple:
    if '::' not in entity:
        return (entity, None)
    return tuple(entity.split('::'))

def _create_entity(name: str, children: set) -> OrderedDict:
    res = OrderedDict([('name', name)])
    if len(children) >= 1:
        res['children'] = list(children)
    return res
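
# Turn every CSV row into one labeled example utterance.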
def _create_utterances(rows: DataFrame, entity_keys) -> list:
    return [_create_utterance(x, entity_keys) for x in rows]


def _create_utterance(row: dict, entity_keys: set) -> OrderedDict:
    return OrderedDict([
        ('text', row['text']),
        ('intent', row['intent']),
        ('entities', _create_utterance_entities(
            row['text'], [(k, row[k]) for k in entity_keys]))
    ])


def _create_utterance_entities(text: str, entity_items: list) -> list:
    return [_create_utterance_entity(text, k, v)
            for k, v in entity_items if v]  # skip empty cells
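
# Locate the entity value inside the utterance text and record its character
# span; str.find() only finds the first occurrence, and endPos is inclusive.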
def _create_utterance_entity(
        text: str, entity_key: str, entity_value: str) -> OrderedDict:
    start_pos = text.find(entity_value)
    return OrderedDict([
        ('entity', entity_key),
        ('startPos', start_pos),
        ('endPos', start_pos + len(entity_value) - 1)
    ])
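
# Assemble the full LUIS app schema; name, description and culture are
# hard-coded for this throwaway script.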
def _create_luis_schema(
        intents: list, entities: list, utterances: list) -> OrderedDict:
    return OrderedDict([
        ('luis_schema_version', '2.1.0'),
        ('versionId', '0.1'),
        ('name', 'TestOperator'),
        ('desc', 'forTestOperator'),
        ('culture', 'ja-jp'),
        ('intents', intents),
        ('entities', entities),
        ('composites', []),
        ('closedLists', []),
        ('bing_entities', []),
        ('actions', []),
        ('model_features', []),
        ('regex_features', []),
        ('utterances', utterances)
    ])

if __name__ == '__main__':
    main()
```
One caveat: if the string you want to extract appears twice in an utterance, only the first occurrence can be labeled. Keep this in mind when writing the input.
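This is simply because `str.find` returns the index of the first match:

```python
text = "東京から東京駅の近くへ引っ越したい"
text.find("東京")  # -> 0; the second 東京 (at index 4) can never be labeled
```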
Run the script, write its output to a JSON file, and import that file from the browser screen described earlier; you should end up with an app that already contains the utterances you want to train on.
```bash
python3 export_schema.py > output.json
```
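For the sample CSV sketched earlier, the generated `output.json` looks roughly like this (pretty-printed here for readability; the real output is a single line, `json.dumps` escapes the Japanese characters, and the order of the intents and of the entities inside each utterance may vary because sets are involved):

```json
{
  "luis_schema_version": "2.1.0",
  "versionId": "0.1",
  "name": "TestOperator",
  "desc": "forTestOperator",
  "culture": "ja-jp",
  "intents": [{"name": "MovePlan"}, {"name": "None"}],
  "entities": [{"name": "Area", "children": ["FromArea", "ToArea"]}],
  "composites": [],
  "closedLists": [],
  "bing_entities": [],
  "actions": [],
  "model_features": [],
  "regex_features": [],
  "utterances": [
    {
      "text": "東京から大阪に引っ越したい",
      "intent": "MovePlan",
      "entities": [
        {"entity": "Area::FromArea", "startPos": 0, "endPos": 1},
        {"entity": "Area::ToArea", "startPos": 4, "endPos": 5}
      ]
    },
    {
      "text": "引っ越し先は北海道がいいです",
      "intent": "MovePlan",
      "entities": [
        {"entity": "Area::ToArea", "startPos": 6, "endPos": 8}
      ]
    }
  ]
}
```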
The Azure CLI is also available, so it may be possible to automate the import/export work as well. If I get as far as actually operating this, I'll give that a try to save some more effort.