[PYTHON] I don't have time, so can I just ask a question? feat. COTOHA API

Introduction

When you're busy, reading a long chat message is hard, and it's a problem if a question buried in it gets overlooked. For example, what if you received the following chat when you were extremely busy?

Thank you for your hard work. I'm sorry I had you work overtime until late yesterday.
I forgot that I had to go home on my daughter's birthday.
It was helpful to have the materials to show to the customers.
Where do you put that?
Oh, and of course, tonight's drinks are on me. Thank you very much.

I would close the chat around __daughter's birthday__. But take a closer look. After that, you are asked where the materials created for the customer are. Having closed the chat at lightning speed, I never answered the question, and I got scolded for it afterwards.

So, let's extract only the questions and cut everything else. What? Can you really do that? You can. Yes, with COTOHA API.

Sample program (sorry, it's rough)

getQuestion.py


import os
import regex
from dotenv import load_dotenv
import itertools
from cotoha_api import CotohaApi

dotenv_path = '.env'
load_dotenv(dotenv_path)
CLIENT_ID = os.environ.get("CLIENT_ID")
CLIENT_SECRET = os.environ.get("CLIENT_SECRET")
DEVELOPER_API_BASE_URL = os.environ.get("DEVELOPER_API_BASE_URL")
ACCESS_TOKEN_PUBLISH_URL = os.environ.get("ACCESS_TOKEN_PUBLISH_URL")

# Create the CotohaApi instance
cotoha_api = CotohaApi(CLIENT_ID, CLIENT_SECRET, DEVELOPER_API_BASE_URL, ACCESS_TOKEN_PUBLISH_URL)

# Text to analyze
input = "Thank you for your hard work. I'm sorry I had you work overtime until late yesterday. I forgot that I had to go home on my daughter's birthday. It was helpful to have the materials to be shown to the customers made instead. Where do you put that? I'll collect it. Oh, and of course, tonight's drinks are on me. Thank you very much."

# Anaphora resolution
result = cotoha_api.coreference(input)
coreference = result['result']['coreference']
tokens = result['result']['tokens']

# Treat the first referent as the antecedent and the second as the demonstrative,
# then overwrite the demonstrative token with the antecedent's form
if len(coreference) != 0:
    for relation in coreference:
        referents = relation['referents']
        sentence_id = referents[1]['sentence_id']
        token_id = referents[1]['token_id_to']
        token = referents[0]['form']
        tokens[sentence_id][token_id] = token

# Join the tokens back into one string
input = ''.join(list(itertools.chain.from_iterable(tokens)))

# Split after 。, ？, ！ (full-width or ASCII) or a newline, but not at the end of the text
sentenceList = regex.split(r'(?<=[。？！?!\n])(?!$)', input, flags=regex.VERSION1)

# Sentence type judgment: print only the interrogative sentences
for sentence in sentenceList:
    sentenceType = cotoha_api.sentenceType(sentence)
    modality = sentenceType['result']['modality']
    if modality == 'interrogative':
        print(sentence)

$ python3 getQuestion.py 
  #Output: Where do you put the materials to show to customers?

The question was extracted correctly. In addition, "that" was replaced with "materials to show to customers."

What is COTOHA API?

This is a platform of natural language processing and speech processing APIs provided by NTT Communications. There is a free version (for Developers) and a paid version (for Enterprise), and the free version allows up to 1000 calls a day. I highly recommend it because you can start using it as soon as you register some basic information. COTOHA API provides 14 APIs.

I don't think many services offer this many kinds of API. Impressive, NTT.

This article uses two of them: __sentence type judgment__ and __anaphora resolution__.

Sentence type judgment

__Sentence type judgment__, one of the APIs provided by COTOHA API, outputs the speech act type of a sentence (such as greeting or agreement) and its sentence type (declarative, imperative, or interrogative). For details, see the Sentence Type Judgment API Reference. In this program, it is used to pick out the input sentences whose sentence type is interrogative.

Anaphora resolution

__Anaphora resolution__ is a technique for estimating what demonstratives and pronouns such as "that", "it", "he", and "she" in a text refer to. COTOHA API also provides an anaphora resolution API, so this time I used it to make the question text more specific. For more information on the API, see the Anaphora Resolution API Reference. There is also an analysis demo for the API, so you can easily try it out.
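
To make the idea concrete, here is a minimal sketch that applies a coreference result to a token list without calling the API. The response below is hand-written, not real API output. Its shape follows the fields the sample program reads (`coreference`, `referents`, `sentence_id`, `token_id_to`, `form`, `tokens`), while `token_id_from` is an assumption on my part, and `resolve_references` is a hypothetical helper name. Unlike the sample program, this version replaces every later referent in a chain and blanks out the extra tokens of a multi-token mention.

```python
from copy import deepcopy

# Hand-written stand-in for result['result'] (NOT real API output).
sample_result = {
    "coreference": [
        {
            "referents": [
                # Antecedent: tokens 0..1 of sentence 0.
                {"sentence_id": 0, "token_id_from": 0, "token_id_to": 1,
                 "form": "the materials"},
                # Demonstrative: token 4 of sentence 1.
                {"sentence_id": 1, "token_id_from": 4, "token_id_to": 4,
                 "form": "that"},
            ]
        }
    ],
    "tokens": [
        ["The", "materials", "were", "helpful", "."],
        ["Where", "do", "you", "put", "that", "?"],
    ],
}


def resolve_references(result):
    """Return a copy of the tokens with each mention replaced by its antecedent."""
    tokens = deepcopy(result["tokens"])  # leave the caller's data untouched
    for chain in result["coreference"]:
        referents = chain["referents"]
        if len(referents) < 2:
            continue  # nothing to replace
        antecedent = referents[0]["form"]
        for mention in referents[1:]:
            sentence = tokens[mention["sentence_id"]]
            end = mention["token_id_to"]
            # token_id_from is assumed; fall back to a single-token mention.
            start = mention.get("token_id_from", end)
            sentence[start] = antecedent
            # Blank the rest of the span so later indices stay valid.
            for i in range(start + 1, end + 1):
                sentence[i] = ""
    return tokens


resolved = resolve_references(sample_result)
print(resolved[1])  # ['Where', 'do', 'you', 'put', 'the materials', '?']
```

Blanking tokens instead of deleting them keeps the token indices of other chains valid, at the cost of having to skip empty strings when joining.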


A brief description of the program

The CotohaApi class is borrowed almost as is from `cotoha_api_python3.py` in this article (some endpoints have been rewritten to the latest ones).

First, the following part reads the settings written in .env and creates a CotohaApi instance.

dotenv_path = '.env'
load_dotenv(dotenv_path)
CLIENT_ID = os.environ.get("CLIENT_ID")
CLIENT_SECRET = os.environ.get("CLIENT_SECRET")
DEVELOPER_API_BASE_URL = os.environ.get("DEVELOPER_API_BASE_URL")
ACCESS_TOKEN_PUBLISH_URL = os.environ.get("ACCESS_TOKEN_PUBLISH_URL")

#CotohaApi instantiation
cotoha_api = CotohaApi(CLIENT_ID, CLIENT_SECRET, DEVELOPER_API_BASE_URL, ACCESS_TOKEN_PUBLISH_URL)
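
One thing to watch for here: `os.environ.get` returns `None` when a key is missing, so a typo in .env only surfaces later as a confusing API error. A small fail-fast check could look like the sketch below. `load_settings` and `REQUIRED_KEYS` are hypothetical names of mine, not part of the borrowed class.

```python
import os

# The four settings the sample program reads from .env.
REQUIRED_KEYS = (
    "CLIENT_ID",
    "CLIENT_SECRET",
    "DEVELOPER_API_BASE_URL",
    "ACCESS_TOKEN_PUBLISH_URL",
)


def load_settings(environ=None):
    """Return the required settings as a dict, or raise if any are missing."""
    environ = os.environ if environ is None else environ
    # Treat unset and empty values the same way.
    missing = [key for key in REQUIRED_KEYS if not environ.get(key)]
    if missing:
        raise RuntimeError("Missing settings in .env: " + ", ".join(missing))
    return {key: environ[key] for key in REQUIRED_KEYS}
```

Call it after `load_dotenv(dotenv_path)` and pass the four values to `CotohaApi` in the same order as above.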

Next, it runs anaphora resolution on the input text and replaces each demonstrative with the word it points to. The tokens are then joined back into one string and split into sentences.

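# Anaphora resolution
result = cotoha_api.coreference(input)
coreference = result['result']['coreference']
tokens = result['result']['tokens']

# Treat the first referent as the antecedent and the second as the demonstrative,
# then overwrite the demonstrative token with the antecedent's form
if len(coreference) != 0:
    for relation in coreference:
        referents = relation['referents']
        sentence_id = referents[1]['sentence_id']
        token_id = referents[1]['token_id_to']
        token = referents[0]['form']
        tokens[sentence_id][token_id] = token

# Join the tokens back into one string
input = ''.join(list(itertools.chain.from_iterable(tokens)))

# Split after 。, ？, ！ (full-width or ASCII) or a newline, but not at the end of the text
sentenceList = regex.split(r'(?<=[。？！?!\n])(?!$)', input, flags=regex.VERSION1)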
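
The sentence split can be tried on its own without the API. The sketch below uses the standard `re` module instead of `regex`, which works on Python 3.7 or later because `re.split` accepts patterns that match an empty string from that version on. `split_sentences` is a hypothetical helper name, and the sample strings are made up for illustration.

```python
import re

# Zero-width boundary: right after a sentence-ending mark or a newline,
# but not at the very end of the text (so no empty trailing piece).
SENTENCE_BOUNDARY = re.compile(r'(?<=[。？！?!\n])(?!$)')


def split_sentences(text):
    """Split text into sentences, dropping surrounding whitespace and blanks."""
    pieces = SENTENCE_BOUNDARY.split(text)
    return [piece.strip() for piece in pieces if piece.strip()]


print(split_sentences("Thank you! Where do you put that? I'll collect it."))
# ['Thank you!', 'Where do you put that?', "I'll collect it."]
```

An ASCII period is deliberately left out of the character class, since it would also split decimals and URLs. For Japanese text the full-width 。 does that job.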

Finally, the sentence type is judged for each sentence, and only the interrogative sentences are printed.

#Sentence type judgment
for sentence in sentenceList:
    sentenceType = cotoha_api.sentenceType(sentence)
    modality = sentenceType['result']['modality']
    if modality == 'interrogative':
        print(sentence)
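
If you want to test this filtering step offline, one option is to pass the classifier in as a function, as in the sketch below. `extract_questions` and `fake_classify` are hypothetical names. The fake classifier is only a crude stand-in for `cotoha_api.sentenceType`, and the `'declarative'` value it returns is my assumption about the other modality labels.

```python
def extract_questions(sentences, classify):
    """Return the sentences whose modality is 'interrogative'.

    `classify` is any callable that takes one sentence and returns a dict
    shaped like the response the sample program reads:
    {'result': {'modality': ...}}.
    """
    questions = []
    for sentence in sentences:
        response = classify(sentence)
        # Tolerate a response without 'result' or 'modality'.
        modality = response.get("result", {}).get("modality")
        if modality == "interrogative":
            questions.append(sentence)
    return questions


def fake_classify(sentence):
    # Offline stand-in: calls anything ending in a question mark a question.
    # The real API does not rely on the "?" alone.
    is_question = sentence.rstrip().endswith(("?", "？"))
    modality = "interrogative" if is_question else "declarative"
    return {"result": {"modality": modality}}


print(extract_questions(["Thank you.", "Where do you put that?"], fake_classify))
# ['Where do you put that?']
```

With the real client, `extract_questions(sentenceList, cotoha_api.sentenceType)` would do the same job as the loop above, returning a list instead of printing.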

And that's it.

I tried a few other examples

Besides the example above, I tried extracting questions from a few other inputs, so here are the results as they came out.

Material = cake

input


Thank you for your hard work.
I'm sorry I had you work overtime until late yesterday.
I forgot that I had to go home on my daughter's birthday.
I bought a cake on my way home, and I was happy to eat it.
Oh yeah.
It was helpful to have the materials to show to the customers.
Where do you put that?
I'll collect it.
Oh, and of course, tonight's drinks are on me. Thank you very much.

output


Where do you put the cake?

The referent of the demonstrative changed from "materials to show to customers" to "cake". Why???

What?

input


I have something to ask.
I'd like to complete the procedure for dependent application, but I don't know how.
Is there a procedure manual for this?
We apologize for the inconvenience, but thank you for your response.

output


Is there a procedure manual for the method?

The referent of "this" became "method". Admittedly, cases like this may be hard to handle with simple anaphora resolution alone. It may be worth trying dependency parsing as well.

Which one?

input


Please tell me a little.
The other day, you made an API specification.
What is that version now?

output


How many versions are there now??

It seems the API could not work out what the demonstrative referred to. Personally, I wanted this one to come out correctly...

In closing

I tried various things, and I felt that the accuracy of the anaphora resolution was a little low. Would the for Enterprise version be more accurate? On the other hand, in every example I tried, sentence type judgment picked out the question correctly. I was a little surprised that it could find the question even without a "?".

If you can pull just the questions out of chats or emails, then besides the use described at the beginning of this article, the result could also serve as training data for chatbots, for example.

This time I simply combined anaphora resolution and sentence type judgment, but I would like to work on it with a little more ingenuity.
