[PYTHON] Even coreference analysis with the COTOHA API doesn't solve the problem of nobody giving me chocolate

Background: The [Qiita x COTOHA API present plan] campaign caught my attention, so I decided to try the API for a bit.

MeCab and KNP are well-known tools for natural language processing, but I'm embarrassed to say this event was the first time I heard of the COTOHA API.

Motive: It may be a bit of a high hurdle for me personally, but I started with the coreference (anaphora) resolution API. This analysis resolves pronouns ("I", "you", "he", etc.) and demonstratives ("that", "this") in a sentence to the people and objects they refer to. I wanted to use it as a first step to clarify who did what, and then build further processing on top of it if time allowed.

The event deadline is mid-March, right...?

Will I make it in time?

Environment

Development

**Script code**
import requests
import json
import time
import sys

# --- Get these 4 parameters from the COTOHA API Portal ---
PUBLISH_URL = "--- get your parameter ---"
CLIENT_ID = "--- get your parameter ---"
CLIENT_SECRET = "--- get your parameter ---"
BASE_URL = "--- get your parameter ---"

def getToken():
    # Request an access token from the Access Token Publish URL
    header = {"Content-Type": "application/json"}
    contents = {
        "grantType": "client_credentials",
        "clientId": CLIENT_ID,
        "clientSecret": CLIENT_SECRET
    }
    raw_res = requests.post(PUBLISH_URL, headers=header, json=contents)
    response = raw_res.json()
    return response["access_token"]

def coreference(token, sentence):
    # Call the coreference resolution endpoint with the given sentence
    header = {
        "Authorization": "Bearer {}".format(token),
        "Content-Type": "application/json"
    }
    contents = {
        "document": sentence
    }
    raw_res = requests.post(
        BASE_URL + "nlp/v1/coreference",
        headers=header,
        json=contents)
    response = raw_res.json()
    return response

if __name__ == "__main__":

    # Expect exactly one command-line argument: the text to analyze
    if len(sys.argv) != 2:
        sys.exit()
    message = sys.argv[1]

    token = getToken()
    time.sleep(0.5)  # short pause between the token request and the API call

    print(coreference(token, message))

The flow has two steps: get a token, then call whichever API you want to use.
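Concretely, with the functions defined above, the two steps look like this (a minimal sketch; the sentence is just one of the test sentences used below):

token = getToken()   # step 1: obtain an access token
result = coreference(token, "Yamada bought cup noodles. He was eating deliciously.")   # step 2: call the coreference API
print(result["result"]["coreference"])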

The curl equivalents of the Python calls look like this:

$ curl -X POST -H "Content-Type:application/json" -d '{
  "grantType": "client_credentials",
  "clientId": "[clientid]",
  "clientSecret": "[clientsecret]"
}' "[Access Token Publish URL]"

$ curl -H "Content-Type:application/json;charset=UTF-8" -H "Authorization:Bearer [access_token]" -X POST -d '{
  "document": "--Enter the text you want to analyze here--"
}' "[Developer API Base URL]/nlp/v1/coreference"

Result

Let's analyze two sentences:

- Yamada bought cup noodles. He was eating deliciously.
- Yamada gave Saito cup noodles. He was eating deliciously.

$ python main.py Yamada bought cup noodles. He was eating deliciously.
{'result': {'coreference': [{'representative_id': 0, 'referents': [{'referent_id': 0, 'sentence_id': 0, 'token_id_from': 0, 'token_id_to': 0, 'form': 'Yamada'}, {'referent_id': 1, 'sentence_id': 0, 'token_id_from': 10, 'token_id_to': 10, 'form': 'he'}]}], 'tokens': [['Yamada', 'Kun', 'Is', 'cup', 'noodles', 'To', 'Buy', 'Tsu', 'Ta', '。', 'he', 'Is', 'Delicious', 'so', 'To', 'eat', 'hand', 'I', 'Ta', '。']]}, 'status': 0, 'message': 'OK'}
$ python main.py Yamada gave Saito cup noodles. He was eating deliciously.
{'result': {'coreference': [{'representative_id': 0, 'referents': [{'referent_id': 0, 'sentence_id': 0, 'token_id_from': 0, 'token_id_to': 0, 'form': 'Yamada'}, {'referent_id': 1, 'sentence_id': 0, 'token_id_from': 13, 'token_id_to': 13, 'form': 'he'}]}], 'tokens': [['Yamada', 'You', 'Is', 'Saito', 'You', 'To', 'cup', 'noodles', 'To', 'Watari', 'Shi', 'Ta', '。', 'he', 'Is', '美味Shi', 'so', 'To', 'eat', 'hand', 'I', 'Ta', '。']]}, 'status': 0, 'message': 'OK'}

Consideration

For `Yamada bought cup noodles. He was eating deliciously.`, the response contains `'coreference': [{'representative_id': 0, 'referents': [{'referent_id': 0, 'sentence_id': 0, 'token_id_from': 0, 'token_id_to': 0, 'form': 'Yamada'}, {'referent_id': 1, 'sentence_id': 0, 'token_id_from': 10, 'token_id_to': 10, 'form': 'he'}]}]`, so you can see that `Yamada` and `he` are linked.

But for `Yamada gave Saito cup noodles. He was eating deliciously.`, the response is `'coreference': [{'representative_id': 0, 'referents': [{'referent_id': 0, 'sentence_id': 0, 'token_id_from': 0, 'token_id_to': 0, 'form': 'Yamada'}, {'referent_id': 1, 'sentence_id': 0, 'token_id_from': 13, 'token_id_to': 13, 'form': 'he'}]}]`: `Yamada` and `he` are linked, not `Saito` and `he`. Read that way, Yamada hands over the cup noodles and then eats them himself. What a greedy interpretation :scream:.
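As a small aid for reading these responses, here is a minimal sketch (it assumes only the JSON layout shown in the output above) that prints each coreference chain as its linked surface forms:

def print_chains(response):
    # Walk the coreference entries and print the surface forms that were linked
    for chain in response["result"]["coreference"]:
        print(" - ".join(referent["form"] for referent in chain["referents"]))

# e.g. print_chains(coreference(token, message)) prints "Yamada - he"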

Comparison: KNP

Let's try the same sentences with KNP: http://lotus.kuee.kyoto-u.ac.jp/~ryohei/zero_anaphora/index.cgi

noodle.png (KNP analysis result)

Here the link is `Saito - he`, so KNP seems to resolve it correctly.

However, when I delete the pronoun and analyze `Yamada gave Saito cup noodles. Was eating it deliciously.`, the link becomes `Yamada - he` again, so it is no longer resolved correctly.

noodle.png (KNP result with the pronoun removed)

Conclusion

The cases analyzed this time are direct anaphora, where the pronoun is written explicitly; in real text, however, the referent is often not written at all. Besides direct anaphora there are also indirect anaphora, external anaphora (exophora), zero anaphora, and so on (see http://adsmedia.hatenablog.com/entry/2017/02/20/084846).

It reminds me of the human genome: it was fully sequenced in the 2000s, but that alone doesn't tell us how everything is put together; the individual types are known, yet the combinations are countless.

Natural language is similar: from a single sentence you can determine the type of each word, but many problems still remain to be worked out, anaphora resolution among them. One approach would be to collect a corpus and apply machine learning to fill in referents that are not written in the sentence, though that seems difficult. Something like estimating that, among the various characters, Saito is the one eating the cup noodles with 80% probability :robot:.

PostScript

Next, I'd like to try the parsing API for further analysis or to build something with it.
