[PYTHON] When I investigated whether the COTOHA API could understand manzai, the results were reasonable.

Introduction

This article has been entered in Qiita x COTOHA API Present Project.

What is COTOHA API?

The COTOHA API is a service from NTT Communications that makes natural language processing and speech recognition / synthesis technologies, built on more than 40 years of NTT Group research, easy to use through APIs.

In addition to natural language processing such as parsing and named entity recognition, distinctive features such as "summarization" and "user attribute estimation" are also available.

■ Service introduction link https://api.ce-cotoha.com/contents/index.html

What is Milk Boy?

Milk Boy is a male comedy duo belonging to Yoshimoto Kogyo. They formed in July 2007 and won M-1 in 2019.

In the first round of M-1, on the theme of "corn flakes", they performed a routine built from light exchanges that alternate between features likely to apply to corn flakes and features unlikely to apply. It scored "681 points", the highest score ever!

=== Omitted because it is not technically related ===

I investigated

Now for the main subject. Milk Boy's manzai creates laughter by repeating "features that are likely to apply to the object" and "features that are unlikely to apply". Perhaps an AI that is good at reading comprehension, that is, COTOHA, can understand this manzai.

**Let's authenticate first**

・ The COTOHA API has a free Developer plan, so register first.
・ Get your Client ID and Client Secret from the portal.
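
One way to keep the Client ID and Client Secret out of the source file is to read them from environment variables. The following is only a minimal sketch: the helper `load_credentials` and the variable names `COTOHA_CLIENT_ID` and `COTOHA_CLIENT_SECRET` are hypothetical names chosen for this example, not something defined by the COTOHA API.

```python
import os


def load_credentials(env=os.environ):
    """Return (client_id, client_secret) read from environment variables.

    The variable names below are hypothetical and chosen for illustration;
    the COTOHA API itself does not define them.
    """
    try:
        return env["COTOHA_CLIENT_ID"], env["COTOHA_CLIENT_SECRET"]
    except KeyError as missing:
        # Fail with a readable message instead of a bare KeyError.
        raise RuntimeError(
            "Environment variable {} is not set".format(missing)
        ) from None
```

With this in place, the two user-specific values used for authentication could be assigned as `cotoha_clientId, cotoha_clientSecret = load_credentials()`.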

# COTOHA API authentication with Python

import json  # used by the later snippets
import requests

# User-specific values from the COTOHA portal (placeholders here)
cotoha_clientId = "YOUR_CLIENT_ID"
cotoha_clientSecret = "YOUR_CLIENT_SECRET"

def authorization():
    cotoha_url_auth = "https://api.ce-cotoha.com/v1/oauth/accesstokens"
    headers = {"Content-Type": "application/json"}
    data = {
        "grantType": "client_credentials",
        "clientId": cotoha_clientId,
        "clientSecret": cotoha_clientSecret
    }

    res = requests.post(cotoha_url_auth, headers=headers, json=data)
    token = res.json()["access_token"]
    # print("COTOHA API authenticated")

    return token  # Return the access token string and finish

**Try using the similarity judgment API**

This time I will use **"Similarity judgment"** from the COTOHA API.

What is similarity judgment? It is a RESTful API that calculates the semantic similarity between two sentences. The similarity is output in the range 0 to 1, and the closer it is to 1, the more similar the two sentences are. Because the similarity is calculated using the semantic information of the words in each sentence, it can also estimate the similarity between texts that contain different words.
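
To get a feel for how two sentences with no words in common can still score as similar, here is a toy sketch using cosine similarity over word vectors. This is not COTOHA's method, which is not described in this article. The vectors in `TOY_VECTORS`, their three "meaning" dimensions, and the helper names are all invented for illustration.

```python
import math

# Hand-made 3-dimensional "meaning" vectors, invented purely for illustration.
# Hypothetical dimensions: [physical activity, food, positive feeling]
TOY_VECTORS = {
    "exercising": [0.9, 0.0, 0.1],
    "sports":     [1.0, 0.0, 0.2],
    "good":       [0.1, 0.0, 0.7],
    "love":       [0.0, 0.1, 1.0],
    "milk":       [0.0, 1.0, 0.1],
}


def sentence_vector(words):
    """Average the toy vectors of the known words in a sentence."""
    known = [TOY_VECTORS[w] for w in words if w in TOY_VECTORS]
    if not known:
        raise ValueError("no known words in sentence")
    return [sum(column) / len(known) for column in zip(*known)]


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


s1 = sentence_vector(["good", "exercising"])  # "I am good at exercising"
s2 = sentence_vector(["love", "sports"])      # "I love sports"
s3 = sentence_vector(["milk"])                # unrelated topic

print("s1 vs s2:", round(cosine(s1, s2), 2))  # no shared words, high score
print("s1 vs s3:", round(cosine(s1, s3), 2))  # different topic, low score
```

Because all toy vectors are non-negative, the cosine here also falls between 0 and 1, which loosely mirrors the score range described above.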

# Similarity judgment
def similar(sentence1, sentence2, token):
    url = "https://api.ce-cotoha.com/api/dev" + "/nlp/v1/similarity"
    header = {
        "Content-Type": "application/json;charset=UTF-8",
        "Authorization": "Bearer " + token
    }
    data = {
        "s1": sentence1,
        "s2": sentence2
    }
    res = requests.post(url, headers=header, json=data)
    return res.text  # Raw JSON response body as a string

For token, use the access token returned by the authentication above. If you pass two sentences in the request, their similarity is returned.

Example: ("I am good at exercising", "I love sports") ⇒ Similarity: {'result': {'score': 0.94150746}, 'status': 0, 'message': 'OK'}

Even though the two sentences share almost no words, the score is this high! Great!
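
Since the similarity function returns the response body as text, the score still has to be pulled out of the JSON. A minimal sketch follows, using the example response above as sample data. The helper name `extract_score` is hypothetical, and treating any status other than 0 as a failure is an assumption based only on the successful example shown here.

```python
import json


def extract_score(response_text):
    """Parse a similarity response body and return the score as a float.

    Assumption: a status other than 0 means the request failed. The article
    only shows the successful case (status 0, message "OK").
    """
    body = json.loads(response_text)
    if body.get("status") != 0:
        raise ValueError(
            "Similarity request failed: {}".format(body.get("message"))
        )
    return float(body["result"]["score"])


# The example response from above, written as a JSON string.
sample = '{"result": {"score": 0.94150746}, "status": 0, "message": "OK"}'
print(extract_score(sample))
```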

**Now it's time to judge the corn flakes**

The introduction got long, but by combining the two functions above with a little processing, I made a function that judges whether an input sentence is about corn flakes.

# Corn flakes function
def cornflake(sentence, obj="corn flakes"):
    try:
        token = authorization()
        sim = float(json.loads(similar(obj, sentence, token))["result"]["score"])
        if sim > 0.4:
            # Similar enough: treat the sentence as describing obj
            rep = "That's {}! (Similarity: {:.2f})".format(obj, sim)
        else:
            rep = "Then it's not {} (similarity: {:.2f})".format(obj, sim)
    except Exception:
        rep = "An error occurred during API request"
    return rep

That is the function. Although it is not really needed, specifying the argument `obj` lets you judge things other than corn flakes.
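
Because the function above requests a new access token on every call, judging many sentences repeats the authentication each time. One possible variation, sketched below, separates the threshold check from the API call so that a token can be obtained once and reused. The names `judge_all`, `score_fn`, and `fake_score` are hypothetical, and `fake_score` is only a stand-in that makes the sketch runnable offline.

```python
def judge_all(sentences, score_fn, obj="corn flakes", threshold=0.4):
    """Classify each sentence as describing `obj` or not.

    score_fn(obj, sentence) must return a similarity between 0 and 1.
    Returns a list of (sentence, similarity, is_match) tuples.
    """
    results = []
    for sentence in sentences:
        sim = score_fn(obj, sentence)
        results.append((sentence, sim, sim > threshold))
    return results


def fake_score(obj, sentence):
    # Stand-in scorer for offline use; this is NOT the COTOHA API.
    return 0.74 if "milk" in sentence else 0.25


for sentence, sim, is_match in judge_all(
    ["Sweet, crunchy, and eaten with milk.", "Fine as a last meal."],
    fake_score,
):
    print("{:.2f} {} {}".format(sim, is_match, sentence))
```

To use the real API, `score_fn` could wrap the similarity function from this article with a token obtained once beforehand, for example `lambda a, b: float(json.loads(similar(a, b, token))["result"]["score"])`.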

**Survey started**

**Note: From here on, spoilers for Milk Boy's routine are included. Please be careful!!**

I'll help you work out what Okan's favorite breakfast is, so tell me what features she mentioned.

cornflake("Sweet, crunchy, and eaten with milk.")

Output: 'That's corn flakes! (Similarity: 0.74)'

cornflake("Oh, that would be fine as the last meal before you die.")

Output: 'Then it's not corn flakes (similarity: 0.25)'

cornflake("I don't know why the pentagon of nutritional balance is so wide.")

Output: 'Then it's not corn flakes (similarity: 0.16)'

cornflake("I can't see the producer's face when I'm eating.")

Output: 'That's corn flakes! (Similarity: 0.56)'

cornflake("Okan says that, genre-wise, it is Chinese food.")

Output: 'Then it's not corn flakes (similarity: 0.15)'

Result comment

**It does feel like corn flakes...** The similarity judgment used this time only *determines the semantic similarity between sentences*, whereas the purpose here was *to determine whether a sentence describes a feature of the object*. Even so, the accuracy seems **reasonable**. (I tried various inputs beyond those above, and overall the similarity tended to be low.)

Come to think of it, "corn flakes" and "the pentagon of nutritional balance is wide" do not mean the same thing, so the low score makes sense...
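
Since the overall scores tend to be low, the verdicts depend a lot on where the threshold sits. As a small worked example, the sketch below counts how many of the five scores reported in the survey would be judged as corn flakes at a few thresholds. The short labels and the helper `count_matches` are made up for this example, and the thresholds 0.2 and 0.6 are arbitrary comparison points (only 0.4 comes from the function in this article).

```python
# Scores reported in the survey above, keyed by a short label for each input.
reported = [
    ("sweet, crunchy, eaten with milk", 0.74),
    ("fine as a last meal", 0.25),
    ("nutritional balance pentagon", 0.16),
    ("can't see the producer's face", 0.56),
    ("Chinese in genre", 0.15),
]


def count_matches(scores, threshold):
    """Count the inputs whose similarity is strictly above the threshold."""
    return sum(1 for _, score in scores if score > threshold)


for threshold in (0.2, 0.4, 0.6):
    print(threshold, count_matches(reported, threshold))
# 0.2 -> 3 matches, 0.4 -> 2 matches, 0.6 -> 1 match
```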

Ending

cornflake("It seems to be a company that provides services that are strong in Japanese", "NTT")

Output:'That's NTT! (Similarity: 0.44)'

Thank you very much.
