[PYTHON] PyPI package for super-easy use of COTOHA on Google Colab

2020/04/10: The PyPI package has been released!

For more information, see here (I'd be very happy if you gave it a star!).

If you want to use it quickly, please refer to the code below.

```
$ pip install cotoha_at_python
```

```python
from cotohacall.main import cotoha_call
import os

os.environ['CLIENT_ID'] = 'Your ID'
os.environ['CLIENT_SECRET'] = 'Your Secret'

sentence = 'An API service that realizes highly accurate natural language analysis of Japanese. With the COTOHA API, you can easily use natural language analysis technology built on more than 40 years of NTT Group research results.'
cotoha_call('keyword', sentence)

# Return
"""
[{'form': 'High precision', 'score': 20.0},
 {'form': 'Realization', 'score': 16.8278},
 {'form': 'Easy', 'score': 10.8133},
 {'form': 'research result', 'score': 10.0},
 {'form': 'cotoha api', 'score': 10.0}]
"""
```

Preface

Super-easy explanations of how to use COTOHA already exist ("I tried using the COTOHA API, rumored to make natural language processing easy to handle, in Python" and "The result of having COTOHA summarize 'Memories of Mentos and Go'. With COTOHA's fastest tutorial").

For the [Qiita x COTOHA API present plan], I wrote my own:

- Building an environment is a hassle, so use Google Colab!
- It can be used with a single copy-paste, without thinking about anything!

I wanted something like that, so I published it as this article.

If you read this article, you should be able to run an analysis instantly with a single copy-paste...!

What you can do with this copy-paste + implementation examples

```python
cotoha_call("ne", "I want to eat Black Thunder")
```

If you enter this, the analysis result for "**I want to eat Black Thunder**" (this is named entity recognition) is returned:

```
===> Named entity recognition ===>
{'message': '', 'result': [{'begin_pos': 0, 'class': 'ART', 'end_pos': 8, 'extended_class': 'Product_Other', 'form': 'Black Thunder', 'source': 'basic', 'std_form': 'Black Thunder'}], 'status': 0}
```
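If you only need the extracted entities rather than the raw JSON, you can pull them out of the returned dict; a minimal sketch based on the response shape shown above:

```python
response = cotoha_call("ne", "I want to eat Black Thunder")

# Each item in 'result' is one recognized entity; 'form' is its surface form
entities = [item['form'] for item in response['result']]
print(entities)  # ['Black Thunder']
```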

The "ne" can be changed to (it should cover all apis available in developers)

| API name | Input |
| --- | --- |
| Parsing | parse |
| Named entity recognition | ne |
| Keyword extraction | keyword |
| Coreference resolution | coref |
| Similarity calculation | simi |
| Sentence type judgment | sen_type |
| User attribute estimation (β) | user_at |
| Filler removal (β) | filter |
| Speech recognition error detection (β) | detect |
| Sentiment analysis | senti |
| Summarization (β) | summary |
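Two of these take extra arguments: similarity calculation ("simi") compares sentence_1 against a second sentence, and summarization ("summary") takes a target sentence count. A minimal sketch based on the signature of cotoha_call in the code to copy below (the sample texts are arbitrary):

```python
# "simi" compares two sentences (sentence_2 has a default if omitted)
cotoha_call("simi", "I want to eat Black Thunder", "I want to eat white thunder")

# "summary" takes sent_len, the number of sentences to keep
cotoha_call("summary", "I want to eat Black Thunder. It is delicious.", sent_len=1)
```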

Trying all of them with a for statement looks like this:

```python
for api_type in ["ne", "parse", "coref", "keyword", "simi", "sen_type", "user_at", "filter", "detect", "senti", "summary"]:
    print(cotoha_call(api_type, "I want to eat Black Thunder."))
    print("\n")  # Line breaks to make the results easier to read
print("Analysis finished!")
```
Output

```
===> Named entity recognition ===>
{'message': '', 'result': [{'begin_pos': 0, 'class': 'ART', 'end_pos': 8, 'extended_class': 'Product_Other', 'form': 'Black Thunder', 'source': 'basic', 'std_form': 'Black Thunder'}], 'status': 0}

===> Parsing ===>
{'message': '', 'result': [{'chunk_info': {'chunk_func': 2, 'chunk_head': 1, 'dep': 'D', 'head': 1, 'id': 0, 'links': []}, 'tokens': [{'attributes': {}, 'features': [], 'form': 'black', 'id': 0, 'kana': 'black', 'lemma': 'black', 'pos': 'noun'}, {'attributes': {}, 'dependency_labels': [{'label': 'compound', 'token_id': 0}, {'label': 'case', 'token_id': 2}], 'features': [], 'form': 'Thunder', 'id': 1, 'kana': 'Thunder', 'lemma': 'Thunder', 'pos': 'noun'}, {'attributes': {}, 'features': ['Continuous use'], 'form': 'to', 'id': 2, 'kana': 'wo', 'lemma': 'to', 'pos': 'case particle'}]}, {'chunk_info': {'chunk_func': 1, 'chunk_head': 0, 'dep': 'O', 'head': -1, 'id': 1, 'links': [{'label': 'object', 'link': 0}], 'predicate': []}, 'tokens': [{'attributes': {}, 'dependency_labels': [{'label': 'dobj', 'token_id': 1}, {'label': 'aux', 'token_id': 4}, {'label': 'punct', 'token_id': 5}], 'features': ['A'], 'form': 'eat', 'id': 3, 'kana': 'Tabe', 'lemma': 'eat', 'pos': 'Verb stem'}, {'attributes': {}, 'features': ['End'], 'form': 'tai', 'id': 4, 'kana': 'Thailand', 'lemma': 'want', 'pos': 'verb suffix'}, {'attributes': {}, 'features': [], 'form': '。', 'id': 5, 'kana': '', 'lemma': '。', 'pos': 'Kuten'}]}], 'status': 0}

===> Coreference resolution ===>
{'message': 'OK', 'result': {'coreference': [], 'tokens': [['Black', 'Thunder', 'Eat', 'Want', '.']]}, 'status': 0}

===> Keyword extraction ===>
{'message': '', 'result': [{'form': 'Black Thunder', 'score': 10.0}], 'status': 0}

===> Similarity calculation ===>
{'message': 'OK', 'result': {'score': 0.99846786}, 'status': 0}

===> Sentence type judgment ===>
{'message': '', 'result': {'dialog_act': ['information-providing'], 'modality': 'declarative'}, 'status': 0}

===> User attribute estimation (β) ===>
{'message': 'OK', 'result': {'civilstatus': 'married', 'hobby': ['ANIMAL', 'COOKING', 'FISHING', 'FORTUNE', 'GYM', 'INTERNET', 'SHOPPING', 'STUDY']}, 'status': 0}

===> Filler removal (β) ===>
{'message': 'OK', 'result': [{'fillers': [], 'fixed_sentence': 'I want to eat Black Thunder.', 'normalized_sentence': 'I want to eat Black Thunder.'}], 'status': 0}

===> Speech recognition error detection (β) ===>
{'message': 'OK', 'result': {'candidates': [{'begin_pos': 4, 'correction': [{'correct_score': 0.709220240165901, 'form': 'dancer'}, {'correct_score': 0.6137611877341953, 'form': 'bander'}, {'correct_score': 0.6054945064139393, 'form': 'thunder'}, {'correct_score': 0.5943849175403254, 'form': 'Sanda'}, {'correct_score': 0.5878497568567171, 'form': 'Zander'}], 'detect_score': 0.05464221591729093, 'end_pos': 8, 'form': 'Thunder'}], 'score': 0.05464221591729093}, 'status': 0}

===> Sentiment analysis ===>
{'message': 'OK', 'result': {'emotional_phrase': [], 'score': 0.38033421036210907, 'sentiment': 'Neutral'}, 'status': 0}

===> Summarization (β) ===>
{'result': 'I want to eat Black Thunder.', 'status': 0}

Analysis finished!
```
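If you want to keep the results for later processing instead of just printing them, you can collect the returned dicts (a small variation of my own on the loop above):

```python
results = {}
for api_type in ["ne", "parse", "coref", "keyword", "simi", "sen_type",
                 "user_at", "filter", "detect", "senti", "summary"]:
    results[api_type] = cotoha_call(api_type, "I want to eat Black Thunder.")

# e.g. {'message': '', 'result': [{'form': 'Black Thunder', 'score': 10.0}], 'status': 0}
print(results["keyword"])
```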

Before copying

- Register as a user on the COTOHA API site and get your client_id and client_secret (user registration was pretty easy to follow).
- If you are at the "What is Python?" stage but still want to try it, read this super-easy explanation first: "I tried using the COTOHA API, which is rumored to make natural language processing easy to handle" (https://qiita.com/gossy5454/items/83072418fb0c5f3e269f).
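On Colab, one way to avoid hardcoding credentials in a shared notebook is to prompt for them at runtime. This sketch is my own addition, not part of the original code, and uses only the standard library:

```python
from getpass import getpass

# Prompt for the credentials instead of writing them into the notebook,
# then use these values for client_id / client_secret in the code below
client_id = getpass('CLIENT_ID: ')
client_secret = getpass('CLIENT_SECRET: ')
```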

Code to copy


Rewrite client ID and client secret after copying


```python
# -*- coding:utf-8 -*-
# Reference: https://qiita.com/gossy5454/items/83072418fb0c5f3e269f#python%E3%81%A7%E4%BD%BF%E3%81%A3%E3%81%A6%E3%81%BF%E3%81%9F

import os
import urllib.request
import json
import configparser
import codecs
import sys

client_id = "Client ID"
client_secret = "Client secret"

developer_api_base_url = "https://api.ce-cotoha.com/api/dev/nlp/"
access_token_publish_url = "https://api.ce-cotoha.com/v1/oauth/accesstokens"

api_name_show_switch = 1 #Set to 0 if you do not want to display the api name in the output result

def cotoha_call(api_type, sentence_1, sentence_2="I want to eat white thunder", sent_len=1):
    #Get access token
    def getAccessToken():
        #Access token acquisition URL specification
        url = access_token_publish_url

        #Header specification
        headers={
            "Content-Type": "application/json;charset=UTF-8"
        }

        #Request body specification
        data = {
            "grantType": "client_credentials",
            "clientId": client_id,
            "clientSecret": client_secret
        }
        #Encode request body specification to JSON
        data = json.dumps(data).encode()

        #Request generation
        req = urllib.request.Request(url, data, headers)

        #Send a request and receive a response
        res = urllib.request.urlopen(req)

        #Get response body
        res_body = res.read()

        #Decode the response body from JSON
        res_body = json.loads(res_body)

        #Get an access token from the response body
        access_token = res_body["access_token"]

        return access_token

    #API URL specification
    if api_type == "parse":
        api_name = "Parsing"
        base_url_footer = "v1/" + api_type
        request_body_type = 1
    elif api_type == "ne":
        api_name = "Named entity recognition"
        base_url_footer = "v1/" + api_type
        request_body_type = 1
    elif api_type == "keyword":
        api_name = "Keyword extraction"
        base_url_footer = "v1/" + api_type
        request_body_type = 2
    elif api_type == "coref":
        api_name = "Resolution analysis"
        base_url_footer = "v1/coreference"
        request_body_type = 2
    elif api_type == "simi":
        api_name = "Similarity calculation"
        base_url_footer = "v1/similarity"
        request_body_type = 3
    elif api_type == "sen_type":
        api_name = "Sentence type judgment"
        base_url_footer = "v1/sentence_type"
        request_body_type = 1
    elif api_type == "user_at":
        api_name = "User attribute estimation (β)"
        base_url_footer = "beta/user_attribute"
        request_body_type = 2
    elif api_type == "filter":
        api_name = "Stagnation removal (β)"
        base_url_footer = "beta/remove_filler"
        request_body_type = 4
    elif api_type == "detect":
        api_name = "Speech recognition error detection (β)"
        base_url_footer = "beta/detect_misrecognition"
        request_body_type = 1
    elif api_type == "senti":
        api_name = "Sentiment analysis"
        base_url_footer = "v1/sentiment"
        request_body_type = 1
    elif api_type == "summary":
        api_name = "Summary (β)"
        base_url_footer = "beta/summary"
        request_body_type = 5
    else :
        print("Api Type Error.")
        sys.exit()

    if api_name_show_switch == 1:
        print("===> " + api_name + " ===>")
    
    url = developer_api_base_url + base_url_footer

    #Header specification
    headers = {
        "Authorization": "Bearer " + getAccessToken(),
        "Content-Type": "application/json;charset=UTF-8",
    }
    #Request body specification
    if request_body_type == 1:
        data = {
            "sentence": sentence_1
        }
    elif request_body_type == 2:
        data = {
            "document": sentence_1
        }
    elif request_body_type == 3:
        data = {
            "s1": sentence_1,
            "s2": sentence_2
        }
    elif request_body_type == 4:
        data = {
            "text": sentence_1
        }
    elif request_body_type == 5:
        data = {
            "document": sentence_1,
            "sent_len": sent_len
        }

    # Encode the request body to JSON
    data = json.dumps(data).encode()
    # Generate the request
    req = urllib.request.Request(url, data, headers)
    # Send the request and receive a response
    try:
        res = urllib.request.urlopen(req)
    #What to do if an error occurs in the request
    except urllib.request.HTTPError as e:
        #If the status code is 401 Unauthorized, get the access token again and request again
        if e.code == 401:
            access_token = getAccessToken()
            headers["Authorization"] = "Bearer " + access_token
            req = urllib.request.Request(url, data, headers)
            res = urllib.request.urlopen(req)
        #Show cause for errors other than 401
        # Show the cause for errors other than 401 and bail out
        else:
            print("<Error> " + e.reason)
            return None

    #Get response body
    res_body = res.read()
    #Decode the response body from JSON
    res_body = json.loads(res_body)
    #Get analysis result from response body
    return res_body
```
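Since cotoha_call returns the decoded JSON as a dict, you can post-process it however you like. A quick sketch (my own, using only names that appear in the code above) that pretty-prints a response after checking the status field:

```python
import json

response = cotoha_call("senti", "I want to eat Black Thunder")

# 'status' is 0 on success in the responses shown above
if response is not None and response.get('status') == 0:
    print(json.dumps(response, ensure_ascii=False, indent=2))
```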

After that

```python
cotoha_call("input", "The text you want to analyze")
```

Run this with "input" replaced by one of the inputs from the table above, and the analysis starts!

Postscript

To everyone participating in the project: let's do our best!

reference

"I tried using the COTOHA API, which is rumored to make natural language processing easy to handle, in Python" (https://qiita.com/gossy5454/items/83072418fb0c5f3e269f)
