[PYTHON] To people who are "recruiting but not recruiting"

On Twitter I introduce articles about artificial intelligence and pieces I have written for other media, so if you want to know more about artificial intelligence, feel free to follow!

Updated on 2/25 (Tue): added the Shinjiro Koizumi syntax (https://qiita.com/omiita/items/0f811f15e569bf2539b8#6-%E7%95%AA%E5%A4%96%E7%B7%A8%E5%B0%8F%E6%B3%89%E9%80%B2%E6%AC%A1%E9%83%8E%E6%A7%8B%E6%96%87).

1. I teach, but I don't explain

Inspired by Prime Minister Abe's remark "I am recruiting, but not recruiting", I made a program that automatically converts an input sentence into a sentence of the form "I am recruiting, but not recruiting"!

If you enter "Recruit people", it will be converted to the sentence "We are recruiting people, but we are not recruiting".

Example of seeing cherry blossoms


$ python abe.py "See the cherry blossoms"
I see the cherry blossoms, but I haven't seen them

2. I use it, but I don't use it

What I used        Purpose
Python 3.7.0       Code
COTOHA API         Morphological analysis and similarity calculation
WordNet            Synonyms
IPA dictionary     Verb conjugation

3. I teach you the mechanism in detail, but I do not explain it in detail.

[Figure: Purpose]

3.1 Extract nouns and verbs

[Figure: Step 1]

3.2 List synonyms

[Figure: Step 2]

3.3 Extract only nouns

[Figure: Step 3]

3.4 Measurement of similarity between original text and synonyms

[Figure: Step 4]

3.5 Convert to the continuative form (連用タ接続)

[Figure: Step 5]

3.6 Combine

[Figure: Step 6]
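
Steps 3.1 to 3.6 above can be sketched end to end as follows. This is a minimal sketch, not the actual abe.py: the sentence is assumed to be already split into a prefix and a verb, the dictionaries and scores are toy stand-ins for the COTOHA API, WordNet, and the IPA dictionary, and the Japanese output phrases are an assumption about what the program prints. The name `abe_sentence` is hypothetical.

```python
from typing import Callable, Dict, List


def abe_sentence(
    prefix: str,
    verb: str,
    synonyms: Dict[str, List[str]],
    is_noun: Callable[[str], bool],
    similarity: Callable[[str, str], float],
    ta_form: Dict[str, str],
) -> str:
    """Chain steps 3.1 to 3.6 for a sentence already split into prefix + verb."""
    original = prefix + verb                       # 3.1: noun part and verb
    candidates = synonyms.get(verb, [])            # 3.2: synonyms of the verb
    nouns = [w for w in candidates if is_noun(w)]  # 3.3: keep nouns only
    # 3.4: pick the noun whose "<noun>する" form is most similar to the original
    best = max(nouns, key=lambda n: similarity(original, n + "する"), default="")
    stem = ta_form[verb]                           # 3.5: form that precedes て/で
    connector = "で" if stem.endswith("ん") else "て"
    # 3.6: combine the affirmative clause with the negated synonym
    return f"{prefix}{stem}{connector}いますが、{best}はしていません"


# Toy data (made-up values, for illustration only).
SYNONYMS = {"見る": ["眺める", "観賞", "見物"], "飲む": ["飲酒"]}
NOUNS = {"観賞", "見物", "飲酒"}
SCORES = {"観賞する": 0.9, "見物する": 0.7, "飲酒する": 0.8}
TA_FORM = {"見る": "見", "飲む": "飲ん"}


def toy_similarity(sentence: str, candidate: str) -> float:
    """Look up a fixed score instead of calling a similarity API."""
    return SCORES.get(candidate, 0.0)


print(abe_sentence("桜を", "見る", SYNONYMS, NOUNS.__contains__, toy_similarity, TA_FORM))
print(abe_sentence("酒を", "飲む", SYNONYMS, NOUNS.__contains__, toy_similarity, TA_FORM))
```

Passing the lookups in as arguments keeps the flow testable without network access or dictionary files.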

4. Shows the code but does not show it

Import

abe.py


# -*- coding:utf-8 -*-

import os
import urllib.request
import json
import configparser
import codecs
import csv
import sys
import sqlite3
from collections import namedtuple
import types

COTOHA

abe.py


#/_/_/_/_/_/_/_/_/_/_/_/_/COTOHA_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/
#The code for this part is taken from here.
# https://qiita.com/gossy5454/items/83072418fb0c5f3e269f

class CotohaApi:
    #Initialization
    def __init__(self, client_id, client_secret, developer_api_base_url, access_token_publish_url):
        self.client_id = client_id
        self.client_secret = client_secret
        self.developer_api_base_url = developer_api_base_url
        self.access_token_publish_url = access_token_publish_url
        self.getAccessToken()

    #Get access token
    def getAccessToken(self):
        #Access token acquisition URL specification
        url = self.access_token_publish_url

        #Header specification
        headers={
            "Content-Type": "application/json;charset=UTF-8"
        }

        #Request body specification
        data = {
            "grantType": "client_credentials",
            "clientId": self.client_id,
            "clientSecret": self.client_secret
        }
        #Encode request body specification to JSON
        data = json.dumps(data).encode()

        #Request generation
        req = urllib.request.Request(url, data, headers)

        #Send a request and receive a response
        res = urllib.request.urlopen(req)

        #Get response body
        res_body = res.read()

        #Decode the response body from JSON
        res_body = json.loads(res_body)

        #Get an access token from the response body
        self.access_token = res_body["access_token"]


    #Parsing API
    def parse(self, sentence):
        #Parsing API URL specification
        url = self.developer_api_base_url + "v1/parse"
        #Header specification
        headers={
            "Authorization": "Bearer " + self.access_token,
            "Content-Type": "application/json;charset=UTF-8",
        }
        #Request body specification
        data = {
            "sentence": sentence
        }
        #Encode request body specification to JSON
        data = json.dumps(data).encode()
        #Request generation
        req = urllib.request.Request(url, data, headers)
        #Send a request and receive a response
        try:
            res = urllib.request.urlopen(req)
        #What to do if an error occurs in the request
        except urllib.request.HTTPError as e:
            #If the status code is 401 Unauthorized, get the access token again and request again
            if e.code == 401:
                print ("get access token")
                #getAccessToken() takes no arguments and stores the new token on self
                self.getAccessToken()
                headers["Authorization"] = "Bearer " + self.access_token
                req = urllib.request.Request(url, data, headers)
                res = urllib.request.urlopen(req)
            #Show the cause of any other error and re-raise, since there is no response to read
            else:
                print ("<Error> " + str(e.reason))
                raise

        #Get response body
        res_body = res.read()
        #Decode the response body from JSON
        res_body = json.loads(res_body)
        #Get analysis result from response body
        return res_body


    #Similarity calculation API
    def similarity(self, s1, s2):
        #Similarity calculation API URL specification
        url = self.developer_api_base_url + "v1/similarity"
        #Header specification
        headers={
            "Authorization": "Bearer " + self.access_token,
            "Content-Type": "application/json;charset=UTF-8",
        }
        #Request body specification
        data = {
            "s1": s1,
            "s2": s2
        }
        #Encode request body specification to JSON
        data = json.dumps(data).encode()
        #Request generation
        req = urllib.request.Request(url, data, headers)
        #Send a request and receive a response
        try:
            res = urllib.request.urlopen(req)
        #What to do if an error occurs in the request
        except urllib.request.HTTPError as e:
            #If the status code is 401 Unauthorized, get the access token again and request again
            if e.code == 401:
                print ("get access token")
                #getAccessToken() takes no arguments and stores the new token on self
                self.getAccessToken()
                headers["Authorization"] = "Bearer " + self.access_token
                req = urllib.request.Request(url, data, headers)
                res = urllib.request.urlopen(req)
            #Show the cause of any other error and re-raise, since there is no response to read
            else:
                print ("<Error> " + str(e.reason))
                raise

        #Get response body
        res_body = res.read()
        #Decode the response body from JSON
        res_body = json.loads(res_body)
        #Get analysis result from response body
        return res_body
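
Both API methods build their requests in the same way, so the shared part could be factored out. The sketch below only constructs the request and sends nothing. `build_json_request` is a hypothetical helper name, not part of the class above, and the token is a placeholder.

```python
import json
import urllib.request


def build_json_request(url, payload, access_token=None):
    """Build a JSON POST request; add a Bearer token when one is given."""
    headers = {"Content-Type": "application/json;charset=UTF-8"}
    if access_token is not None:
        headers["Authorization"] = "Bearer " + access_token
    body = json.dumps(payload).encode("utf-8")
    # Passing a request body makes urllib use the POST method.
    return urllib.request.Request(url, body, headers)


req = build_json_request(
    "https://api.ce-cotoha.com/api/dev/nlp/v1/parse",
    {"sentence": "桜を見る"},
    access_token="dummy-token",  # placeholder, not a real token
)
print(req.get_method())                 # POST
print(req.get_header("Authorization"))  # Bearer dummy-token
```

With a helper like this, `parse` and `similarity` would differ only in the URL suffix and the request body.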

Convert to the continuative form (連用タ接続)

abe.py


#/_/_/_/_/_/_/_/_/_/_/_/_/CONVERSION_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/

def convert(word):
    file_name = "./data/Verb.csv"
    with open(file_name,"r") as f:
        handler = csv.reader(f)
        for row in handler:
            if word == row[10]: #Base form matches
                if "連用タ接続" in row[9]: #Conjugation form matches
                    return row[0] #Surface form
    return None
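
`convert` reopens and scans the CSV on every call. When many verbs are converted, loading the rows into a dictionary once is an alternative. The sketch below assumes the column layout used above (surface form in column 0, conjugation form in column 9, base form in column 10). The sample rows are illustrative, with placeholder zeros in the numeric columns, and `load_ta_forms` is a hypothetical name.

```python
import csv
import io

# Illustrative rows in the IPA dictionary layout; numeric columns are placeholders.
SAMPLE_ROWS = """\
飲み,0,0,0,動詞,自立,*,*,五段・マ行,連用形,飲む,ノミ,ノミ
飲ん,0,0,0,動詞,自立,*,*,五段・マ行,連用タ接続,飲む,ノン,ノン
買っ,0,0,0,動詞,自立,*,*,五段・ワ行促音便,連用タ接続,買う,カッ,カッ
"""


def load_ta_forms(lines):
    """Map each base form (column 10) to its 連用タ接続 surface form (column 0)."""
    table = {}
    for row in csv.reader(lines):
        if "連用タ接続" in row[9]:
            # Keep the first match, as convert() does.
            table.setdefault(row[10], row[0])
    return table


ta_forms = load_ta_forms(io.StringIO(SAMPLE_ROWS))
print(ta_forms.get("飲む"))  # 飲ん
print(ta_forms.get("見る"))  # None
```

A verb missing from the table yields `None`, which matches the fallback that the main routine applies to the result of `convert`.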

Synonyms

abe.py


#/_/_/_/_/_/_/_/_/_/_/_/_/SYNONYM_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/
#The code for this part is taken from here.
# https://www.yoheim.net/blog.php?q=20160201

conn = sqlite3.connect("./data/wnjpn.db")

Word = namedtuple('Word', 'wordid lang lemma pron pos')

def getWords(lemma):
  cur = conn.execute("select * from word where lemma=?", (lemma,))
  return [Word(*row) for row in cur]


Sense = namedtuple('Sense', 'synset wordid lang rank lexid freq src')

def getSenses(word):
  cur = conn.execute("select * from sense where wordid=?", (word.wordid,))
  return [Sense(*row) for row in cur]

Synset = namedtuple('Synset', 'synset pos name src')

def getSynset(synset):
  cur = conn.execute("select * from synset where synset=?", (synset,))
  return Synset(*cur.fetchone())

def getWordsFromSynset(synset, lang):
  cur = conn.execute("select word.* from sense, word where synset=? and word.lang=? and sense.wordid = word.wordid;", (synset,lang))
  return [Word(*row) for row in cur]

def getWordsFromSenses(sense, lang="jpn"):
  synonym = {}
  for s in sense:
    lemmas = []
    syns = getWordsFromSynset(s.synset, lang)
    for sy in syns:
      lemmas.append(sy.lemma)
    synonym[getSynset(s.synset).name] = lemmas
  return synonym

def getSynonym (word):
    synonym = {}
    for w in getWords(word):
        #Merge the synonyms of every sense of the word
        synonym.update(getWordsFromSenses(getSenses(w)))
    return synonym
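
The lookup above goes from a word to its senses, from each sense to its synset, and from the synset back to every word that shares it. The same idea can be tried without wnjpn.db using an in-memory database. This is a minimal sketch under assumptions: the two tables are a reduced, hypothetical version of the schema with toy rows, and, unlike `getSynonym`, the query returns a flat list and leaves out the word itself.

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")
demo_conn.executescript("""
    create table word  (wordid integer, lang text, lemma text);
    create table sense (synset text, wordid integer);
""")
demo_conn.executemany("insert into word values (?, ?, ?)", [
    (1, "jpn", "見る"), (2, "jpn", "観賞"), (3, "eng", "watch"), (4, "jpn", "食べる"),
])
demo_conn.executemany("insert into sense values (?, ?)", [
    ("s-watch", 1), ("s-watch", 2), ("s-watch", 3), ("s-eat", 4),
])


def synonyms_of(lemma, lang="jpn"):
    """Return lemmas that share a synset with `lemma`, excluding the word itself."""
    cur = demo_conn.execute(
        """
        select distinct other.lemma
        from word as w
        join sense as s1 on s1.wordid = w.wordid
        join sense as s2 on s2.synset = s1.synset
        join word as other on other.wordid = s2.wordid
        where w.lemma = ? and other.lang = ? and other.lemma != w.lemma
        order by other.lemma
        """,
        (lemma, lang),
    )
    return [row[0] for row in cur]


print(synonyms_of("見る"))    # ['観賞']
print(synonyms_of("食べる"))  # []
```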

Main

abe.py


#/_/_/_/_/_/_/_/_/_/_/_/_/MAIN_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/
if __name__ == '__main__':
    #Get the location of the source file
    APP_ROOT = os.path.dirname(os.path.abspath( __file__)) + "/"

    #Get set value
    config = configparser.ConfigParser()
    config.read(APP_ROOT + "config.ini")
    CLIENT_ID = config.get("COTOHA API", "Developer Client id")
    CLIENT_SECRET = config.get("COTOHA API", "Developer Client secret")
    DEVELOPER_API_BASE_URL = config.get("COTOHA API", "Developer API Base URL")
    ACCESS_TOKEN_PUBLISH_URL = config.get("COTOHA API", "Access Token Publish URL")

    #COTOHA API instantiation
    cotoha_api = CotohaApi(CLIENT_ID, CLIENT_SECRET, DEVELOPER_API_BASE_URL, ACCESS_TOKEN_PUBLISH_URL)

    #Analysis target sentence
    if len(sys.argv) >= 2:
        sentence = sys.argv[1]
    else:
        #Exit with a usage message instead of a bare TypeError
        sys.exit("Usage: python abe.py <sentence>")

    #Take the verb from the original sentence and convert it to the continuative form (連用タ接続)
    result = cotoha_api.parse(sentence)
    ret = ""
    verb = ""
    for chunk in result["result"]:
        for token in chunk["tokens"]:
            if token["pos"] == "動詞語幹": #Verb stem
                verb = token["lemma"]
                form = token["form"]
                conv_verb = convert(verb)
                #Fall back to the surface form when the dictionary has no entry
                if conv_verb is None:
                    ret += form
                else:
                    ret += conv_verb

                #A stem ending in ん takes で; every other stem takes て
                if ret[-1] == "ん":
                    ret += "でいますが、"
                else:
                    ret += "ていますが、"
                break
            else:
                ret += token["form"]

    #Look up synonyms of the verb
    synonym = getSynonym(verb)
    noun = ""
    sim = 0.

    #Pick the noun synonym that is most similar to the original sentence
    for syns in synonym.values():
        for syn in syns:
            result = cotoha_api.parse(syn)['result'][0]['tokens'][0]
            if result['pos'] == '名詞': #Noun
                cand = result['form']
                #Compare the sentence with "<noun>する" ("to do <noun>")
                cand_sim = cotoha_api.similarity(sentence, cand+'する')['result']['score']
                if cand_sim > sim:
                    noun = cand
                    sim = cand_sim
    ret += noun
    ret += "はしていません" #Negated ending ("... is not being done")

    #Final output
    print(ret)

config.ini

config.ini


#To use the COTOHA API, register with the COTOHA API to get a client ID and secret,
# then create a config.ini file like this one.
# https://api.ce-cotoha.com/contents/index.html

[COTOHA API]
Developer API Base URL: https://api.ce-cotoha.com/api/dev/nlp/
Developer Client id: IDIDIDIDIDIDIDIDIDIDIDIDIDIDIDI
Developer Client secret: SECRETSECRETSECRETSECRET
Access Token Publish URL: https://api.ce-cotoha.com/v1/oauth/accesstokens
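
The file can be checked without calling the API. The sketch below parses the same layout from a string, with placeholder credentials, to show that `configparser` accepts `:` as the key/value separator and treats option names case-insensitively. The name `SAMPLE_INI` is hypothetical.

```python
import configparser

SAMPLE_INI = """
[COTOHA API]
Developer API Base URL: https://api.ce-cotoha.com/api/dev/nlp/
Developer Client id: YOUR_CLIENT_ID
Developer Client secret: YOUR_CLIENT_SECRET
Access Token Publish URL: https://api.ce-cotoha.com/v1/oauth/accesstokens
"""

sample_config = configparser.ConfigParser()
sample_config.read_string(SAMPLE_INI)
section = sample_config["COTOHA API"]

# Only the first ":" separates key and value, so the URL survives intact.
print(section["Developer API Base URL"])
# Option names are lowercased internally, so the lookup is case-insensitive.
print(section["developer client id"])
```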

5. I've tried it, but I haven't.

$ python abe.py "drink alcohol"
I'm drinking, but I'm not
$ python abe.py "go back home"
I'm home, but I'm not home
$ python abe.py "See the cherry blossoms"
I see the cherry blossoms, but I haven't seen them
$ python abe.py "Eat sushi"
I'm eating sushi, but I'm not eating
$ python abe.py "Invite to the eve"
I have been invited to the eve, but I have not.

Other things I tried:

$ python abe.py "Stay at the hotel"
I'm staying at a hotel, but I'm not staying
$ python abe.py "answer the questions"
Answered the question but not answered
$ python abe.py "Sleep at night"
I sleep at night, but I don't sleep
$ python abe.py "Walk outside"
I'm walking outside, but I'm not walking
$ python abe.py "View the net"
I'm looking at the net, but I haven't checked
$ python abe.py "Buy meat"
I'm buying meat, but not
$ python abe.py "Burn the fire"
I'm burning a fire, but I'm not burning

6. Shinjiro Koizumi syntax

Inspired by Mr. Shinjiro Koizumi's remark "I said that I am reflecting on it, but I am reflecting on it.", I made a program that automatically converts an input sentence into the Shinjiro Koizumi syntax!

If we call the syntax above the Shinzo Abe syntax, the Shinzo Abe syntax is "affirmative sentence + similar negative sentence".

[Figure: Shinzo Abe syntax]

On the other hand, the Shinjiro Koizumi syntax is simply "affirmative sentence + similar affirmative sentence". I said that it is similar to the Shinzo Abe syntax, but it is similar. (Only the way of joining in step 3.6 changes.)

[Figure: Shinjiro Koizumi syntax]
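
The difference between the two syntaxes comes down to the final joining step. The sketch below is a minimal illustration only: the function names and the English templates are hypothetical, modeled on the translated outputs in this article rather than on the program's actual strings.

```python
def abe_syntax(affirmative: str, similar: str) -> str:
    """Affirmative clause followed by a similar clause, negated."""
    return f"I am {affirmative}, but I am not {similar}."


def koizumi_syntax(affirmative: str, similar: str) -> str:
    """Affirmative clause followed by a similar clause, also affirmative."""
    return f"I said that I am {affirmative}, but I am {similar}."


print(abe_syntax("recruiting people", "soliciting"))
print(koizumi_syntax("keeping my promise", "keeping it"))
```

Everything before the join (parsing, synonym lookup, similarity ranking) stays the same in both cases.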

6.1 I said I'm trying, but I'm trying

$ python sexy.py 'I keep a promise'
I said I'm keeping my promise, but I'm keeping it.
$ python sexy.py 'Entertaining foreigners'
I said that I am entertaining foreigners, but I welcome them.
$ python sexy.py 'Take a break from the company'
I said I'm absent from work, but I'm resting.
$ python sexy.py 'Address environmental issues'
I said that I am working on environmental issues, but I am facing it.
$ python sexy.py 'Destroy NHK'
I said that I'm destroying nhk, but I'm destroying it.

7. Summarized but not summarized

I made a program that automatically converts a sentence into a sentence of the form "I am recruiting but not recruiting"! I am soliciting likes and comments, but not soliciting them. (If there is a sentence whose output you are curious about, I will try it, so feel free to comment!)

8. I see it, but I don't refer to it
