Motive
This post is an entry for the [Qiita x COTOHA API present plan].
This time I tried a different API from the one used in my previous post, "Problem that give me chocolate is not made even if I analyze it with COTOHA API".
This time it is the named entity recognition API (/nlp/v1/ne).
With MeCab, I feel you cannot extract person names unless you train on the proper nouns and register them in a dictionary. KNP seems to be accurate, but the package itself is heavy. :scream: Also, no matter how good a machine-learned morphological analyzer is, it outputs parts of speech accurately, but I feel it falls short at classifying the nouns that appear frequently in a sentence. With COTOHA, the API alone classifies nouns in detail.
To get a quick sense of how well proper nouns can be extracted, I tried pulling player names and technique (move) names out of sports articles.
Environment
Dataset: Tokyo Sports. The selection criterion is the lofty reason that this sports newspaper is not sold in the area where I live. :camera_with_flash:
Method: As mentioned above, I use the COTOHA API named entity extraction (https://api.ce-cotoha.com/contents/reference/apireference.html#parsing_io_part).
Players (persons) are extracted with x["class"] == "PSN" and x["extended_class"] == "", and technique names with x["class"] == "ART" and x["extended_class"] in ["Doctrine_Method_Other"].
Doctrine_Method_Other means "doctrine/method name (other)".
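To make the two conditions concrete, here is a minimal sketch that applies them to a hand-written list shaped like the "result" array of the NE API. The items are hypothetical, not real API output; I only assume each entry carries the "form", "class" and "extended_class" keys that the script reads.

```python
# Hypothetical NE result items (invented for illustration, not API output).
SAMPLE_ITEMS = [
    {"form": "John Moxley", "class": "PSN", "extended_class": ""},
    {"form": "Tokyo Dome", "class": "LOC", "extended_class": ""},
    {"form": "Lariat", "class": "ART", "extended_class": "Doctrine_Method_Other"},
]


def is_person(x):
    # Player: personal name with an empty extended class
    return x["class"] == "PSN" and x["extended_class"] == ""


def is_attack(x):
    # Technique: unique object name classified as doctrine/method (other)
    return x["class"] == "ART" and x["extended_class"] in ["Doctrine_Method_Other"]


players = [x["form"] for x in SAMPLE_ITEMS if is_person(x)]
attacks = [x["form"] for x in SAMPLE_ITEMS if is_attack(x)]
print(players)  # ['John Moxley']
print(attacks)  # ['Lariat']
```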
class | Description |
---|---|
ORG | Organization name |
PSN | Personal name |
LOC | Place name |
ART | Unique object name |
DAT | Date expression |
TIM | Time expression |
NUM | Numerical expression |
MNY | Monetary amount expression |
PCT | Percentage expression |
OTH | Other |
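While hunting for a usable condition, it helps to dump everything the NE API found, grouped by class. Below is a minimal sketch that keeps the class codes from the table as a lookup dict. `group_by_class` is a hypothetical helper, and the input again assumes only the "form" and "class" keys, using invented items.

```python
from collections import defaultdict

# Class codes of the "class" field, as listed in the table
NE_CLASSES = {
    "ORG": "Organization name",
    "PSN": "Personal name",
    "LOC": "Place name",
    "ART": "Unique object name",
    "DAT": "Date expression",
    "TIM": "Time expression",
    "NUM": "Numerical expression",
    "MNY": "Monetary amount expression",
    "PCT": "Percentage expression",
    "OTH": "Other",
}


def group_by_class(items):
    """Group entity surface forms by readable class name (unknown codes kept as-is)."""
    groups = defaultdict(list)
    for item in items:
        label = NE_CLASSES.get(item["class"], item["class"])
        groups[label].append(item["form"])
    return dict(groups)


# Hypothetical input, not real API output
sample = [
    {"form": "Jay White", "class": "PSN"},
    {"form": "Tetsuya Naito", "class": "PSN"},
    {"form": "Tokyo Dome", "class": "LOC"},
]
print(group_by_class(sample))
# {'Personal name': ['Jay White', 'Tetsuya Naito'], 'Place name': ['Tokyo Dome']}
```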
Development
Script
import argparse

import requests
from bs4 import BeautifulSoup

# --- Get these 4 parameters from the COTOHA API Portal ---
PUBLISH_URL = "--- get your parameter ---"
CLIENT_ID = "--- get your parameter ---"
CLIENT_SECRET = "--- get your parameter ---"
BASE_URL = "--- get your parameter ---"  # endpoint paths below are appended to this


class COTOHA:
    def __init__(self):
        self._token = self._getAccessToken()

    def _getAccessToken(self):
        # Exchange the client credentials for an access token
        header = {"Content-Type": "application/json"}
        contents = {
            "grantType": "client_credentials",
            "clientId": CLIENT_ID,
            "clientSecret": CLIENT_SECRET,
        }
        raw_res = requests.post(PUBLISH_URL, headers=header, json=contents)
        raw_res.raise_for_status()
        return raw_res.json()["access_token"]

    def _post(self, path, contents):
        # Shared request: every NLP endpoint takes a Bearer token and a JSON body
        header = {
            "Authorization": "Bearer {}".format(self._token),
            "Content-Type": "application/json",
        }
        raw_res = requests.post(BASE_URL + path, headers=header, json=contents)
        raw_res.raise_for_status()
        return raw_res.json()

    def compose(self, sentence):
        # Parsing (nlp/v1/parse)
        return self._post("nlp/v1/parse", {"sentence": sentence})

    def properNoun(self, sentence):
        # Named entity recognition (nlp/v1/ne)
        return self._post("nlp/v1/ne", {"sentence": sentence})

    def keyword(self, sentence):
        # Keyword extraction takes "document" instead of "sentence"
        return self._post("nlp/v1/keyword", {"document": sentence})

    def coreference(self, sentence):
        # Coreference resolution also takes "document"
        return self._post("nlp/v1/coreference", {"document": sentence})


def extract_noun_list(_apiobj, contents, condition):
    dst = []
    for p in contents:
        # One NE request per <p> paragraph of the article
        items = _apiobj.properNoun(p.text)["result"]
        _raw = list(filter(condition, items))
        # Abbreviations excluded: skip a name contained in one already collected
        # (e.g. "Moxley" after "John Moxley")
        for _p in _raw:
            name = _p["form"]
            if not any(name in pname for pname in dst):
                dst.append(name)
    return dst


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--url", required=True)
    args = parser.parse_args()

    # Create API object
    cotoha = COTOHA()

    # Get the article from the URL (Tokyo Sports page layout)
    res = requests.get(args.url)
    res.raise_for_status()
    soup = BeautifulSoup(res.text, "html.parser")
    title_text = soup.find("title").get_text()
    contents = soup.find("div", {"class": "detail-content"}).find_all("p")

    # Extraction conditions
    def is_person(x):
        return x["class"] == "PSN" and x["extended_class"] == ""

    def is_attack(x):
        return x["class"] == "ART" and x["extended_class"] in ["Doctrine_Method_Other"]

    # Output players
    print(extract_noun_list(cotoha, contents, is_person))
    # Output technique names
    print(extract_noun_list(cotoha, contents, is_attack))


if __name__ == "__main__":
    main()
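The "Abbreviations excluded" loop in the script keeps a name only if it is not a substring of a name already collected, so a later "Moxley" is dropped once "John Moxley" is stored. It checks only one direction, so the result depends on order: if the short form comes first, both forms survive. Below is a standalone sketch of that rule, plus a hypothetical order-independent variant (not what the script does) that also evicts a stored short form when its longer form turns up later.

```python
def dedupe_like_script(names):
    """Same rule as the script: skip a name contained in one already kept."""
    dst = []
    for name in names:
        if not any(name in kept for kept in dst):
            dst.append(name)
    return dst


def dedupe_prefer_longest(names):
    """Hypothetical variant: keep the longest form regardless of order."""
    dst = []
    for name in names:
        if any(name in kept for kept in dst):
            continue  # already covered by a longer (or identical) form
        # Evict stored names that are substrings of the new, longer one
        dst = [kept for kept in dst if kept not in name]
        dst.append(name)
    return dst


names = ["Moxley", "John Moxley", "Juice"]
print(dedupe_like_script(names))     # ['Moxley', 'John Moxley', 'Juice']
print(dedupe_prefer_longest(names))  # ['John Moxley', 'Juice']
```

Both versions rely on plain substring matching. A short name that happens to occur inside an unrelated longer name would be merged away too, and the variant moves the longer form to the end of the list.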
Command
python main.py --url https://www.tokyo-sports.co.jp/prores/ddt/1754700/
Results
I ran it on two articles.
https://www.tokyo-sports.co.jp/prores/njpw/1682622/
dataset
In the IWGP US Heavyweight Championship match at New Japan Pro-Wrestling's biggest event of the year, "Wrestle Kingdom 14" (Tokyo Dome, on the 5th), champion John Moxley (34) turned back the challenge of IWGP Tag Team Champion Juice Robinson (30) and made his first successful defense.
At the Tokyo Dome show the night before (the 4th), Moxley regained the title from Lance Archer (32), while Juice won the tag team title with partner David Finlay (26). The next day was a showdown between the two new champions. Moxley had taken the title from Juice last June, and the night before he had declared he would settle things in the ring.
Juice took the lead early, but Moxley grabbed a chair outside the ring and struck him across the back, then bit Juice's forehead. The roughneck who ran wild as a "mad dog" in his WWE days forcibly seized back the pace.
Juice counterattacked with a daring high-angle powerbomb, but the champion fired off a string of surprise attacks, going from a figure-four leglock to a figure-four around the ring post. The challenger answered with an avalanche brainbuster, then a Jackhammer and a German suplex. He evaded Moxley's Death Rider (double-arm DDT) and drove through with a lariat.
However, out of a slugfest the champion flashed a hard running knee. After reversing Juice's Pulp Friction, he followed a DDT with his deadly Death Rider and got the three-count at 12:48.
After the match, an entrance theme played and Minoru Suzuki (51) suddenly appeared. Having been hit with Moxley's Death Rider at the Hiroshima show on December 8 last year, he could not hide his anger behind his grim expression. He stripped off his track jacket on the entrance aisle, ready for battle, and traded elbows with the champion in the ring. The powerful Minoru went from a rear-naked choke into a Gotch-style piledriver and knocked Moxley out.
Minoru grabbed the microphone and declared war: "Who do you think you're picking a fight with, you bastard! I'm Minoru Suzuki, professional wrestler. I'll take this fight!" A "mad dog" vs. "villain" feud over the US title has broken out, and it gives off a dangerous scent.
Minoru's comments: "Who do you think you're picking a fight with? Hey. I've been waiting for you to show up in front of me. John Moxley... no, John Boy, pleased to meet you. I'll kill you."
Juice's comments: "It all ends here. Jon Moxley was stronger than me today. I couldn't get past him again. I only started thinking about today after yesterday's match. Until then, I hadn't thought about today's match at all."
output
['John Moxley', 'Lance Archer', 'David Finlay', 'Minoru Suzuki', 'John Boy']
['Figure-four leglock', 'Avalanche', 'Jackhammer', 'Lariat', 'Rear-naked choke']
https://www.tokyo-sports.co.jp/prores/njpw/1681815/
dataset
In the IWGP Intercontinental (IC) Championship match at New Japan Pro-Wrestling's biggest event of the year, "Wrestle Kingdom 14" (Tokyo Dome, on the 4th), Tetsuya Naito (37) defeated champion Jay White (27). Besides regaining the title, he earned a double-title match against the IWGP Heavyweight Champion (the winner of Kazuchika Okada vs. Kota Ibushi) at the Tokyo Dome show on the 5th.
He lost to Jay at the Kobe show last September, dropping the IC title for the second time last year. He also suffered the humiliation of receiving zero nominations at the "Pro Wrestling Awards" run by Tokyo Sports. Still, the huge crowd was waiting for the "uncontrollable man" to rise again. From the start, a big "Naito" chant pushed him on, while Jay was met with merciless boos.
Naito took the lead with a neckbreaker on the apron outside the ring. However, Jay's second, Gedo, pulled Naito's leg from outside and disrupted his pace. The champion zeroed in on Naito's left knee. Naito leapt from the corner and fought back with a Frankensteiner, then spat in his opponent's face and skewered him with a low dropkick.
It looked like this would keep the pace with him, but he was left in agony after eating Jay's DDT, and his left knee came under attack again. He was backdropped outside the ring and stayed at a disadvantage. On top of that, his knee was wrenched with a reverse figure-four leglock.
In a big pinch, Naito grimaced as he reached the ropes for a break. Having managed to escape, he countered with a kick, then unleashed an onslaught of a spinebuster, a tornado DDT, an avalanche Frankensteiner, and Gloria. When the referee went down in an accident, someone broke in during the gap, but was driven off with a surprise attack.
Taking control of the match, Naito fired off a series of Corriendo-style Destinos. After completely shutting down Jay's deadly Blade Runner (a modified face crusher), he finally got the three-count with a Destino delivered with everything he had.
He won the fierce battle at 33:54. The "uncontrollable man," who has been proclaiming his ambition to hold both the IWGP and IC titles since January last year, will take on the stage of a lifetime to complete his revival.
[Naito's comments] "My goal in this two-night series is not to take this belt. I'm glad the fans said 'Congratulations, Naito.' But tranquilo. That's not what I'm here for this time. So, who will be tomorrow's opponent? My prediction is Okada. Ideally, Okada. Come on."
[Jay White's comments] "Where did he (Naito) go... Unfortunately, I was just one of the supporting characters in the story everyone wrote. Everyone wanted Jay White to lose. That must be it. Naito, the one you love, won. Go ahead and laugh. My new Destino... Destiny begins tomorrow."
output
['Tetsuya Naito', 'Jay White', 'Kazuchika Okada', 'Kota Ibushi', 'Destino...Destiny']
['Neckbreaker', 'Back drop', 'Figure-four leglock', 'Spine Buster']
Consideration
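――Player names were extracted well; the only wrong pick was "Destino...Destiny", which came out as a person name (note that "Juice Robinson" does not appear in the first output either). Ordinary person names seem to be classified without problems.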
――As for technique names, unfortunately the COTOHA API has no classification for them. After checking the API output several times, the combination that seemed to capture the most techniques was class: ART, extended_class: Doctrine_Method_Other, so I used it, but "High-angle powerbomb" and "Corriendo-style Destino" were not picked up. Adding class: ART, extended_class: Product as a second condition also pulls in things that are not technique names, so getting 100% was tough. :tired_face:
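Switching between the strict condition and the wider one (adding Product) is easier with a small condition factory. This is only a sketch. `make_art_condition` is a hypothetical helper, the extended_class strings are copied from the text above, and the sample items are invented to show how the wider condition lets a non-technique name through.

```python
def make_art_condition(extended_classes):
    """Build a filter accepting ART entities whose extended_class is in the given set."""
    allowed = set(extended_classes)

    def condition(x):
        return x["class"] == "ART" and x["extended_class"] in allowed

    return condition


strict = make_art_condition(["Doctrine_Method_Other"])
wide = make_art_condition(["Doctrine_Method_Other", "Product"])

# Invented items for illustration only
items = [
    {"form": "Lariat", "class": "ART", "extended_class": "Doctrine_Method_Other"},
    {"form": "Spine Buster", "class": "ART", "extended_class": "Doctrine_Method_Other"},
    {"form": "Some product name", "class": "ART", "extended_class": "Product"},
]
print([x["form"] for x in items if strict(x)])  # ['Lariat', 'Spine Buster']
print([x["form"] for x in items if wide(x)])
# ['Lariat', 'Spine Buster', 'Some product name']
```

The returned function has the same shape as the script's `is_attack`, so it can be passed as the `condition` argument in its place.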
――For specialized documents rather than sports articles, it may work better, because the API accepts the following domain (dictionary type) parameters. (They are only available to Enterprise users, though, so they are paid.)
param | Domain |
---|---|
IT | Computer / Information / Communication |
automobile | Automobile |
chemistry | Chemical / petroleum industry |
company | Company |
construction | Civil engineering and construction |
economy | Economy / Law |
energy | Electric power / energy |
institution | Institution / organization |
machinery | Machinery |
medical | Medicine |
metal | Non-ferrous / metal |
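With Enterprise access, the domain would be sent alongside the sentence in the NE request body. Below is a minimal sketch that only builds that JSON body. The field name `dic_type` (a list of the param values above) is my assumption and should be checked in the API reference before use. `build_ne_request` is a hypothetical helper.

```python
# Param values from the domain table
DOMAINS = {
    "IT", "automobile", "chemistry", "company", "construction", "economy",
    "energy", "institution", "machinery", "medical", "metal",
}


def build_ne_request(sentence, domains=()):
    """Build a JSON body for nlp/v1/ne, optionally with domain dictionaries.

    ASSUMPTION: the domain list goes in a "dic_type" field; verify this
    against the COTOHA API reference (Enterprise plan only).
    """
    unknown = [d for d in domains if d not in DOMAINS]
    if unknown:
        raise ValueError("unknown domain(s): {}".format(unknown))
    body = {"sentence": sentence}
    if domains:
        body["dic_type"] = list(domains)
    return body


print(build_ne_request("sample sentence", ["IT", "company"]))
# {'sentence': 'sample sentence', 'dic_type': ['IT', 'company']}
```

The resulting dict would be passed as `json=` to the same `requests.post` call that the script's `properNoun` method makes.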
PostScript: I said the accuracy of person-name extraction is good, but for some reason the recently retired "Jushin Thunder Liger" was not extracted correctly. It was classified as "ART: Unique object name". :japanese_ogre: Maybe someone should send a request to the staff who maintain the celebrity name dictionary? :thinking: That's all. :sushi: