__(Reference)__ There are two types of automatic text summarization models: *extractive summarization* models and *abstractive summarization* models.
・ @Koreyou's Qiita article "Introduction of thesis: Neural Latent Extractive Document Summarization"
・ [[DL Round Reading] Abstractive Summarization of Reddit Posts with Multi-level Memory Networks](https://www.slideshare.net/DeepLearningJP2016/dlabstractive-summarization-of-reddit-posts-with-multilevel-memory-networks-132350977)
Terminal
pip install spacy
pip install "https://github.com/megagonlabs/ginza/releases/download/latest/ginza-latest.tar.gz"
extract_words_by_entity_label
Python3
def extract_words_by_entity_label(text, label):
    # Requires a loaded pipeline in `nlp`, e.g. nlp = spacy.load('ja_ginza')
    if label in ["PERSON", "NORP", "FAC", "ORG", "GPE", "LOC", "PRODUCT", "EVENT", "WORK_OF_ART", "LAW", "LANGUAGE", "DATE", "TIME", "PERCENT", "MONEY", "QUANTITY", "ORDINAL", "CARDINAL"]:
        # Join the lines into one string before analysis
        text = text.replace("\n", "")
        doc = nlp(text)
        # Keep only the entities whose label matches the requested one
        words_list = [ent.text for ent in doc.ents if ent.label_ == label]
    else:
        print("Its named entity label does not exist.")
        words_list = []
    return words_list
__The official *spaCy* website lists the *Entity* label names that *spaCy* defines.__
- spaCy *Named Entity Recognition*
| Label | Description |
|---|---|
| PERSON | People, including fictional. |
| NORP | Nationalities or religious or political groups. |
| FAC | Buildings, airports, highways, bridges, etc. |
| ORG | Companies, agencies, institutions, etc. |
| GPE | Countries, cities, states. |
| LOC | Non-GPE locations, mountain ranges, bodies of water. |
| PRODUCT | Objects, vehicles, foods, etc. (Not services.) |
| EVENT | Named hurricanes, battles, wars, sports events, etc. |
| WORK_OF_ART | Titles of books, songs, etc. |
| LAW | Named documents made into laws. |
| LANGUAGE | Any named language. |
| DATE | Absolute or relative dates or periods. |
| TIME | Times smaller than a day. |
| PERCENT | Percentage, including "%". |
| MONEY | Monetary values, including unit. |
| QUANTITY | Measurements, as of weight or distance. |
| ORDINAL | "first", "second", etc. |
| CARDINAL | Numerals that do not fall under another type. |
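The label check in `extract_words_by_entity_label` could also be written against a single module-level set, so that the list of labels lives in one place. The following is only a minimal sketch and not part of the original article: `ENTITY_LABELS` and `is_known_label` are hypothetical names introduced here, and the set simply copies the labels listed above.

```python
# Hypothetical constant: the 18 entity labels listed above, kept in one place.
ENTITY_LABELS = frozenset({
    "PERSON", "NORP", "FAC", "ORG", "GPE", "LOC", "PRODUCT", "EVENT",
    "WORK_OF_ART", "LAW", "LANGUAGE", "DATE", "TIME", "PERCENT", "MONEY",
    "QUANTITY", "ORDINAL", "CARDINAL",
})


def is_known_label(label):
    """Return True if `label` is one of the labels listed above."""
    return label in ENTITY_LABELS


print(is_known_label("LOC"))  # True
print(is_known_label("aaa"))  # False
```

A set makes the membership test independent of the order of the labels, and the same constant can be reused wherever a label needs to be checked.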
Chief Cabinet Secretary Kato recorded a radio program that the government and others broadcast to abductees in North Korea, calling on them to "keep a strong belief that the day will come when you hug your family, and survive," and emphasized that he would make every effort to bring them back to Japan as soon as possible.
On the 16th, Chief Cabinet Secretary Kato, who also serves as the minister in charge of the abduction issue, recorded radio programs that the government and a group investigating so-called specific missing persons, whose abduction by North Korea cannot be ruled out, broadcast to the abductees in North Korea.
In it, Chief Cabinet Secretary Kato said, "The abduction issue is positioned as the most important issue for the Suga Cabinet. Together with Prime Minister Suga, I met with the families and shared their earnest desire to get results at all costs."
He went on to say, "We remain determined to break the shell of mutual distrust, settle the unfortunate past, and normalize diplomatic relations with North Korea," and emphasized that the government would work together and do its best to return the abductees as soon as possible.
He then called on the abductees: "Keep a strong belief that the day will come when you will step on the soil of your country again and hug your family who are waiting for you to return home. Please take good care of yourself and survive."
Python3
>>> text = '''Chief Cabinet Secretary Kato recorded a radio program that the government and others broadcast to abductees in North Korea, calling on them to "keep a strong belief that the day will come when you hug your family, and survive," and emphasized that he would make every effort to bring them back to Japan as soon as possible.
... On the 16th, Chief Cabinet Secretary Kato, who also serves as the minister in charge of the abduction issue, recorded radio programs that the government and a group investigating so-called specific missing persons, whose abduction by North Korea cannot be ruled out, broadcast to the abductees in North Korea.
... In it, Chief Cabinet Secretary Kato said, "The abduction issue is positioned as the most important issue for the Suga Cabinet. Together with Prime Minister Suga, I met with the families and shared their earnest desire to get results at all costs."
... He went on to say, "We remain determined to break the shell of mutual distrust, settle the unfortunate past, and normalize diplomatic relations with North Korea," and emphasized that the government would work together and do its best to return the abductees as soon as possible.
... He then called on the abductees: "Keep a strong belief that the day will come when you will step on the soil of your country again and hug your family who are waiting for you to return home. Please take good care of yourself and survive."'''
>>>
>>> text = text.replace("\n", "")
>>> text
'Chief Cabinet Secretary Kato recorded a radio program that the government and others broadcast to abductees in North Korea, calling on them to "keep a strong belief that the day will come when you hug your family, and survive," and emphasized that he would make every effort to bring them back to Japan as soon as possible.On the 16th, Chief Cabinet Secretary Kato, who also serves as the minister in charge of the abduction issue, recorded radio programs that the government and a group investigating so-called specific missing persons, whose abduction by North Korea cannot be ruled out, broadcast to the abductees in North Korea.In it, Chief Cabinet Secretary Kato said, "The abduction issue is positioned as the most important issue for the Suga Cabinet. Together with Prime Minister Suga, I met with the families and shared their earnest desire to get results at all costs."He went on to say, "We remain determined to break the shell of mutual distrust, settle the unfortunate past, and normalize diplomatic relations with North Korea," and emphasized that the government would work together and do its best to return the abductees as soon as possible.He then called on the abductees: "Keep a strong belief that the day will come when you will step on the soil of your country again and hug your family who are waiting for you to return home. Please take good care of yourself and survive."'
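Removing newlines with `replace("\n", "")` suits Japanese text, which has no spaces between words. For space-delimited text such as English, it glues the last word of one line to the first word of the next. A minimal sketch of an alternative follows; `join_lines` is a hypothetical helper introduced here for illustration, not part of the original article.

```python
def join_lines(text, sep=" "):
    """Join the non-empty lines of `text` with `sep` (hypothetical helper)."""
    lines = [line.strip() for line in text.splitlines()]
    return sep.join(line for line in lines if line)


sample = "first line.\nsecond line.\n"
print(sample.replace("\n", ""))  # first line.second line.
print(join_lines(sample))        # first line. second line.
print(join_lines(sample, ""))    # first line.second line.
```

Passing `sep=""` reproduces the behavior of the original `replace` call, so the same helper could serve both Japanese and space-delimited input.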
Python3
>>> import spacy
>>> from spacy.matcher import Matcher
>>> nlp = spacy.load('ja_ginza')
>>>
>>> doc = nlp(text)
>>> tmp = ["Label name:{label} word:{word}".format(label=ent.label_, word=ent.text) for ent in doc.ents]
>>> tmp
['Label name:PERSON word:Chief Cabinet Secretary Kato', 'Label name:LOC word:North Korea', 'Label name:PERSON word:Chief Cabinet Secretary Kato', 'Label name:DATE word:16th', 'Label name:LOC word:North Korea', 'Label name:LOC word:North Korea', 'Label name:PERSON word:Chief Cabinet Secretary Kato', 'Label name:PERSON word:Suga', 'Label name:PERSON word:Suga']
>>>
>>> from pprint import pprint
>>> pprint(tmp)
['Label name:PERSON word:Chief Cabinet Secretary Kato',
'Label name:LOC word:North Korea',
'Label name:PERSON word:Chief Cabinet Secretary Kato',
'Label name:DATE word:16th',
'Label name:LOC word:North Korea',
'Label name:LOC word:North Korea',
'Label name:PERSON word:Chief Cabinet Secretary Kato',
'Label name:PERSON word:Suga',
'Label name:PERSON word:Suga']
>>>
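When several labels are of interest, the entities can be grouped in a single pass over `doc.ents` instead of analyzing the text once per label. The sketch below is an illustration only, not the author's code: `group_by_label` is a hypothetical helper, and it takes plain `(text, label)` pairs so that it runs without a model. With spaCy, the pairs would come from `[(ent.text, ent.label_) for ent in doc.ents]`.

```python
from collections import defaultdict


def group_by_label(pairs):
    """Group entity texts by label, keeping their original order."""
    grouped = defaultdict(list)
    for text, label in pairs:
        grouped[label].append(text)
    return dict(grouped)


# Pairs copied from the output shown above.
pairs = [
    ("Chief Cabinet Secretary Kato", "PERSON"),
    ("North Korea", "LOC"),
    ("Chief Cabinet Secretary Kato", "PERSON"),
    ("16th", "DATE"),
    ("North Korea", "LOC"),
    ("North Korea", "LOC"),
    ("Chief Cabinet Secretary Kato", "PERSON"),
    ("Suga", "PERSON"),
    ("Suga", "PERSON"),
]
grouped = group_by_label(pairs)
print(grouped["DATE"])  # ['16th']
print(grouped["LOC"])   # ['North Korea', 'North Korea', 'North Korea']
```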
Python3
>>> words_list = extract_words_by_entity_label(text, "aaa")
Its named entity label does not exist.
>>>
>>> print(words_list)
[]
>>>
>>> label = "LOC"
>>> words_list = extract_words_by_entity_label(text, label)
>>> print(words_list)
['North Korea', 'North Korea', 'North Korea']
>>>
>>> for label in ["LOC", "DATE", "PERSON"]:
...     print(label, " : ", extract_words_by_entity_label(text, label))
...
LOC  :  ['North Korea', 'North Korea', 'North Korea']
DATE  :  ['16th']
PERSON  :  ['Chief Cabinet Secretary Kato', 'Chief Cabinet Secretary Kato', 'Chief Cabinet Secretary Kato', 'Suga', 'Suga']
>>>
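The returned lists contain one entry per mention, so the same word appears repeatedly. If only the distinct words or their frequencies matter, `collections.Counter` from the standard library is one option. A minimal sketch, using the `PERSON` list from the output above as input; `count_words` is a hypothetical name introduced here.

```python
from collections import Counter


def count_words(words_list):
    """Return (word, count) pairs, most frequent first."""
    return Counter(words_list).most_common()


persons = [
    "Chief Cabinet Secretary Kato",
    "Chief Cabinet Secretary Kato",
    "Chief Cabinet Secretary Kato",
    "Suga",
    "Suga",
]
print(count_words(persons))
# [('Chief Cabinet Secretary Kato', 3), ('Suga', 2)]
```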
__After extracting the words that carry a specific attribute (*Entity*) from the target text, it may be useful to pass them on to further processing such as the following.__
- Geocoding tool that returns the facility type, address, etc. when you enter a location name