[PYTHON] Let's have Aozora Bunko summarized while talking with COTOHA

This is my first post on Qiita. Please forgive any shortcomings. I promise to fix bugs.

The COTOHA API campaign has produced a flood of good articles, and I have been enjoying them as a reader. When I thought about trying something myself, I wanted to summarize literary works. Freely usable literature is naturally limited, so I chose Aozora Bunko as my target.

To state the conclusion first, the results are not academically meaningful. Through trial and error with the COTOHA API, I got results that I personally found interesting. But isn't a natural language processing API something that was meant to be enjoyed this way? That is my proposal.

Among the COTOHA APIs, I have especially high expectations for the summary API. Because it compresses information, I think it could be a good aid for busy modern people. Ideally, you hand it "Thus Spoke Zarathustra" and it answers "God is dead!" Then again, I don't know how a summary like "God is dead!" would help our lives.

The environment is Colaboratory.

What is the COTOHA API?

COTOHA API

An API that lets you easily try natural language processing! (On Qiita it no longer needs an introduction, so I will skip the details.)

On the shoulders of giants

You can easily do big things by riding on the shoulders of those who posted on Qiita before you. Thank you very much.

I made an emotion radar chart of Aozora Bunko's work
Copy and paste code to use Cotoha on Google colab very easily
[First COTOHA API] I tried to summarize the old story

Below, I inherit their code, philosophy, passion, and half-playful spirit.
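
The helpers used below, get_txt_from_aozorabunko and cotoha_call, come from those articles and are not reproduced here. As a rough idea of what the text-extraction side has to do, here is a minimal sketch. It assumes the work's body sits inside a div with class "main_text" and that ruby readings are wrapped in rt and rp tags. The names AozoraTextExtractor and extract_main_text are hypothetical and are not the helpers from the referenced articles. It works on an HTML string that has already been downloaded and decoded.

```python
from html.parser import HTMLParser


class AozoraTextExtractor(HTMLParser):
    """Collect text inside <div class="main_text">, skipping ruby readings.

    Assumption: the page marks the body with class="main_text" and
    puts readings in <rt> and the surrounding brackets in <rp>.
    """

    def __init__(self):
        super().__init__()
        self.div_depth = 0   # > 0 while inside the main_text div
        self.skip_depth = 0  # > 0 while inside <rt> or <rp>
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            if self.div_depth > 0:
                self.div_depth += 1  # nested div inside the body
            elif dict(attrs).get("class") == "main_text":
                self.div_depth = 1
        elif tag in ("rt", "rp") and self.div_depth > 0:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "div" and self.div_depth > 0:
            self.div_depth -= 1
        elif tag in ("rt", "rp") and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.div_depth > 0 and self.skip_depth == 0:
            self.parts.append(data)


def extract_main_text(html):
    """Return the body text with readings and all whitespace removed."""
    parser = AozoraTextExtractor()
    parser.feed(html)
    parser.close()
    # str.split() with no argument also splits on full-width spaces.
    return "".join("".join(parser.parts).split())
```

Dropping the readings matters for summarization: otherwise every annotated word would appear twice in the text sent to the API.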

Standing on the shoulders of giants

The code I wrote this time just cuts the text to be analyzed into fixed-size chunks (5000 characters) and passes each chunk to the API. It would be presumptuous even to call it a stepladder. It is more like a little mound in a sandbox. I think splitting on punctuation would be the right way to do it, but my technical skills fall short.

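def summarize_by_cotoha(url):
  # get_txt_from_aozorabunko and cotoha_call come from the articles listed above
  document=get_txt_from_aozorabunko(url)
  # Cut the text into 5000-character chunks, ignoring sentence boundaries
  document_split=[document[i:i+5000] for i in range(0,len(document),5000)]
  document_summary=[]
  for chunk in document_split:
    # Summarize each chunk separately and keep only the summary text
    document_summary.append((cotoha_call("summary",chunk))["result"])

  return document_summary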
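
For reference, the punctuation-based splitting mentioned above could look roughly like the following. This is only a sketch, not what was used for the results in this post. The name split_on_sentences is hypothetical, and it assumes sentences end with the full stop "。" and that 5000 characters per chunk is an acceptable size.

```python
def split_on_sentences(text, limit=5000, terminator="。"):
    """Pack whole sentences into chunks of at most `limit` characters."""
    pieces = text.split(terminator)
    # Re-attach the terminator that str.split removes.
    sentences = [piece + terminator for piece in pieces[:-1]]
    if pieces[-1]:
        sentences.append(pieces[-1])  # trailing text without a terminator

    chunks = []
    current = ""
    for sentence in sentences:
        # Fallback: a single sentence longer than the limit is cut by length.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        # Start a new chunk when the next sentence would not fit.
        if len(current) + len(sentence) > limit:
            chunks.append(current)
            current = ""
        current += sentence
    if current:
        chunks.append(current)
    return chunks
```

Joining the returned chunks gives back the original text, so this could replace the slicing line in summarize_by_cotoha without losing anything.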

Let's summarize Aozora Bunko

I have no literary background, so I went with Osamu Dazai's "No Longer Human" (Ningen Shikkaku, literally "human disqualification"), which, viewed charitably, is the greatest common denominator. I haven't read it. The title touches my heart for some reason.


# No Longer Human (Ningen Shikkaku)
ningenshikkaku=get_txt_from_aozorabunko("https://www.aozora.gr.jp/cards/000035/files/301_14912.html")
cotoha_call("summary",ningenshikkaku)

===>
Summary (β)
===>
<Error> BAD REQUEST

Trial and error

What? What is BAD REQUEST? I heard that an error will occur if the request is too long, but is it BAD REQUEST?

I'm a little frustrated, so I'll ask COTOHA.


cotoha_call("summary",'''
I'm not sure what caused the error.
Is it a problem to be called BAD REQUEST?
Please do something!

''')

===>
Summary (β)
===>
{'result': "I'm not sure what caused the error.", 'status': 0}

Apparently, everything other than "An error has occurred!" was judged to be garbage.

Let's add some current affairs and see how it reacts.



cotoha_call("summary",'''
I'm not sure what caused the error.
Is it a problem to be called BAD REQUEST?
Please do something!
It's nice weather today, isn't it! Spring is the best! I can't wait for the cherry blossoms, right? Hasn't the coronavirus settled down yet?

''')

===>
Summary (β)
===>
{'result': "I'm not sure what caused the error.", 'status': 0}

It seems I am just being treated as a chronic complainer griping about the error.

Let's add the only Natsume Soseki line I know, along with Oda Nobunaga's death poem.


cotoha_call("summary",'''
I'm not sure what caused the error.
Is it a problem to be called BAD REQUEST?
Please do something!
It's nice weather today, isn't it! Spring is the best! I can't wait for the cherry blossoms, right? Hasn't the coronavirus settled down yet?

I am a cat.

Human life lasts fifty years
Compared with the ages of the lower heavens
It is like a dream, an illusion
Once given life
Is there anything that does not perish
''')

===>
Summary (β)
===>
{'result': 'Human life lasts fifty years; compared with the ages of the lower heavens it is like a dream, an illusion, and once given life, is there anything that does not perish?', 'status': 0}

It finally responded. Next to Oda Nobunaga's death poem, my malicious complaint apparently ranks below garbage.

In the end, I decided to split the text into chunks and pass them to the API.


ningen_summary=summarize_by_cotoha("https://www.aozora.gr.jp/cards/000035/files/301_14912.html")

['I have never seen a child with such a mysterious expression.',
 'Even the younger men were told by their mother how the speech was tonight, and they said it was very interesting.',
 "I played a clown, and the man never laughed at me, and I knew that if I played too much clown for a man, I would fail. So I always tried to round it up at the right place, but the woman didn't know that it was right, and she demanded a clown from herself forever, and she was exhausted in response to the endless encore.",
 'Only for Takeichi, I had been showing my perishable nerves for a long time, and I was able to show this self-portrait to Takeichi with peace of mind, and I was greatly praised, and two or three more ghost pictures. I continued to paint, and I got another prophecy from Takeichi, "You will be a great painting."',
 'But to me, it seemed obvious.',
 "It's scary that I'm still alone in that room of the boarding house, and I feel like I'm being attacked by someone and being hit by someone, and when I jump out into the city, it's a movement of Rei I helped or drank cheap sake with Horiki, abandoned most of my studies and studying painting, and entered high school in November of the second year, older than myself. I had a mournful death with my husband's lady, and my personal condition changed completely.",
 'Tsuneko is kissed now.',
 "But if you're feeling well, clearly defining your future plans, and consulting with me, I'll try to help you, at least in small increments, for your rehabilitation. I'm even thinking.",
 'I have never visited Horiki\'s house from myself, and I usually called Horiki to myself by telegram, but now even the telegram fee is lonely, and I fall into it. Horiki thought that he might not come just by sending a telegram from his body, so he decided to "visit", which he was not good at, sighed and took the streetcar. When I realized that the only rope I could rely on in the world was that Horiki, I was struck by a terrible sign that my spine became cold.',
 'Since then, I have had to scare even Shigeko.',
 '"Would you like to drink?"',
 'Trust is a sin.',
 "Yoshiko seems to think that she drank poison in place of Yoshiko, and even more so than before, she grated herself and didn't laugh at whatever she said. It was so unspeakable that I was annoyed to be in the apartment room, and I just went out and had to drink cheap sake as usual.",
 "That friend was a so-called alumni of my college days, and now I am a lecturer at a certain women's college, but in fact I asked this friend to marry my relatives, so that errand Because of this, I decided to buy some fresh seafood and feed it to my family, so I went to Funabashi City with a rucksack on my back."]


Someone like me with no literary talent can only take in about three lines, so I will summarize it further.


ningen_summary_summary=cotoha_call("summary",''.join(ningen_summary))
ningen_summary_summary

===>
Summary (β)
===>
{'result': "Yoshiko seems to think that she drank poison in place of Yoshiko, and even more so than before, she grated herself and didn't laugh at whatever she said. It was so unspeakable that I was annoyed to be in the apartment room, and I just went out and had to drink cheap sake as usual.",
 'status': 0}
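
The "summary of summaries" step above can be generalized. The following is a minimal sketch rather than the code used for this post: summarize_hierarchically is a hypothetical name, the summarize argument stands for any function that turns a string into a shorter string, and the 5000-character limit is the same assumption as in the chunking code.

```python
def summarize_hierarchically(chunks, summarize, limit=5000):
    """Summarize each chunk, then summarize the joined summaries.

    `summarize` is any callable mapping a string to a shorter string.
    """
    summaries = [summarize(chunk) for chunk in chunks]
    joined = "".join(summaries)
    if len(joined) > limit:
        # Guard against a summarizer that does not shrink its input,
        # which would otherwise make the recursion run forever.
        if len(joined) >= sum(len(chunk) for chunk in chunks):
            raise ValueError("summaries are not shorter than the input")
        # Still too long for one request: regroup by length and repeat.
        regrouped = [joined[i:i + limit] for i in range(0, len(joined), limit)]
        return summarize_hierarchically(regrouped, summarize, limit)
    return summarize(joined)


def first_sentence(text):
    """Stand-in summarizer for local testing: keep the first sentence."""
    return text.split("。")[0] + "。"
```

With the helpers from this post, the summarizer could be passed as lambda t: cotoha_call("summary", t)["result"]. The stand-in first_sentence only exists so the control flow can be tried without calling the API.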

In other words, COTOHA's summary of "No Longer Human" seems to be **Yoshiko was still going to drink cheap sake**. Who is Yoshiko? An image of the on-site cat comes to mind...

Other works

# The Emperor's New Clothes (it is a translation, so the copyright notice was ambiguous)
hadaka_summary=summarize_by_cotoha("https://www.aozora.gr.jp/cards/000019/files/46319_23030.html")
hadaka_summary

["There are only two scammers"],["But the king is naked, right?"]

These get pretty close to the core, but when I fed the two of them to the summary API, the former won. Apparently scammers make a stronger impression than kings.


#Theory of relativity

soutaisei_summary=summarize_by_cotoha("https://www.aozora.gr.jp/cards/001428/files/50328_64360.html")
soutaisei_summary

['We can somehow call each stationary object with respect to ether an absolutely stationary object.',
 "This coordinate system K'Also place one clock system in that space and put them all in K'Let's work with this K'You can set a stationary clock with respect to exactly according to the rules given above.",
 'The speed at which the clock advances in a state of motion is determined by the movement clock that has just been considered among the clocks that are stationary with respect to K and measure the time with respect to K. It must be considered that it is always found in comparison with the pointer position of something that passes by.']

#Further summary
cotoha_call("summary",''.join(soutaisei_summary))

{'result': "This coordinate system K'Also place one clock system in that space and put them all in K'Let's work with this K'You can set a stationary clock with respect to exactly according to the rules given above.",
 'status': 0}

Hmm, was that really summarized? I feel like I heard COTOHA's voice saying, "Don't feed me something you don't understand yourself." Thank you.

Digression

Thinking about copyright is a hassle, so I fed a recent remark by a minister to the summary API.


cotoha_call("summary","When I thought about 30 years from now, I was wondering how old I would be 30 years from immediately after the disaster. That's why I think I'm a politician who has the potential to see the milestone of whether I can keep my promise 30 years from now if I stay healthy.")

{'result': 'When I thought about 30 years from now, I was wondering how old I would be 30 years from immediately after the disaster.', 'status': 0}

I could hear COTOHA asking how it is supposed to summarize something that has no point.

Summary

I also tried Hansel and Gretel and Little Red Riding Hood, but the results were not great. Most of the works in Aozora Bunko are written in an older style, so they seem to be a poor fit for the API.

If you could specify the summary length, such as a one-sentence summary or a two-sentence summary, I feel the range of applications would expand further!

Glory to the future development of the COTOHA API!
