[PYTHON] Using COTOHA, I tried to follow the emotional course of Run, Melos!

Tyrant Dionis: Yeah, what happened?

Melos: It's not happening!

I would like to read "Run, Melos!" using COTOHA's sentiment analysis.

Process flow

  1. Obtain the full text of "Run, Melos!" from Aozora Bunko.
  2. Send the full text of Hashire Melos to COTOHA's sentiment analysis API.
  3. Graph and analyze the emotion values.

Advance preparation

Clone Aozora Bunko


!git clone --branch master --depth 1 https://github.com/aozorabunko/aozorabunko.git

code

1. Get the full text of "Run, Melos!" from Aozora Bunko

Obtain the full text of "Run, Melos!" from Aozora Bunko.


from bs4 import BeautifulSoup

def get_word():
  # Path to the HTML of "Run, Melos!" inside the cloned Aozora Bunko repository
  path_to_html = 'aozorabunko/cards/000035/files/1567_14913.html'

  # Parse the HTML with BeautifulSoup (the 'lxml' parser must be installed)
  with open(path_to_html, 'rb') as html:
    soup = BeautifulSoup(html, 'lxml')
  main_text = soup.find("div", class_='main_text')

  # Remove ruby readings (<rp>, <rt>) and headings (<h4>) so only the body text remains
  for yomigana in main_text.find_all(["rp", "h4", "rt"]):
    yomigana.decompose()
  sentences = [line.strip() for line in main_text.text.strip().splitlines()]
  aozora_text = ','.join(sentences)

  # Split at each Japanese full stop (。) and drop empty pieces, such as the one after the final 。
  aozora_text_list = [s for s in aozora_text.split("。") if s.strip()]
  return aozora_text_list
This reads the full text of "Run, Melos!" from the locally cloned Aozora Bunko. The whole text is too long (too many characters) to send to the API as is, so it is split into a list at each full stop.
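To see why the split needs a little care, here is a minimal sketch on a short sample string. The helper name `split_sentences` is hypothetical and not part of the code above. `str.split("。")` removes the full stop itself and leaves an empty string after a trailing 。, which would otherwise turn into an empty API request.

```python
def split_sentences(text, delimiter="。"):
    """Split text at each Japanese full stop and drop empty pieces.

    str.split removes the delimiter itself and leaves an empty string
    after a trailing full stop, so empty pieces are filtered out here.
    """
    return [piece for piece in text.split(delimiter) if piece.strip()]


if __name__ == "__main__":
    sample = "メロスは激怒した。必ず、かの邪智暴虐の王を除かなければならぬと決意した。"
    print(sample.split("。"))       # the last element is an empty string
    print(split_sentences(sample))  # only the two real sentences remain
```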
2. Apply COTOHA (sentiment analysis) to the full text of "Run, Melos!"

Send the full text of Hashire Melos through COTOHA's sentiment analysis.


import json
import time
import urllib.error
import urllib.request


client_id = "Your client ID"
client_secret = "Your own secret key"

developer_api_base_url = "https://api.ce-cotoha.com/api/dev/nlp/"
access_token_publish_url = "https://api.ce-cotoha.com/v1/oauth/accesstokens"

def cotoha_call(sentence_1, api_type):
    # Get an access token (client credentials flow)
    def getAccessToken():
        url = access_token_publish_url
        headers = {
            "Content-Type": "application/json;charset=UTF-8"
        }
        data = {
            "grantType": "client_credentials",
            "clientId": client_id,
            "clientSecret": client_secret
        }
        data = json.dumps(data).encode()
        req = urllib.request.Request(url, data, headers)
        res = urllib.request.urlopen(req)
        res_body = json.loads(res.read())
        return res_body["access_token"]

    # Build the URL for the requested API type (e.g. "sentiment")
    base_url_footer = "v1/" + api_type
    url = developer_api_base_url + base_url_footer
    headers = {
        "Authorization": "Bearer " + getAccessToken(),
        "Content-Type": "application/json;charset=UTF-8",
    }
    data = json.dumps({"sentence": sentence_1}).encode()
    time.sleep(0.5)  # wait between calls so requests are not fired back to back
    req = urllib.request.Request(url, data, headers)

    try:
        res = urllib.request.urlopen(req)
    # What to do if the request fails
    except urllib.error.HTTPError as e:
        # On 401 Unauthorized or 500 Internal Server Error, get a new access token and retry once
        if e.code in (401, 500):
            headers["Authorization"] = "Bearer " + getAccessToken()
            time.sleep(0.5)
            req = urllib.request.Request(url, data, headers)
            res = urllib.request.urlopen(req)
        # For any other error, show the cause and re-raise (there is no response to read)
        else:
            print("<Error> " + str(e.reason))
            raise

    res_body = json.loads(res.read())
    return res_body


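sentence = get_word()
end = len(sentence)
full_text = len(''.join(sentence))  # total number of characters
sentiment_list = []
cnt = 0
wcnt = 0  # character offset of the current sentence within the whole text
for i in sentence:
  cnt += 1
  print(str(cnt) + "/" + str(end))
  b = cotoha_call(i, "sentiment")
  # Store [sentiment label, score, sentence text, character offset]
  sentiment_list.append([b['result']['sentiment'], b['result']['score'], i, wcnt])
  wcnt += len(i)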

This part sends every element of the "Hashire Melos" list to COTOHA's sentiment analysis. Splitting the whole text of "Hashire Melos" at the full stops gives 460 pieces, so the API is called 460 times. (COTOHA's free plan allows 1,000 calls per API per day.)
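Because every rerun of the loop spends another 460 of the 1,000 daily calls, it can help to save results to disk. The following is a minimal sketch, assuming a local JSON file is acceptable as a cache. `cached_sentiment` and the file name `sentiment_cache.json` are hypothetical, and `call` is meant to wrap `cotoha_call` from above.

```python
import json
import os


def cached_sentiment(sentences, call, cache_path="sentiment_cache.json"):
    """Return one API result per sentence, reusing results saved on disk.

    `call` is any function that takes a sentence and returns a
    JSON-serializable result (hypothetically: lambda s: cotoha_call(s, "sentiment")).
    """
    cache = {}
    if os.path.exists(cache_path):
        with open(cache_path, encoding="utf-8") as f:
            cache = json.load(f)

    results = []
    for sentence in sentences:
        if sentence not in cache:
            # Only sentences that are not cached yet reach the API
            cache[sentence] = call(sentence)
            # Save immediately so a run that fails partway keeps what it fetched
            with open(cache_path, "w", encoding="utf-8") as f:
                json.dump(cache, f, ensure_ascii=False)
        results.append(cache[sentence])
    return results
```

For example, `results = cached_sentiment(sentence, lambda s: cotoha_call(s, "sentiment"))` makes a second run read everything from the file instead of calling the API again. Identical sentences also share one call.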
3. Graph the numbers.

Graph the numbers.


# Library for graph styling
import seaborn as sns

# Library for drawing graphs
from matplotlib import pyplot as plt

# Display graphs inline in Colab
%matplotlib inline

# Convert each result into one value for the graph
graph_x = []
graph_y = []
for i in sentiment_list:
  graph_x.append(i[3])  # character offset of the sentence in the text
  if i[0] == 'Negative':
    graph_y.append((i[1] * -1) + (-1))  # Negative: -1 to -2
  elif i[0] == 'Positive':
    graph_y.append(i[1] + 1)  # Positive: +1 to +2
  else:
    graph_y.append(i[1])  # Neutral: unchanged (0 to 1)

sns.set()

plt.figure(figsize=(40, 8), dpi=100)
plt.plot(graph_x, graph_y, color='red', marker='o', linestyle='-', label='Sentiment')
plt.title("Run, Melos!")
plt.xlabel("Position in text (characters)")
plt.ylabel("Emotion")
plt.legend()
plt.show()
I use matplotlib to draw a line graph of the sentiment analysis results. COTOHA only classifies each sentence as Neutral, Negative, or Positive, and all scores fall in the same 0 to 1 range, so I offset the values for the graph: a positive score gets +1, a negative score is negated and then gets -1, and a neutral score is left as is.
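The offset rule can be written as a small function. This is a minimal sketch (`to_signed_score` is a hypothetical name) that follows the rule described above and raises an error on any label other than the three COTOHA returns, so a surprise value cannot slip into the plot unnoticed.

```python
def to_signed_score(sentiment, score):
    """Map a COTOHA (sentiment, score) pair onto one vertical axis.

    Positive -> 1 + score, Negative -> -(1 + score), Neutral -> score unchanged.
    """
    if sentiment == "Positive":
        return 1 + score
    if sentiment == "Negative":
        return -(1 + score)
    if sentiment == "Neutral":
        return score
    raise ValueError(f"unexpected sentiment label: {sentiment!r}")
```

With it, the plotted values become `graph_y = [to_signed_score(s, sc) for s, sc, _, _ in sentiment_list]`. Since scores lie between 0 and 1, Negative lands in -2 to -1, Neutral in 0 to 1, and Positive in 1 to 2, so each class occupies its own band on the vertical axis.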

Output result

Line graph (meros_graph.png)

The vertical axis is the emotion value and the horizontal axis is each sentence's position in the text. The emotion swings widely over the course of the story.

Positive side analysis

Top 3 most positive phrases


[1.8126587191707609, 'I also bought a beautiful costume']
[1.6346889479386029, 'Your life is important now']
[1.633215237292569, 'The young mason also screamed, running after Melos.']

The most positive phrase is **"I also bought a beautiful costume"**, which leads 2nd and 3rd place by a wide margin.

This phrase comes from the conversation between Melos and his sister after he returns to the village. I think Melos was very happy to hand his sister the costume he had come all the way to the city to buy. (Buying it was his original purpose for going to the city in the first place.)

Sentences before and after the phrase that seems to be the most positive (for reference)


She was surprised to find her brother staggering toward her, utterly exhausted, and pelted him with questions.
"It's nothing." Melos forced himself to laugh. "I've left an errand in the city. I have to go back to the city soon. Tomorrow we'll hold your wedding. The sooner the better."
His sister blushed.
"Are you happy? I also bought a beautiful costume. Now go and let the villagers know. The wedding is tomorrow."
Melos staggered off again, returned home, decorated the altar of the gods, prepared the feast, and soon collapsed onto the floor and fell into a sleep so deep he scarcely breathed.
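The rankings above are listed without showing how they were picked. One way is to sort `sentiment_list` by the plotted value. This is a minimal sketch with hypothetical helper names; it repeats the graph's offsets so it stands on its own, and `most_positive=False` sorts in ascending order to give the most negative phrases instead.

```python
def plotted_value(sentiment, score):
    # Same offsets as the graph: Positive -> 1 + score, Negative -> -(1 + score)
    if sentiment == "Positive":
        return 1 + score
    if sentiment == "Negative":
        return -(1 + score)
    return score  # Neutral


def top_phrases(sentiment_list, n=3, most_positive=True):
    """Return the n [value, sentence] pairs with the highest (or lowest) values.

    Each entry of sentiment_list is [sentiment, score, sentence, char_offset].
    """
    scored = [[plotted_value(s, sc), text] for s, sc, text, _ in sentiment_list]
    scored.sort(key=lambda pair: pair[0], reverse=most_positive)
    return scored[:n]
```

`top_phrases(sentiment_list)` and `top_phrases(sentiment_list, most_positive=False)` produce lists in the same `[value, phrase]` shape as the rankings in this article.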

Negative side analysis

Top 3 most negative phrases


[-1.9459608426670426, 'Don't let the man die']
[-1.939408157054805, 'Even if it's a moment, it can't be wasted']
[-1.8666593927565555, 'Oh, the sun sets']

The 1st and 2nd place phrases don't really read as negative, so they seem off. The words "die" and "waste" appear to be what pulls their scores down.
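To check which words drive a score, the sentiment response can be inspected beyond the label and score. My understanding of the COTOHA API reference is that the result also carries an `emotional_phrase` list whose items have `form` and `emotion` keys; treat that structure as an assumption and verify it against the current docs. The helper `emotional_phrases` is hypothetical, and since `sentiment_list` above keeps only the label and score, you would need to keep the whole response `b` to use it.

```python
def emotional_phrases(response):
    """List the (form, emotion) pairs in one COTOHA sentiment response.

    Assumes a response shaped like
    {"result": {"sentiment": ..., "score": ...,
                "emotional_phrase": [{"form": ..., "emotion": ...}, ...]}}.
    Missing keys are treated as "no phrases".
    """
    phrases = response.get("result", {}).get("emotional_phrase", [])
    return [(p.get("form"), p.get("emotion")) for p in phrases]
```

For example, `emotional_phrases(cotoha_call(sentence, "sentiment"))` would show whether "die" or "waste" is among the flagged phrases for the 1st and 2nd place sentences.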

Setting those two aside, isn't the phrase that is actually the most negative **"Oh, the sun sets"**?

This sentence comes while Melos is racing back to the city with all his strength to make the deadline and free his friend Selinuntius, who is being held hostage.

Melos faces one obstacle after another and breaks through each of them, but at last he falls to his knees. This passage depicts him standing up again amid a swirl of thoughts, while time slips away moment by moment.

Sentences before and after the phrase that seems to be the most negative (for reference)


Run, Melos!
I am trusted. I am trusted. That devil's whisper earlier was a dream. A bad dream. Forget it.
When the body is worn out, one suddenly has bad dreams like that. Melos, it's no shame on you.
After all, you are a true hero. Haven't you been able to stand and run again? Thank goodness!
I can die as a man of justice. Oh, the sun sets. It sinks steadily. Wait, Zeus.
I have been an honest man since I was born. Let me die still an honest man.

Analyzing the emotion values in blocks

Below are the stretches of text where negative values, or positive values, cluster together.
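The blocks below were picked out by eye from the scatter plot. To find such stretches programmatically, one simple definition (an assumption, not necessarily what the circled regions correspond to) is the longest run of consecutive sentences with the same label. `longest_run` is a hypothetical helper.

```python
def longest_run(labels, target):
    """Return (start, end) of the longest run of `target` in labels.

    `end` is exclusive; returns None if `target` never occurs.
    On ties, the earliest run is kept.
    """
    best = None
    start = None
    # A trailing None acts as a sentinel that closes a run at the end of the list
    for i, label in enumerate(list(labels) + [None]):
        if label == target:
            if start is None:
                start = i
        elif start is not None:
            if best is None or i - start > best[1] - best[0]:
                best = (start, i)
            start = None
    return best
```

For example, `start, end = longest_run([row[0] for row in sentiment_list], "Positive")` followed by `"。".join(row[2] for row in sentiment_list[start:end])` reconstructs that stretch of text, as long as the label occurs at least once.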

Scatter plot display


plt.figure(figsize=(40, 8), dpi=100)
plt.scatter(graph_x, graph_y, color='red', marker='o', label='Sentiment')
plt.show()
Scatter plot (meros_scat.png)

Block with the most consecutive positive values (circled in orange)



It is the hope of fulfilling my duty.
It is the hope of sacrificing myself to protect my honor.
The setting sun casts its red light on the leaves of the trees, and the leaves and branches shine as if they were burning.
There is still time before sunset. Someone is waiting for me.
There is someone quietly counting on me without the slightest doubt.
I am believed. My own life doesn't matter. I am in no position to say something as easy as dying to make amends.
I have to repay that trust. That is the one thing that matters now.
Run, Melos!
I am trusted. I am trusted. That devil's whisper earlier was a dream. A bad dream. Forget it.
When the body is worn out, one suddenly has bad dreams like that. Melos, it's no shame on you.
After all, you are a true hero. Haven't you been able to stand and run again?
Thank goodness! I can die as a man of justice.

This passage depicts Melos standing up again after falling to his knees, and you can feel his rising emotion. (Incidentally, it comes right before the most negative phrase discussed above.)

Block with the most consecutive negative values (circled in blue)



If that happens, it will be harder for me than dying. I'll be a traitor forever, the most disgraceful kind of person on earth.
Selinuntius, I'll die too. Let me die with you. You alone, surely, will believe in me.
No, is that also just my own conceit? Ah, why not simply live on as a villain?
My house is in the village. There are sheep, too. Surely my sister and her husband would never drive me out of the village.

This passage shows Melos on the verge of giving up on saving his friend Selinuntius. He has certainly become desperate. (Right after this, he flings out his limbs and drifts off into a doze.)

Summary

~~Judging from the positive values, I thought Melos might have a sister complex.~~

It was a simple experiment, but I tried reading "Run, Melos!" through sentiment analysis. Reading a text this way turned out to be quite interesting.

That's all. Thank you for reading this far.
