[PYTHON] Quantify "Don't use too strong words" (Sentiment analysis starting with BLEACH)

**"... Don't use too strong words"** **"Looks weak"** (from BLEACH Volume 20)

This famous scene left readers with a strong impression of Aizen's dignity. The day it gets listed in the Kojien dictionary as a Japanese idiom may be near (just kidding).

What bothers me here is the definition of "strong words". Where is the line that a word has to cross to count as a "strong word"?

You might think that the strength of words is subjective and cannot be quantified. One approach, however, is to use sentiment analysis to compute "word strength" as a numeric value called the magnitude.

So this time, we will use the Google Cloud Natural Language API from Python to run sentiment analysis on quotes from BLEACH.

Environment

・ Google Colaboratory (used on Google Chrome)

What is natural language processing?

Natural language processing means having a computer process natural language. "Natural language" is the language that humans normally use. It is called natural because it arose naturally, in contrast to programming languages, which were created artificially.

・ Examples of artificial languages: Python, JavaScript, Ruby, etc.
・ Examples of natural languages: Japanese, English, Chinese, etc.

The language that humans use in daily life is ambiguous. Computers, on the other hand, must be instructed in a language that can be interpreted in exactly one way, such as a programming language. That is why computers are not good at handling human language.

What is sentiment analysis?

Sentiment analysis is one of the typical techniques of natural language processing. As the name suggests, it analyzes the emotion expressed in text data programmatically. There are three main ways to do sentiment analysis:

1. Compare against a polarity dictionary
2. Build a machine learning model
3. Use an API

Because it is relatively easy to implement and can be expected to give high accuracy, this time we will go with "3. Use an API". There are various sentiment analysis APIs to choose from, but this time we will use the Google Cloud Natural Language API (hereinafter "Google NLP").
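For comparison, here is a rough sketch of what the first method, comparing against a polarity dictionary, could look like. The dictionary `POLARITY` and its values are made up purely for illustration; a real polarity dictionary would be far larger, and this is not the approach used in the rest of this article.

```python
import re

# Hypothetical toy polarity dictionary: word -> polarity in [-1.0, 1.0].
# These entries are invented for illustration only.
POLARITY = {"good": 1.0, "great": 1.0, "bad": -1.0, "weak": -0.5, "kill": -1.0}

def polarity_score(text, dictionary=POLARITY):
    """Average the polarity of every word in the text that the dictionary knows."""
    words = re.findall(r"[a-z']+", text.lower())
    hits = [dictionary[w] for w in words if w in dictionary]
    # Text with no known words is treated as neutral (0.0).
    return sum(hits) / len(hits) if hits else 0.0

print(polarity_score("it looks weak"))
```

A dictionary like this cannot see context or negation, which is one reason the API route is attractive.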

Implementation

This time, we will implement it in an online environment called Google Colaboratory.

Let's check whether we can actually run sentiment analysis on text with the Google Cloud API.


import requests

# Replace with your own Google Cloud API key
APIkey = "XXXXXXXXX"
text = """
Don't use too strong words, it looks weak
"""
url = 'https://language.googleapis.com/v1/documents:analyzeSentiment?key=' + APIkey

header = {'Content-Type': 'application/json'}
body = {
    "document": {
        "type": "PLAIN_TEXT",
        "language": "en",
        "content": text
    },
    "encodingType": "UTF8"
}

# Send the request and parse the JSON response
response = requests.post(url, headers=header, json=body).json()

# Sentiment of the document as a whole
print("Overall magnitude:", response["documentSentiment"]["magnitude"])
print("Overall score:", response["documentSentiment"]["score"])

When I ran the above code, I got the following output! It seems the sentiment analysis worked without any problems.

Overall magnitude: 0.2
Overall score: -0.2
Don't use too strong words. It looks weak. magnitude: 0.2, score: -0.2
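The output above also lists the sentence with its own magnitude and score, which the code shown earlier does not print. The part that produced that line is not shown, so the following is only a sketch of one way to do it. It assumes the JSON response carries a `sentences` list whose entries have `text.content` and `sentiment`, which is how I understand the v1 response to be shaped. The helper name `format_sentence_sentiments` and the hand-written `sample_response` are hypothetical.

```python
def format_sentence_sentiments(response):
    """Return one formatted line per sentence in an analyzeSentiment response."""
    lines = []
    for sentence in response.get("sentences", []):
        content = sentence["text"]["content"]
        sentiment = sentence["sentiment"]
        lines.append(
            f"{content} magnitude: {sentiment['magnitude']}, score: {sentiment['score']}"
        )
    return lines

# Hand-written sample shaped like the API response (an assumption),
# filled with the values from the output shown in this article.
sample_response = {
    "documentSentiment": {"magnitude": 0.2, "score": -0.2},
    "sentences": [
        {
            "text": {"content": "Don't use too strong words. It looks weak."},
            "sentiment": {"magnitude": 0.2, "score": -0.2},
        }
    ],
}

for line in format_sentence_sentiments(sample_response):
    print(line)
```

In the real script you would pass the `response` returned by `requests.post(...).json()` instead of `sample_response`.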

Magnitude? Score? Some unfamiliar terms have appeared. First, score indicates whether the text is negative or positive as a value between -1.0 and 1.0 (formally called "polarity"). Next, magnitude expresses the emotional weight of the text as a value from 0 to ∞. Intuitively, think of score as the direction of the emotion and magnitude as its absolute size.
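To make the relationship between the two values concrete, here is a small sketch that turns a (score, magnitude) pair into a rough label. The function name and both cutoffs (`neutral_band` and `strong_magnitude`) are assumptions chosen for illustration; the API itself does not define such thresholds.

```python
def describe_sentiment(score, magnitude, neutral_band=0.25, strong_magnitude=0.7):
    """Give a rough label for a (score, magnitude) pair.

    neutral_band and strong_magnitude are illustrative assumptions,
    not values defined by the API.
    """
    if score >= neutral_band:
        direction = "positive"
    elif score <= -neutral_band:
        direction = "negative"
    elif magnitude >= strong_magnitude:
        # Score near zero but a lot of emotion: positive and negative cancel out.
        direction = "mixed"
    else:
        direction = "neutral"
    return f"{direction} (score={score}, magnitude={magnitude})"

print(describe_sentiment(-0.2, 0.2))  # prints "neutral (score=-0.2, magnitude=0.2)"
```

The point of the "mixed" branch is that a score near zero can mean either "no emotion" or "strong emotions that cancel out", and only the magnitude can tell the two apart.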

Now, let's look at the output again. The result is neutral, leaning slightly negative, and the magnitude is low, so these do not seem to be strong words.

Next, let's apply sentiment analysis to the line from Hitsugaya-kun that Aizen described as "strong words." Run the code with text = "Aizen, I kill Teme".

Overall magnitude: 0.1
Overall score: 0.1
Aizen, I kill Teme. magnitude: 0.1, score: 0.1

Hmm, the magnitude is lower than before. **So it was actually Aizen-sama who was using the strong words**.

Perhaps the personal name "Aizen" is acting as noise. Let's remove "Aizen," and try again.

Overall magnitude: 0
Overall score: 0
I kill Teme magnitude: 0, score: 0

I'm not getting the results I want.

The possible causes are as follows.

・ The personal name "Aizen" is acting as noise.
・ The text is too short to be analyzed well.
・ It is hard to get good accuracy in Japanese (natural language processing for Japanese is difficult, whereas English is where natural language processing research is most advanced).

After a lot of trial and error, the accuracy improved. In particular, translating the text into English and then running the sentiment analysis was the most effective.

The tentative conclusion is probably that **a magnitude of 0.7 or higher means "strong words"**. When you confront your nemesis or rivals in the future, try to choose a line with a magnitude below 0.7. You may be able to avoid raising a death flag. Going forward, I would like to investigate the correlation between sentiment analysis results and who wins or loses in BLEACH.
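As a minimal sketch, the tentative threshold can be written as a simple check. The function name `is_strong_words` and the "safe" label are hypothetical; the magnitudes are the ones reported earlier in this article.

```python
STRONG_WORD_MAGNITUDE = 0.7  # tentative threshold from the conclusion above

def is_strong_words(magnitude, threshold=STRONG_WORD_MAGNITUDE):
    """Return True when the magnitude reaches the 'strong words' threshold."""
    return magnitude >= threshold

# Magnitudes reported earlier in this article
results = {
    "Don't use too strong words. It looks weak.": 0.2,
    "Aizen, I kill Teme.": 0.1,
    "I kill Teme": 0,
}

for line, magnitude in results.items():
    verdict = "strong words" if is_strong_words(magnitude) else "safe"
    print(f"{verdict}: {line} (magnitude {magnitude})")
```

By this measure, none of the three lines analyzed above would count as strong words.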
