- VADER (Valence Aware Dictionary and sEntiment Reasoner) is a sentiment analysis tool for English that is tuned for social media text and is provided in the NLTK package.
- Text often contains several emotions at once. VADER evaluates **sentiment intensity** as well as **polarity (positive/negative)**, and it also handles the **negation** "not" and its **contraction** "n't".
- It also supports **slang** such as "kinda" (for "kind of") and "sux" (for "sucks"). As signals of emotional strength other than words, **exclamation marks** and **emoticons** are also recognized and reflected in the sentiment score.
⑴ Usage / Overview of VADER Sentiment Analysis Tool
- VADER computes sentiment values from a **combination of a "dictionary" (lexicon) and "rules"**, so no training is required.
- First, ➊ download the VADER lexicon, ➋ import the VADER sentiment analysis class `SentimentIntensityAnalyzer` from nltk, and ➌ create an instance.
import nltk
#➊ Download dictionary
nltk.download('vader_lexicon')
#➋ Import emotion intensity analysis class
from nltk.sentiment.vader import SentimentIntensityAnalyzer
#➌ Instance generation
vader_analyzer = SentimentIntensityAnalyzer()
- 7,520 words are registered in the VADER dictionary vader_lexicon.txt.
- The vocabulary was collected from various language resources, each word's degree of positivity or negativity was rated manually on a scale of [-4, +4], and words whose polarity changes easily with context were excluded.
- In other words, the dictionary only defines how positive or negative each word is; it is simply a list of words shaded negative or positive.
- Therefore, to reflect emotional strength, VADER corrects the values according to **rules**: exclamation marks "!" and their number, emphasis through words written in all capital letters, adverbs of degree (e.g. "partly" = somewhat, "extremely" = very), and shifting the weight toward the clause that follows "but".
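To make the dictionary side concrete, here is a minimal sketch of reading lexicon entries into a lookup table. It assumes that each line of vader_lexicon.txt is tab-separated, with the token first and its mean rating second, followed by further rating columns that are ignored here. The sample lines and their numbers are illustrative placeholders rather than values copied from the real file, and `parse_lexicon` is a hypothetical helper, not part of NLTK.

```python
# Illustrative sample in the assumed layout: token <TAB> mean rating <TAB> other columns.
# The numbers are placeholders, not copied from the real vader_lexicon.txt.
SAMPLE_LEXICON = "happy\t2.7\t0.9\nsad\t-2.1\t0.9\n:-)\t1.3\t0.5\n"


def parse_lexicon(text):
    """Map each token to its mean valence (hypothetical helper, not part of NLTK)."""
    lexicon = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        token, mean_rating = line.split("\t")[:2]
        lexicon[token] = float(mean_rating)
    return lexicon


lexicon = parse_lexicon(SAMPLE_LEXICON)

# Words that are not in the lexicon get no valence of their own (0.0 here).
for word in ["happy", "sad", "table", ":-)"]:
    print(word, lexicon.get(word, 0.0))
```

A plain dict is enough for this purpose: the lexicon is only ever looked up by token, and the rules then adjust the value that comes back.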
⑵ Output format of the polarity scores
- The `polarity_scores()` method of the `vader_analyzer` instance returns the sentiment scores for the input text as a dict. For the compound score, positive values indicate positive sentiment and negative values indicate negative sentiment.
text = "I am happy."
result = vader_analyzer.polarity_scores(text)
print(text + "\n", result)
- The 'neg' (negative), 'neu' (neutral), and 'pos' (positive) scores give the proportion of the sentence "I am happy." that falls into each category. Here the sentence is judged 78.7% positive, 21.3% neutral, and 0% negative, and the three proportions sum to 1.
- The 'compound' score is the sum of the valence scores of all words, normalized to the range [-1, +1].
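The normalization behind the compound score can be sketched as follows. This assumes the formula x / sqrt(x² + α) with α = 15 and a lexicon valence of 2.7 for "happy"; both are recalled from the VADER reference implementation rather than stated in this article, so treat them as assumptions. `normalize` is a standalone re-implementation for illustration, not an NLTK call.

```python
import math


def normalize(score_sum, alpha=15.0):
    """Squash an unbounded valence sum into (-1, 1); alpha = 15 is an assumed default."""
    return score_sum / math.sqrt(score_sum * score_sum + alpha)


# "I am happy.": only "happy" carries valence (assumed to be 2.7 in the lexicon).
compound = normalize(2.7)
print(round(compound, 4))

# The function is odd and saturates: large sums approach +1 or -1 without reaching them.
for s in (-20.0, -2.7, 0.0, 2.7, 20.0):
    print(s, round(normalize(s), 4))
```

Under these assumptions the first value printed is 0.5719. Because the curve flattens for large sums, piling up more sentiment words changes the compound score less and less.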
⑶ Scoring sentiment polarity
- As very simple emotional expressions, let's look at the results for "I am sad." and "I am angry." in addition to "I am happy."
import pandas as pd
sentences = ["I am happy.", "I am sad.", "I am angry."]
# Get the score dict (neg, neu, pos, compound) for each sentence
result = [vader_analyzer.polarity_scores(s) for s in sentences]
# Convert the list of dicts to a data frame, one row per sentence
df = pd.DataFrame(result, index=sentences)
print(df)
- Both "I am sad." and "I am angry." are clearly negative, but "angry", a more active and outward emotional state, is judged to be more strongly negative than the inward-looking "sad".
- Next, several English expressions that all correspond to the Japanese for "it's wonderful" ...
sentences = ["That's fantastic.", "That's wonderful.", "That's great."]
- You can see a subtle difference in "temperature" between them. "fantastic" and "wonderful" feel slightly detached, like a third party's remark, whereas "great" seems to carry more genuine feeling, doesn't it?
⑷ Negation, contractions, and compound sentences
- Let's look at the negation "not" and its contraction "n't", and at how a sentence is judged when it contains conflicting sentiments.
sentences = ["I was not happy.", "I wasn't happy.", "I'm rich but unhappy."]
# Score each sentence and collect the results, one row per sentence
result = [vader_analyzer.polarity_scores(s) for s in sentences]
df = pd.DataFrame(result, index=sentences)
print(df)
- "I was not happy." and "I wasn't happy." mean exactly the same thing, and their compound scores are also the same. Looking at the proportions of the three categories, however, the contracted "I wasn't happy." has a higher negative proportion. I guess that "I wasn't happy." is more colloquial and more blunt than "I was not happy.", which would make it a little stronger, although it may simply be that "wasn't" counts as one token where "was not" counts as two, leaving fewer neutral tokens to dilute the negative share.
- In "I'm rich but unhappy.", there is a positive element, "rich", but the polarity turns at "but" and the sentence concludes on the negative "unhappy". The scores reflect this: the negative proportion is the highest, the positive part is offset, and the compound score is negative.
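How the negation and "but" rules act on word valences can be sketched as below. It assumes, from memory of the VADER reference implementation, that a negated word's valence is multiplied by -0.74, and that when "but" occurs the valences before it are scaled by 0.5 and those after it by 1.5. The valences for "rich" and "unhappy" are illustrative placeholders, and the helper names are hypothetical.

```python
N_SCALAR = -0.74  # assumed negation multiplier


def negate(valence):
    """Flip and dampen a valence that follows a negation such as "not" or "n't"."""
    return valence * N_SCALAR


def but_weighting(words, valences):
    """Scale valences before "but" by 0.5 and after it by 1.5 (assumed weights)."""
    if "but" not in words:
        return list(valences)
    pivot = words.index("but")
    weighted = []
    for i, v in enumerate(valences):
        if i < pivot:
            weighted.append(v * 0.5)
        elif i > pivot:
            weighted.append(v * 1.5)
        else:
            weighted.append(v)
    return weighted


# "not happy": a positive word becomes moderately negative rather than fully inverted.
print(round(negate(2.7), 3))

# "rich but unhappy" with placeholder valences: the clause after "but" dominates.
words = ["rich", "but", "unhappy"]
valences = [2.6, 0.0, -1.8]
weighted = but_weighting(words, valences)
print(weighted, round(sum(weighted), 3))
```

Note that in this sketch "but" does not reverse anything by itself; it only changes how much each side counts, so the sign of the total depends on which clause carries the stronger word.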
⑸ Emphasis through exclamation marks and capital letters
- Next, a direct, colloquial style of expressing feelings. Let's see how exclamation marks "!" and their number, and words written in all capital letters, are judged in terms of sentiment intensity.
sentences = ["I am happy.", "I am happy!", "I am happy!!", "I am happy!!!", "I am HAPPY."]
# Score each sentence and collect the results, one row per sentence
result = [vader_analyzer.polarity_scores(s) for s in sentences]
df = pd.DataFrame(result, index=sentences)
print(df)
- As the number of exclamation marks increases, the positive intensity and the compound score both increase. Let's visualize this.
import matplotlib.pyplot as plt
# First four rows: "I am happy" with zero to three exclamation marks
x = df.index[0:4].tolist()
y = df["compound"].iloc[0:4]
plt.plot(x, y, marker="o")
# Fifth row: the all-caps "I am HAPPY." drawn as a horizontal reference line
plt.axhline(y=df["compound"].iloc[4], color='r', linestyle='-')
plt.ylabel("compound score")
plt.legend(['number of "!"', 'uppercase notation'])
plt.grid()
plt.show()
- The red horizontal line shows the score when "HAPPY" is written in capital letters; it falls between the scores for "!!" and "!!!". This suggests the ordering "It's great!!" < "It's GREAT." < "It's great!!!". Put in Japanese terms, would that be something like "It's amazing!!" < "It's wonderful" < "It's amazing!!!"?
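The ordering seen in the plot can be reproduced with a small sketch of the two emphasis rules. It assumes, again from memory of the VADER reference implementation, that each "!" adds 0.292 to the valence sum (counted up to four marks) and that an all-caps sentiment word in an otherwise mixed-case sentence gains 0.733; the valence 2.7 for "happy" is likewise an assumption. Because the compound normalization is monotonic, comparing the raw sums is enough to see the order.

```python
EXCL_INCR = 0.292  # assumed boost per "!"
MAX_EXCL = 4       # assumed cap on the number of "!" marks that count
CAPS_INCR = 0.733  # assumed boost for an all-caps sentiment word


def emphasized_sum(valence, exclamations=0, all_caps=False):
    """Raw valence sum for one positive sentiment word with emphasis applied."""
    total = valence
    if all_caps:
        total += CAPS_INCR
    total += min(exclamations, MAX_EXCL) * EXCL_INCR
    return total


happy = 2.7  # assumed lexicon valence
cases = {
    "I am happy.": emphasized_sum(happy),
    "I am happy!": emphasized_sum(happy, exclamations=1),
    "I am happy!!": emphasized_sum(happy, exclamations=2),
    "I am happy!!!": emphasized_sum(happy, exclamations=3),
    "I am HAPPY.": emphasized_sum(happy, all_caps=True),
}

# Print the sentences from weakest to strongest emphasis.
for text, value in sorted(cases.items(), key=lambda kv: kv[1]):
    print(f"{value:.3f}  {text}")
```

With these constants, one capitalized word is worth a little more than two exclamation marks and a little less than three, which matches where the red line sits.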
⑹ Emotional expression by emoticons
- Text-based face symbols are called "emoticons" (emotion + icon) in Europe and the United States, and they look quite different from the ones used in Japan.
- In Japan, the face is upright and emotions are expressed mainly with the eyes, while in Europe and the United States the face is tilted 90° to the left and emotions are expressed mainly with the mouth. For example, smiles are ":-)" and ":)", surprise is ":-o" and ":O", and sadness or tears are ":-(" and ":(".
- Is it only me who feels that the Western style is a little symbolic and the Japanese style is more pictorial and expressive?
sentences = ["I love you.", "I love you :-*", "I love you <3"]
# Score each sentence and collect the results, one row per sentence
result = [vader_analyzer.polarity_scores(s) for s in sentences]
df = pd.DataFrame(result, index=sentences)
print(df)
- The "<3" in "I love you <3", which has the highest positive intensity, represents a heart symbol; it is tilted 90° to the right because it is built from the digit 3. The ":-*" in "I love you :-*" is an emoticon that means "kiss".
- Next, let's look at the emotional intensity when using emoticons that represent smiles.
sentences = ["I am happy.", "I am happy :-)", "I am happy (^^)"]
- With ":-)", the positive intensity increases considerably. With the Japanese-style emoticon "(^^)", on the other hand, the positive proportion is lower, but the compound score is the same as for "I am happy.", presumably because VADER does not recognize "(^^)" and assigns it no valence.
- As shown above, English text from social media can be analyzed easily, and the results largely match intuition, so the tool is quite practical.
- Next, I would like to look at sentiment analysis for Japanese.