import pandas as pd
pd.set_option('display.unicode.east_asian_width', True)
#Reading the emotion value dictionary
pndic = pd.read_csv(r"http://www.lr.pi.titech.ac.jp/~takamura/pubs/pn_ja.dic",
                    encoding="shift-jis",
                    names=['word_type_score'])
print(pndic)
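At this point each row of the frame is a single colon-delimited string of four fields. As a quick illustration (the concrete values depend on the downloaded file; "優れる:すぐれる:動詞:1" is a typical first entry):
#Each row packs four fields into one string:
#word (base form):reading:part of speech:emotion value in [-1, +1]
print(pndic.head(3))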
pandas' set_option() specifies various options such as the display format; the argument 'display.unicode.east_asian_width' aligns column names and values with double-byte characters taken into account. Next, split() divides the column into four fields using ":" as the delimiter, the word (base form) is extracted as "word" and the emotion value as "score", and the two are converted to dict type.
import numpy as np
#Extract word and emotion values
pndic["split"] = pndic["word_type_score"].str.split(":")
pndic["word"] = pndic["split"].str.get(0)
pndic["score"] = pndic["split"].str.get(3)
#Convert to dict type
keys = pndic['word'].tolist()
values = pndic['score'].tolist()
dic = dict(zip(keys, values))
print(dic)
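As a quick sanity check, a lookup returns the polarity score as a string (the word '優れる' is assumed here to be registered; scores remain strings at this point and are cast to float later):
#Scores are still strings at this point
print(dic.get('優れる'))
#Unregistered words return None
print(dic.get('未登録の語'))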
text = 'The nationwide import volume of spaghetti reached a record high by October, and customs suspects that the background is the so-called "stay-at-home demand" that has grown with the spread of the new coronavirus. According to Yokohama Customs, the amount of spaghetti imported through ports and airports nationwide was approximately 142,000 tons as of the end of October. This was a record high, exceeding the full-year import volume of three years ago by about 4,000 tons. Macaroni imports also exceeded 11,000 tons by October, roughly matching the full-year import volume of four years ago, which was the highest ever.'
lines = text.split("。")
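Here "。" is the Japanese full stop of the original article text; splitting on it leaves a trailing empty string when the text ends with a full stop. The empty word lists are filtered out further below, but a variant that drops empty sentences up front would look like this (an optional addition, not in the original):
#Optional: drop the empty string left by the trailing "。"
lines = [s for s in text.split("。") if s]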
!apt install aptitude
!aptitude install mecab libmecab-dev mecab-ipadic-utf8 git make curl xz-utils file -y
!pip install mecab-python3==0.7
" -Ochasen "
.import MeCab
mecab = MeCab.Tagger("-Ochasen")
#Illustrate the results of morphological analysis on the first line
print(mecab.parse(lines[0]))
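Each line of the -Ochasen output is tab-separated: surface form, reading, base form, part of speech, then conjugation details. The extraction loop below relies on the base form sitting at index 2 and the part of speech at index 3. A minimal sketch of pulling those fields from the first parsed token (assuming the first sentence is non-empty):
#Fields per token line (0-indexed): 0 surface, 1 reading, 2 base form, 3 part of speech
fields = mecab.parse(lines[0]).splitlines()[0].split()
print("base form:", fields[2], "| part of speech:", fields[3])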
#Extract words based on morphological analysis
word_list = []
for l in lines:
    temp = []
    for v in mecab.parse(l).splitlines():
        #Skip short lines such as "EOS"; a token line has at least 4 fields
        if len(v.split()) >= 4:
            #Field 3 is the part of speech; its first two characters identify
            #nouns (名詞), adjectives (形容詞), verbs (動詞) and adverbs (副詞)
            if v.split()[3][:2] in ['名詞', '形容', '動詞', '副詞']:
                #Field 2 is the base form of the word
                temp.append(v.split()[2])
    word_list.append(temp)
#Remove empty elements
word_list = [x for x in word_list if x != []]
print(word_list)
result = []
#Sentence-based processing
for sentence in word_list:
    temp = []
    #Word-based processing
    for word in sentence:
        #dic.get() returns None for words missing from the dictionary
        score = dic.get(word)
        temp.append((word, score))
    result.append(temp)
#Display as a data frame for each sentence
for i in range(len(result)):
    print(lines[i], '\n', pd.DataFrame(result[i], columns=["word", "score"]), '\n')
The results for all four sentences are summarized in the table below: from left to right, the words in each sentence and their emotion polarity values.
First, None marks words that are not registered in the word-emotion-polarity correspondence table; this is a problem that arises even with a dictionary with a vocabulary as large as this one.
More troubling is that only one word, "最高" ("highest"), has a positive polarity value, while all the others are negative, and for many of them it is hard to see why they are judged negative.
#Calculate the average value for each sentence
mean_list = []
for i in result:
    temp = []
    for j in i:
        if j[1] is not None:
            temp.append(float(j[1]))
    #Assumes every sentence contains at least one scored word
    mean = sum(temp) / len(temp)
    mean_list.append(mean)
#Display as a data frame
print(pd.DataFrame(mean_list, columns=["mean"], index=lines[0:4]))
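If a sentence contained no dictionary words, len(temp) would be zero and the division above would raise ZeroDivisionError. A defensive variant (an addition, not in the original) substitutes a neutral 0.0 in that case:
#Defensive variant: treat sentences with no scored words as neutral (0.0)
mean_list = []
for sentence_scores in result:
    scores = [float(s) for _, s in sentence_scores if s is not None]
    mean_list.append(sum(scores) / len(scores) if scores else 0.0)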
#Number of positive words
keys_pos = [k for k, v in dic.items() if float(v) > 0]
cnt_pos = len(keys_pos)
#Number of negative words
keys_neg = [k for k, v in dic.items() if float(v) < 0]
cnt_neg = len(keys_neg)
#Neutral word count
keys_neu = [k for k, v in dic.items() if float(v) == 0]
cnt_neu = len(keys_neu)
print("Percentage of positives:", ('{:.3f}'.format(cnt_pos / len(dic))), "(", cnt_pos, "word)")
print("Percentage of negatives:", ('{:.3f}'.format(cnt_neg / len(dic))), "(", cnt_neg, "word)")
print("Neutral percentage:", ('{:.3f}'.format(cnt_neu / len(dic))), "(", cnt_neu, "word)")
print("Number of elements before conversion to dict type:", len(pndic))
print("Number of elements after conversion to dict type:", len(dic), "\n")
pndic_list = pndic["word"].tolist()
print("Unique number of elements before conversion to dict type:", len(set(pndic_list)))
Each line of the dictionary follows the format "word (base form):reading:part of speech:emotion value [-1, +1]", but the same word (base form) can appear in multiple entries, for example with different readings or parts of speech, so duplicate data are scattered through the file. Counting the number of unique elements with such duplicates removed matches the 52,671 elements after conversion to dict type, as shown above.