3. Natural language processing with Python 5-5. Emotion value analysis of Japanese sentences [Japanese evaluation polarity dictionary (words)]

(1) Acquisition and preprocessing of the "Japanese Evaluation Polarity Dictionary (Words)"

1. Load dictionary data onto Colab

import pandas as pd

#Keep column names and values aligned when double-byte characters are displayed
pd.set_option("display.unicode.east_asian_width", True)

#Reading the emotion value dictionary
pndic_1 = pd.read_csv(r"http://www.cl.ecei.tohoku.ac.jp/resources/sent_lex/wago.121808.pn",
                      names=["judge_type_word"])

print(pndic_1)

2. Expand the data frame with a delimiter

pndic_2 = pndic_1["judge_type_word"].str.split('\t', expand=True)
print(pndic_2)
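
Since each line of the file appears to be a label and an expression separated by a tab, the load and the split can probably be combined by passing `sep="\t"` to `pd.read_csv`. The following is only a minimal sketch under that assumption; the three sample rows and the column names `judge_type` and `word` are hypothetical stand-ins for the real file.

```python
import io
import pandas as pd

# Hypothetical sample rows in the assumed "label<TAB>expression" layout
# (stand-ins for the real dictionary file, not copied from it).
sample = (
    "ネガ（経験）\tあがく\n"
    "ポジ（評価）\tすばらしい\n"
    "ネガ（評価）\tあきらめ が 悪い\n"
)

# sep="\t" splits each line into two columns while reading.
df = pd.read_csv(io.StringIO(sample), sep="\t", names=["judge_type", "word"])
print(df)
```

Reading with a separator avoids the intermediate one-column data frame, at the cost of assuming that every line contains exactly one tab.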

3. Remove the unnecessary part of the positive/negative label

judge = pd.Series(pndic_2[0])
judge.value_counts()

#Strip the parenthesized sub-label; the pattern accepts both full-width and ASCII parentheses
pndic_2[0] = pndic_2[0].str.replace(r"[（(].*[）)]", "", regex=True)
print(pndic_2)
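
The effect of this step can be checked on a few values before running it on the whole column. The sketch below assumes the labels carry a sub-label in either full-width or ASCII parentheses; the sample labels are hypothetical.

```python
import pandas as pd

# Hypothetical labels; the last one uses ASCII parentheses on purpose.
labels = pd.Series(["ネガ（経験）", "ポジ（評価）", "ポジ(評価)"])

# The character classes match either kind of opening and closing parenthesis.
cleaned = labels.str.replace(r"[（(].*?[）)]", "", regex=True)
print(cleaned.tolist())
```

If `value_counts()` still shows more than two distinct labels after the replacement, the pattern did not match and the parentheses in the file are worth inspecting.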

4. Check what is registered in the word column

df_temp = pndic_2[1].str.split(" ", expand=True)
print(df_temp)

df_temp.info()

v = pd.Series(df_temp[7])
v.value_counts()

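#NOTE: the three string literals below were altered by machine translation of this article;
#substitute the Japanese tokens listed by value_counts() above, otherwise no rows will match
print(df_temp[df_temp[7] == "painful"])
print(df_temp[df_temp[7] == "Nu"])
print(df_temp[df_temp[7] == "Is"])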

5. Screening of dictionary data

pndic_3 = pd.concat([df_temp, pndic_2[0]], axis=1)
print(pndic_3)

pndic_4 = pndic_3[pndic_3[3].isnull()]
print(pndic_4)
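
The screening relies on the fact that `str.split(..., expand=True)` pads shorter rows with missing values, so `isnull()` on column k keeps the entries that have at most k tokens. A small sketch of that mechanism, with hypothetical entries:

```python
import pandas as pd

# Hypothetical entries made of 1, 3 and 4 space-separated tokens.
entries = pd.Series(["あがく", "あきらめ が 悪い", "気 が 重い 感じ"])
tokens = entries.str.split(" ", expand=True)

# Column 1 is missing only for single-token entries.
single_token = tokens[tokens[1].isnull()]
# Column 3 is missing for entries of at most three tokens.
up_to_three = tokens[tokens[3].isnull()]

print(single_token)
print(up_to_three)
```

Changing the column index in the filter therefore changes how long an expression may be and still pass the screening.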

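#pndic_3 holds two columns labeled 0 (the first token and the polarity label), so this selects both
pndic_5 = pndic_4[0]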
print(pndic_5)

pndic_6 = pndic_5.drop_duplicates(keep='first')
pndic_6.columns = ["word", "judge"]
print(pndic_6)

6. Convert dictionary data to dict type

#The labels in the dictionary file are Japanese: "ポジ" (positive) and "ネガ" (negative)
pndic_6["judge"] = pndic_6["judge"].replace({"ポジ":1, "ネガ":-1})
print(pndic_6)

import numpy as np

keys = pndic_6["word"].tolist()
values = pndic_6["judge"].tolist()
dic = dict(zip(keys, values))

print(dic)
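
As an alternative to `zip`, the label conversion and the dict construction can be chained in pandas. This is only a sketch: the rows are hypothetical, and "ポジ" / "ネガ" are assumed to be the cleaned polarity labels.

```python
import pandas as pd

# Hypothetical rows; "ポジ"/"ネガ" are assumed to be the cleaned polarity labels.
toy = pd.DataFrame({
    "word": ["喜ぶ", "苦しむ", "楽しむ"],
    "judge": ["ポジ", "ネガ", "ポジ"],
})

# map() converts labels to scores; to_dict() turns word -> score into a dict.
toy_dic = toy.set_index("word")["judge"].map({"ポジ": 1, "ネガ": -1}).to_dict()
print(toy_dic)
```

Unlike `replace`, `map` turns any label missing from the mapping into NaN, which makes an unexpected label easy to spot.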

(2) Preprocessing of the text to be analyzed

1. Specify the text

text = 'Nationwide spaghetti imports reached a record high by October, and customs suspects that behind this is the so-called "stay-at-home demand" that has grown with the spread of the new coronavirus infection. According to Yokohama Customs, the amount of spaghetti imported through ports and airports nationwide was approximately 142,000 tons as of the end of October. This was a record high, exceeding the full-year import volume of three years ago by about 4,000 tons. In addition, macaroni imports also exceeded 11,000 tons by October, almost the same as the full-year import volume of four years ago, which was the highest ever.'

#Split into sentences on the Japanese full stop "。"
#(the original text was Japanese; the translated text above contains no "。", so use the Japanese source text here)
lines = text.split("。")

2. Create an instance of morphological analysis

!apt install aptitude
!aptitude install mecab libmecab-dev mecab-ipadic-utf8 git make curl xz-utils file -y
!pip install mecab-python3==0.7
import MeCab
mecab = MeCab.Tagger('-Ochasen')

#Illustrate the results of morphological analysis on the first line
print(mecab.parse(lines[0]))

3. Build a word list for each sentence from the morphological analysis

#Extract words based on morphological analysis
word_list = []
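for l in lines:
    temp = []
    for v in mecab.parse(l).splitlines():
        fields = v.split()
        #ChaSen-format rows: surface, reading, base form, part of speech, ...
        if len(fields) >= 4:
            #The part-of-speech tags are Japanese; compare their first two characters
            if fields[3][:2] in ['名詞','形容','動詞','副詞']:
                temp.append(fields[2])
    word_list.append(temp)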

#Remove empty element
word_list = [x for x in word_list if x != []]
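
The extraction logic can be tried without installing MeCab by feeding it text in the same tab-separated layout. The sketch below assumes the ChaSen-style column order (surface, reading, base form, part of speech); the function name `extract_base_forms` and the sample rows are hypothetical and hand-written, not real MeCab output.

```python
# Part-of-speech prefixes to keep: noun, adjective ("形容詞"[:2]), verb, adverb.
KEEP_POS = {"名詞", "形容", "動詞", "副詞"}


def extract_base_forms(parsed):
    """Return base forms of content words from ChaSen-style tab-separated rows."""
    words = []
    for row in parsed.splitlines():
        fields = row.split("\t")
        # Rows such as "EOS" have fewer than four fields and are skipped.
        if len(fields) < 4:
            continue
        if fields[3][:2] in KEEP_POS:
            words.append(fields[2])
    return words


# Hand-written illustrative rows, assumed to resemble MeCab.Tagger('-Ochasen') output.
sample = (
    "輸入\tユニュウ\t輸入\t名詞-サ変接続\t\t\n"
    "量\tリョウ\t量\t名詞-一般\t\t\n"
    "が\tガ\tが\t助詞-格助詞-一般\t\t\n"
    "増え\tフエ\t増える\t動詞-自立\t一段\t連用形\n"
    "た\tタ\tた\t助動詞\t特殊・タ\t基本形\n"
    "EOS\n"
)
print(extract_base_forms(sample))
```

Taking the base form rather than the surface form matters here, because the dictionary is looked up by the registered form of each word.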

(3) Positive / negative judgment of sentences based on emotional polarity value

result = []
#Sentence-based processing
for sentence in word_list:
    temp = []
    #Word-based processing
    for word in sentence:
        #dic.get returns None for words that are not in the dictionary
        score = dic.get(word)
        word_score = (word, score)
        temp.append(word_score)
    result.append(temp)

#Display as a data frame for each sentence
for i in range(len(result)):
    df_ = pd.DataFrame(result[i], columns=["word", "score"])
    print(lines[i], '\n', df_.where(df_.notnull(), None), '\n')
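
The code above lists a score per word; to reach a verdict per sentence the scores still have to be aggregated. One simple possibility, which is an assumption of this sketch and not a method stated in the article, is to sum the scores of the words found in the dictionary. The function `score_sentence`, the toy dictionary and the word lists are all hypothetical.

```python
def score_sentence(words, polarity):
    """Sum the polarity of the words found in the dictionary and label the sentence."""
    hits = [polarity[w] for w in words if w in polarity]
    total = sum(hits)
    if total > 0:
        label = "positive"
    elif total < 0:
        label = "negative"
    else:
        label = "neutral"
    # Return the sum, the number of matched words, and the resulting label.
    return total, len(hits), label


# Hypothetical polarity dictionary and tokenized sentences for illustration only.
toy_dic = {"増える": 1, "疑う": -1, "苦しむ": -1}
sentences = [["輸入", "増える"], ["税関", "疑う", "苦しむ"], ["横浜", "税関"]]

for s in sentences:
    print(s, score_sentence(s, toy_dic))
```

Returning the number of matched words alongside the sum helps to tell a genuinely neutral sentence from one in which no word was found in the dictionary.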

(4) Positive/negative composition ratio of the "Japanese Evaluation Polarity Dictionary (Words)"

#Number of positive words
keys_pos = [k for k, v in dic.items() if float(v) == 1]
cnt_pos = len(keys_pos)
#Number of negative words
keys_neg = [k for k, v in dic.items() if float(v) == -1]
cnt_neg = len(keys_neg)

print("Proportion of positives:", '{:.3f}'.format(cnt_pos / len(dic)), "(", cnt_pos, "words)")
print("Proportion of negatives:", '{:.3f}'.format(cnt_neg / len(dic)), "(", cnt_neg, "words)")
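
The same tally can be written more compactly with `collections.Counter`. A minimal sketch with a hypothetical five-word dictionary:

```python
from collections import Counter

# Hypothetical word -> polarity dictionary for illustration only.
toy_dic = {"喜ぶ": 1, "苦しむ": -1, "疑う": -1, "楽しむ": 1, "悩む": -1}

# Count how many words carry each polarity value, then convert to proportions.
counts = Counter(toy_dic.values())
ratios = {value: n / len(toy_dic) for value, n in counts.items()}

for value in (1, -1):
    print(value, counts[value], "{:.3f}".format(ratios[value]))
```

Counting the values in one pass also reveals any score other than 1 and -1, which would indicate a label that the conversion step missed.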

(5) Requirements for sentiment value dictionary and sentiment analysis tool [Discussion]

| Supported language | Name | Type | Number of recorded words | Contents recorded | Emotional polarity value | Notes |
|---|---|---|---|---|---|---|
| English | AFINN-111 | Emotion value dictionary | 2,477 | Nouns, adjectives, verbs, adverbs, exclamations, etc. | INTEGER [-4, +4] | Positive:negative = 3.5:6.5, only 1 neutral word |
| English | VADER | Emotion value analysis tool | 7,520 | Nouns, adjectives, verbs, adverbs, exclamations, slang, emoticons, etc. | FLOAT [-1, +1] | Inverts polarity on negation and amplifies polarity values |
| Japanese | Word emotion polarity value correspondence table | Emotion value dictionary | 55,125 | Nouns, verbs, adverbs, adjectives, auxiliary verbs | FLOAT [-1, +1] | Positive:negative = 1:9, a significant negative bias |
| Japanese | Japanese Evaluation Polarity Dictionary (Noun Edition) | Emotion value dictionary | 13,314 | Nouns | STRING [p, n, e] | Positive/negative ratio nearly balanced; includes a very small number of duplicates |
| Japanese | Japanese Evaluation Polarity Dictionary (Words) | Emotion value dictionary | 5,280 | Words | STRING [positive, negative] | About 3/4 of all entries have the form "stem" + "conjugated words" |
