3. Natural language processing with Python 5-4. Emotion value analysis of Japanese sentences [Japanese evaluation polarity dictionary (noun edition)]

(1) Acquisition of the "Japanese Evaluation Polarity Dictionary (Noun Edition)"

1. Upload dictionary data to Colab

from google.colab import files
uploaded = files.upload()

2. Read the dictionary data into a data frame

import pandas as pd
#Align column widths for East Asian characters so names and values line up
pd.set_option('display.unicode.east_asian_width', True)

pndic_1 = pd.read_csv('pn.csv.m3.120408.trim', names=['word_pn_oth'])
print(pndic_1)


3. Split the data frame column on the tab delimiter

pndic_2 = pndic_1['word_pn_oth'].str.split('\t', expand=True)
print(pndic_2)
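To see what `expand=True` does, here is a minimal sketch on a toy one-column frame (the entries are hypothetical, in the same word–polarity–other layout as the dictionary): each tab-separated field becomes its own column.

```python
import pandas as pd

# Toy one-column frame mimicking the raw dictionary data (hypothetical entries)
df = pd.DataFrame({'word_pn_oth': ['挑戦\tp\t~がある・高まる', '失敗\tn\t~がある・高まる']})
split_df = df['word_pn_oth'].str.split('\t', expand=True)
print(split_df.shape)  # one row per entry, three columns (0: word, 1: polarity, 2: other)
```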


4. Remove noise from the emotion polarity values

senti_score = pd.Series(pndic_2[1])
senti_score.value_counts()


pndic_3 = pndic_2[(pndic_2[1] == 'p') | (pndic_2[1] == 'e') | (pndic_2[1] == 'n')]
print(pndic_3)
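The same filter can be written more compactly with `Series.isin`, which scales better if the set of accepted labels grows. A minimal sketch on a toy frame with one noisy label (the rows are hypothetical):

```python
import pandas as pd

# Equivalent of the chained OR-comparison filter, written with Series.isin
toy = pd.DataFrame({0: ['記録', '感染', '謎'], 1: ['p', 'n', '?p?']})
filtered = toy[toy[1].isin(['p', 'e', 'n'])]
print(filtered)  # the row with the noisy label '?p?' is dropped
```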


5. Replace emotion polarity value with a numerical value

#Drop the unneeded third column (the "other" field)
pndic_4 = pndic_3.drop(pndic_3.columns[2], axis=1)

pndic_4[1] = pndic_4[1].replace({'p':1, 'e':0, 'n':-1})
print(pndic_4)


6. Convert the data frame to a dict

keys = pndic_4[0].tolist()
values = pndic_4[1].tolist()
dic = dict(zip(keys, values))

print(dic)


(2) Preprocessing of the text to be analyzed

1. Specify the text

text = 'Nationwide imports of spaghetti reached a record high by the end of October, and customs attributes this to the so-called "stay-at-home demand" that has grown with the spread of the novel coronavirus. According to Yokohama Customs, the amount of spaghetti imported through ports and airports nationwide was approximately 142,000 tons as of the end of October. This is a record high, exceeding the full-year import volume of three years ago by about 4,000 tons. Macaroni imports also exceeded 11,000 tons by the end of October, nearly matching the full-year import volume of four years ago, which was the highest ever.'

#The original post analyzes the Japanese source text, so sentences are split on the Japanese full stop
lines = text.split("。")
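One detail worth noting: splitting on the sentence-final full stop leaves a trailing empty string, which is why empty elements are removed later in the pipeline. A minimal sketch with a short made-up Japanese string:

```python
# Splitting on the Japanese full stop leaves a trailing empty element
sample = '輸入量は過去最高。背景には巣ごもり需要。'
parts = sample.split('。')
print(parts)  # two sentences plus a trailing empty string
```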


2. Create a morphological analyzer instance

!apt install aptitude
!aptitude install mecab libmecab-dev mecab-ipadic-utf8 git make curl xz-utils file -y
!pip install mecab-python3==0.7
import MeCab
mecab = MeCab.Tagger("-Ochasen")

#Show the morphological analysis of the first sentence
print(mecab.parse(lines[0]))


3. Build per-sentence word lists from the morphological analysis

word_list = []
for l in lines:
    temp = []
    for v in mecab.parse(l).splitlines():
        #Token lines of -Ochasen output have at least four tab-separated fields
        if len(v.split()) >= 4:
            #POS tags are Japanese; the first two characters match 名詞/動詞/形容(詞)/副詞
            if v.split()[3][:2] in ['名詞', '動詞', '形容', '副詞']:
                temp.append(v.split()[2])
    word_list.append(temp)

#Remove empty element
word_list = [x for x in word_list if x != []]
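The indices used above follow the `-Ochasen` line layout: surface form, reading, base form, POS, and so on, separated by tabs. A minimal sketch with a hand-written sample line (no MeCab needed) shows why index 2 yields the base form and why the first two characters of index 3 identify the part of speech:

```python
# One token line of MeCab's -Ochasen output, written by hand for illustration:
# surface \t reading \t base form \t POS ...
sample_line = 'スパゲッティ\tスパゲッティ\tスパゲッティ\t名詞-一般'
fields = sample_line.split()
print(fields[2], fields[3][:2])  # base form and the two-character POS prefix
```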


(3) Positive/negative judgment of sentences based on emotion polarity values

1. Acquisition of emotion polarity values

result = []
#Sentence-based processing
for sentence in word_list:
    temp = []
    #Word-based processing
    for word in sentence:
        #dic.get returns None when the word is not in the dictionary
        score = dic.get(word)
        temp.append((word, score))
    result.append(temp)

#Display as a data frame for each sentence
for i in range(len(result)):
    print(lines[i], '\n', pd.DataFrame(result[i], columns=["word", "score"]), '\n')
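The `None` scores in the output come from `dict.get`, which returns `None` (rather than raising) for missing keys. A minimal sketch with a toy two-entry dictionary (hypothetical values):

```python
# dict.get returns None for words absent from the dictionary,
# so unmatched words carry a score of None
toy_dic = {'記録': 1, '感染': -1}
print(toy_dic.get('記録'))    # 1
print(toy_dic.get('マカロニ'))  # None
```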


2. Mean emotion polarity value for each sentence

#Calculate the average value for each sentence
mean_list = []
for i in result:
    temp = []
    for j in i:
        if j[1] is not None:
            temp.append(float(j[1]))
    #Guard against sentences with no scored words
    mean = sum(temp) / len(temp) if temp else 0.0
    mean_list.append(mean)

#Display as a data frame, labeling each row with its sentence
print(pd.DataFrame(mean_list, columns=["mean"], index=lines[:len(mean_list)]))


(4) Verification of the "Japanese Evaluation Polarity Dictionary (Noun Edition)"

1. Check the composition ratio of positive and negative

#Number of positive words
keys_pos = [k for k, v in dic.items() if v == 1]
cnt_pos = len(keys_pos)
#Number of negative words
keys_neg = [k for k, v in dic.items() if v == -1]
cnt_neg = len(keys_neg)
#Neutral word count
keys_neu = [k for k, v in dic.items() if v == 0]
cnt_neu = len(keys_neu)

print("Percentage of positives: {:.3f} ({} words)".format(cnt_pos / len(dic), cnt_pos))
print("Percentage of negatives: {:.3f} ({} words)".format(cnt_neg / len(dic), cnt_neg))
print("Percentage of neutrals: {:.3f} ({} words)".format(cnt_neu / len(dic), cnt_neu))


2. Check for duplicate words

print("Number of elements before conversion to dict type:", len(pndic_4))
print("Number of elements after conversion to dict type:", len(dic), "\n")


pndic_list = pndic_4[0].tolist()
print("Unique number of elements before conversion to dict type:", len(set(pndic_list)))


import collections
print(collections.Counter(pndic_list))
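Why the dict ends up smaller than the data frame can be sketched with a toy key list containing one duplicate (hypothetical words): `dict(zip(...))` keeps only the last value for a duplicated key, so each duplicate collapses into a single entry.

```python
import collections

# A duplicated key collapses when the list is turned into a dict
toy_keys = ['記録', '感染', '記録']
dupes = [w for w, c in collections.Counter(toy_keys).items() if c > 1]
print(dupes)                                  # the duplicated keys
print(len(dict(zip(toy_keys, [1, -1, 0]))))   # fewer entries than keys
```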

