3. Natural language processing with Python 5-3. Emotion value analysis of Japanese sentences [Word emotion polarity value correspondence table]

⑴ Acquisition of the "word emotion polarity value correspondence table"

1. Read "Word Emotion Polarity Correspondence Table"

import pandas as pd

pd.set_option('display.unicode.east_asian_width', True)

#Reading the emotion value dictionary
pndic = pd.read_csv(r"http://www.lr.pi.titech.ac.jp/~takamura/pubs/pn_ja.dic",
                    encoding="shift-jis",
                    names=['word_type_score'])
print(pndic)

2. Extract only words and emotion values and convert to dict type

import numpy as np

# Each line has the form word:reading:POS:score
pndic["split"] = pndic["word_type_score"].str.split(":")
pndic["word"] = pndic["split"].str.get(0)
pndic["score"] = pndic["split"].str.get(3)

# Convert to dict type (word -> score)
keys = pndic['word'].tolist()
values = pndic['score'].tolist()
dic = dict(zip(keys, values))

print(dic)
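
For reference, the parsing above can be checked on a couple of hand-written lines in the same `word:reading:POS:score` format (the entries and scores below are made up for illustration, not taken from pn_ja.dic):

```python
# Each raw line of pn_ja.dic has the form word:reading:POS:score.
# The lines below are hand-written examples in that format, not real entries.
sample_lines = ["良い:よい:形容詞:0.9",
                "悪い:わるい:形容詞:-0.8"]

sample_dic = {}
for line in sample_lines:
    parts = line.split(":")
    sample_dic[parts[0]] = parts[3]  # word -> score (still a string at this point)

print(sample_dic)
```

Note that the scores stay strings here, just as in the dictionary above; they are converted with `float()` only when the averages are computed later.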

⑵ Text to be analyzed

# The analyzed text is a Japanese news item; an English translation is shown here.
# ("巣ごもり需要" is rendered below as "stay-at-home demand".)
text = 'The nationwide import volume of spaghetti reached a record high by October, and customs believes that behind this is the so-called "stay-at-home demand", which has grown with the spread of the new coronavirus. According to Yokohama Customs, the amount of spaghetti imported through ports and airports nationwide was approximately 142,000 tons as of the end of October. This was a record high, exceeding the annual import volume of three years ago by about 4,000 tons. Macaroni imports also exceeded 11,000 tons by October, almost matching the annual volume of four years ago, which was the highest ever.'

# Split into sentences on the Japanese full stop "。"
# (this works on the original Japanese text, which uses 。 as the delimiter)
lines = text.split("。")
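
The segmentation relies on the fact that Japanese sentences end with the full-width period. A minimal sketch on a made-up Japanese sample:

```python
# Japanese sentences end with the full-width period "。", so a simple
# str.split("。") gives a rough sentence segmentation (sample text made up here).
sample = "輸入量は過去最高でした。背景には需要の増加があります。"
sentences = [s for s in sample.split("。") if s]  # drop the trailing empty string
print(sentences)
```

Because the text ends with "。", `split` produces a trailing empty string; the comprehension above drops it, which mirrors the removal of empty elements later in this article.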

⑶ Converting the text into data by morphological analysis

1. Create a morphological analyzer instance with MeCab

!apt install aptitude
!aptitude install mecab libmecab-dev mecab-ipadic-utf8 git make curl xz-utils file -y
!pip install mecab-python3==0.7
import MeCab
mecab = MeCab.Tagger("-Ochasen")

# Show the morphological analysis result for the first sentence
print(mecab.parse(lines[0]))

2. Build a word list limited to certain parts of speech

# Extract words from the morphological analysis results
# (-Ochasen columns: surface, reading, base form, POS, ...)
word_list = []
for l in lines:
    temp = []
    for v in mecab.parse(l).splitlines():
        fields = v.split()
        if len(fields) >= 4:
            # Keep nouns, adjectives, verbs and adverbs, in base form
            if fields[3].split('-')[0] in ['名詞', '形容詞', '動詞', '副詞']:
                temp.append(fields[2])
    word_list.append(temp)

# Remove empty elements
word_list = [x for x in word_list if x != []]

print(word_list)
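
The loop above depends on the column layout of MeCab's `-Ochasen` output. A sketch on one hand-written line in that format (not actual MeCab output) shows which fields the indices pick out:

```python
# One line of MeCab "-Ochasen" output is tab-separated:
# surface form, reading, base form, part of speech, conjugation type, conjugation form.
# The line below is hand-written in that format, not actual MeCab output.
chasen_line = "増え\tフエ\t増える\t動詞-自立\t一段\t連用形"

fields = chasen_line.split()
base_form = fields[2]           # dictionary (base) form, used for the dictionary lookup
pos = fields[3].split("-")[0]   # top-level part of speech, e.g. 動詞
print(base_form, pos)
```

Using the base form (`fields[2]`) rather than the surface form matters for the next step, because pn_ja.dic keys words by their dictionary form.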

⑷ Acquisition of emotional polarity value

1. Get the emotion polarity value from the dictionary

result = []
# Process sentence by sentence
for sentence in word_list:
    temp = []
    # Process word by word
    for word in sentence:
        score = dic.get(word)  # None if the word is not in the dictionary
        temp.append((word, score))
    result.append(temp)

#Display as a data frame for each sentence
for i in range(len(result)):
    print(lines[i], '\n', pd.DataFrame(result[i], columns=["word", "score"]), '\n')

2. Calculate the average emotional polarity value for each sentence

# Calculate the average value for each sentence
mean_list = []
for i in result:
    temp = []
    for j in i:
        if j[1] is not None:
            temp.append(float(j[1]))
    # Guard against sentences with no scored words
    mean = sum(temp) / len(temp) if temp else 0.0
    mean_list.append(mean)

# Display as a data frame
print(pd.DataFrame(mean_list, columns=["mean"], index=lines[:len(mean_list)]))
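
The same per-sentence average can also be computed with pandas, which skips missing scores (None/NaN) automatically. A sketch with made-up `(word, score)` pairs:

```python
import pandas as pd

# Made-up (word, score) pairs for one sentence; None marks a word
# that was not found in the dictionary.
scored = [("輸入", "-0.5"), ("過去", None), ("最高", "0.75")]

# Coerce the string scores to floats; None becomes NaN
s = pd.to_numeric(pd.Series([sc for _, sc in scored]), errors="coerce")
print(s.mean())  # NaN entries are skipped: (-0.5 + 0.75) / 2 = 0.125
```

`Series.mean()` ignores NaN by default, so unscored words simply drop out of the average instead of requiring an explicit None check.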

⑸ Reconsider "Word emotion polarity value correspondence table"

1. Check the composition ratio of negative and positive

# Number of positive words
keys_pos = [k for k, v in dic.items() if float(v) > 0]
cnt_pos = len(keys_pos)
# Number of negative words
keys_neg = [k for k, v in dic.items() if float(v) < 0]
cnt_neg = len(keys_neg)
# Number of neutral words
keys_neu = [k for k, v in dic.items() if float(v) == 0]
cnt_neu = len(keys_neu)

print("Percentage of positives:", '{:.3f}'.format(cnt_pos / len(dic)), "(", cnt_pos, "words)")
print("Percentage of negatives:", '{:.3f}'.format(cnt_neg / len(dic)), "(", cnt_neg, "words)")
print("Percentage of neutrals:", '{:.3f}'.format(cnt_neu / len(dic)), "(", cnt_neu, "words)")
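
The same three-way split can be computed in a single pass with `collections.Counter`. A sketch on a made-up word-to-score mapping:

```python
from collections import Counter

# Made-up word -> score mapping for illustration
# (scores are strings, as in the dictionary built above)
sample_dic = {"良い": "0.9", "悪い": "-0.8", "普通": "0.0", "最高": "0.95"}

def polarity(v):
    x = float(v)
    return "positive" if x > 0 else "negative" if x < 0 else "neutral"

counts = Counter(polarity(v) for v in sample_dic.values())
print(counts)
```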

2. Check for duplicate words

print("Number of elements before conversion to dict type:", len(pndic))
print("Number of elements after conversion to dict type:", len(dic), "\n")

pndic_list = pndic["word"].tolist()
print("Unique number of elements before conversion to dict type:", len(set(pndic_list)))
