Python: Negative / Positive Analysis: Text Analysis Application

What is Negative / Positive Analysis?

Overview

People have long studied their own emotions, with psychology as the best-known example.

Negative / positive analysis is one such approach: it mainly examines people's remarks and ideas and judges whether they are positive or negative.

Negative / positive analysis is a form of "sentiment analysis", a technology that extracts expressions of evaluation and emotion from sentences and other text and analyzes the emotions they carry.

Methods for negative / positive analysis include word-by-word classification using a polarity dictionary and classification using deep learning.

Polarity dictionary

Negativity and positivity are called "polarity", and a polarity dictionary is a collection of words labeled with their polarity.

The polarity dictionary called PN Table was not built by manually assigning polarity to a large number of words. Instead, starting from a small set of words with known polarity, it assigns scores from -1 to +1 to words that are strongly related to them.

In addition, there is the "Japanese Evaluation Polar Dictionary", published on the page of the Inui-Okazaki Laboratory at Tohoku University. It tags words as neutral in addition to negative and positive, which keeps the polarity of the words in the dictionary balanced.

There is also a "Polar Phrase Dictionary" created by Yahoo! JAPAN Laboratories.

# Load the PN Table and print it.
import pandas as pd
pn_df = pd.read_csv('./6020_negative_positive_data/data/pn_ja.dic',
                    sep=':',
                    encoding='utf-8',
                    names=('Word', 'Reading', 'POS', 'PN'))
print(pn_df)
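
The read_csv call above assumes that each line of the dictionary file holds four colon-separated fields: word, reading, part of speech, and polarity value. As a rough sketch of that layout, the following parses a few lines with the standard library only. The sample lines and their scores are made up for this example rather than taken from the real file, and parse_pn_lines is a hypothetical helper name.

```python
# Illustrative lines in the "word:reading:POS:score" layout.
# These entries and scores are made up for the example.
SAMPLE_LINES = [
    "優れる:すぐれる:動詞:1",
    "喜び:よろこび:名詞:0.9",
    "悪い:わるい:形容詞:-1",
]


def parse_pn_lines(lines):
    """Return a {word: score} dict from colon-separated dictionary lines."""
    pn = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        word, reading, pos, score = line.split(":")
        pn[word] = float(score)  # a later duplicate word overwrites an earlier one
    return pn


pn_sample = parse_pn_lines(SAMPLE_LINES)
print(pn_sample)
```

Keeping only the word and the score is enough for a word-by-word lookup, since the reading and the part of speech are not consulted when a word is matched by its spelling alone.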

Negative / positive analysis using a polarity dictionary

Morphological analysis

Morphological analysis is the task of dividing a sentence into morphemes, the smallest meaningful units of a language.

Once a sentence has been split this way, each word can be looked up in the polarity dictionary. Here we run morphological analysis with MeCab and turn the text into a form that is easier to read.

import MeCab
mecab = MeCab.Tagger('')

# Read the whole text file into a string
with open('./6020_negative_positive_data/data/aidokushono_insho.txt') as f:
    file = f.read()

# Print one word per line together with its features
print(mecab.parse(file))

Converting the morphological analysis results to a list

First, convert the analysis results into a list to make later processing easier.

When the MeCab output is split on line breaks, the last element is an empty string and the one before it is "EOS". Neither is needed, so we delete both.

On each line of the analysis result, the word is followed by a tab, and the remaining information is separated by commas.

import MeCab
import re

mecab = MeCab.Tagger('')

# Read the whole text file into a string
with open('./6020_negative_positive_data/data/aidokushono_insho.txt') as f:
    file = f.read()

def get_diclist(file):
    parsed = mecab.parse(file)
    # Split the analysis result into lines
    lines = parsed.split('\n')
    # Drop the last two elements ("EOS" and the empty string after it)
    lines = lines[0:-2]

    # Build a list holding the base form of each word
    diclist = []
    for line in lines:
        # Split each line on tabs and commas
        data = re.split('\t|,', line)
        # Index 7 is assumed to hold the base form (IPADIC feature layout)
        datalist = {'BaseForm': data[7]}
        diclist.append(datalist)
    return diclist

wordlist = get_diclist(file)
print(wordlist)
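
The parsing step can also be tried without MeCab installed. The sketch below runs on a hard-coded sample written in the layout MeCab prints with the IPADIC dictionary, which is an assumption about the dictionary in use: the surface form, a tab, then nine comma-separated features. base_forms is a hypothetical helper, not part of MeCab.

```python
# Hard-coded sample in the IPADIC output layout (an assumption for this sketch):
# surface form, a tab, then nine comma-separated features.
SAMPLE_OUTPUT = (
    "本\t名詞,一般,*,*,*,*,本,ホン,ホン\n"
    "を\t助詞,格助詞,一般,*,*,*,を,ヲ,ヲ\n"
    "読ん\t動詞,自立,*,*,五段・マ行,連用タ接続,読む,ヨン,ヨン\n"
    "だ\t助動詞,*,*,*,特殊・タ,基本形,だ,ダ,ダ\n"
    "EOS\n"
)


def base_forms(parsed):
    """Collect the base form (7th feature) of every token line."""
    result = []
    for line in parsed.split("\n"):
        if line == "EOS" or line == "":
            continue  # skip the EOS marker and the trailing empty string
        # Split on the first tab only, then split the features on commas
        surface, features = line.split("\t", 1)
        fields = features.split(",")
        result.append({"BaseForm": fields[6]})
    return result


print(base_forms(SAMPLE_OUTPUT))
```

Splitting on the tab first means a surface form that is itself a comma does not shift the feature positions, which can happen when tabs and commas are split in a single step. Whether that matters depends on the text being analyzed.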

Make a negative / positive judgment on the analysis result

By loading the polarity dictionary (PN Table) and comparing it with the list of analysis results, you can assign a polarity to each word that appears.

First, build a dict from the PN Table that contains only the words and their polarity values. Then create a new list in which each word from the analysis results carries its polarity value if the word exists in the PN Table.

import pandas as pd
# Read the dictionary
pn_df = pd.read_csv('./6020_negative_positive_data/data/pn_ja.dic',
                    sep=':',
                    encoding='utf-8',
                    names=('Word', 'Reading', 'POS', 'PN'))

# Convert the PN Table to a dict of words and polarity values only
word_list = list(pn_df['Word'])
pn_list = list(pn_df['PN'])
pn_dict = dict(zip(word_list, pn_list))

# Attach a polarity value to each word in the analysis results
def add_pnvalue(diclist_old):
    diclist_new = []
    for word in diclist_old:
        baseword = word['BaseForm']
        if baseword in pn_dict:
            # Use the polarity value if the word exists in the PN Table
            pn = float(pn_dict[baseword])
        else:
            # Otherwise mark the word as not found
            pn = 'notfound'
        word['PN'] = pn
        diclist_new.append(word)
    return diclist_new

# get_diclist and file are the ones defined in the previous section
wordlist = get_diclist(file)
pn_list = add_pnvalue(wordlist)

print(pn_list)
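
Note that add_pnvalue writes the 'PN' key into the same dicts it receives, and it mixes float values with the string 'notfound'. As one possible variation, the sketch below returns new dicts and uses None for words that are missing from the dictionary. The toy dictionary, its scores, and the name with_pn are all made up for this example.

```python
# Toy polarity dictionary; the words and scores are made up for the example.
toy_pn_dict = {"良い": 0.99, "悪い": -1.0}


def with_pn(diclist, pn_dict):
    """Return new dicts with a 'PN' key; None marks words not in pn_dict."""
    return [
        {**word, "PN": pn_dict.get(word["BaseForm"])}
        for word in diclist
    ]


tokens = [{"BaseForm": "良い"}, {"BaseForm": "本"}, {"BaseForm": "悪い"}]
scored = with_pn(tokens, toy_pn_dict)
print(scored)
print(tokens)  # the input list is left unchanged
```

Using None keeps the 'PN' values to a single kind of "missing" marker, so later steps can filter with a plain `is not None` check instead of comparing against a string.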

Negative / positive analysis implementation

import re
import pandas as pd
import matplotlib.pyplot as plt
import MeCab
# Jupyter magic; remove this line when running as a plain script
%matplotlib inline

# Read the text file
text_path = './6020_negative_positive_data/data/aidokushono_insho.txt'
with open(text_path) as f:
    file = f.read()

# Create the MeCab instance
mecab = MeCab.Tagger('')

# Read the polarity dictionary
pn_df = pd.read_csv('./6020_negative_positive_data/data/pn_ja.dic',
                    sep=':',
                    encoding='utf-8',
                    names=('Word', 'Reading', 'POS', 'PN'))


def get_diclist(file):
    parsed = mecab.parse(file)
    lines = parsed.split('\n')
    # Drop the trailing "EOS" line and the empty string after it
    lines = lines[0:-2]
    diclist = []
    for line in lines:
        data = re.split('\t|,', line)
        diclist.append({'BaseForm': data[7]})
    return diclist

# Build a dict of words and polarity values
word_list = list(pn_df['Word'])
pn_list = list(pn_df['PN'])
pn_dict = dict(zip(word_list, pn_list))

def add_pnvalue(diclist_old):
    diclist_new = []
    for word in diclist_old:
        base = word['BaseForm']
        if base in pn_dict:
            pn = float(pn_dict[base])
        else:
            pn = 'notfound'
        word['PN'] = pn
        diclist_new.append(word)

    # Keep only the polarity values of words found in the PN Table
    pn_point = []
    for word in diclist_new:
        pn = word['PN']
        if pn != 'notfound':
            pn_point.append(pn)

    return pn_point

wordlist = get_diclist(file)
pn_list = add_pnvalue(wordlist)

# Plot the polarity values in order of appearance
plt.plot(pn_list)
# Use the file path as the plot title
plt.title(text_path)
plt.show()
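
A plot of raw scores can be hard to read at a glance, so it may help to reduce the list of polarity values to a few numbers. The following is a minimal sketch, not part of the original analysis: summarize is a hypothetical helper, and sample_points stands in for the list returned by add_pnvalue.

```python
from statistics import mean


def summarize(pn_points):
    """Return the mean score and the counts of positive and negative words."""
    positive = sum(1 for p in pn_points if p > 0)
    negative = sum(1 for p in pn_points if p < 0)
    return {
        "mean": mean(pn_points) if pn_points else 0.0,  # avoid an error on an empty list
        "positive": positive,
        "negative": negative,
    }


# Made-up scores standing in for the output of add_pnvalue
sample_points = [0.5, -1.0, -0.25, -0.25]
summary = summarize(sample_points)
print(summary)
```

A mean below zero suggests the matched words lean negative overall, although this says nothing about words that were not found in the dictionary.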
