List of tools for easily trying sentiment analysis of Japanese sentences in Python (tried on Google Colab)

This article lists tools that make it easy to try sentiment analysis of Japanese sentences.

The tools covered are:

- asari
- oseti
- pymlask

Installation of required libraries

The installation steps below are for Google Colab.

If you want to try this locally, adapt the installation steps to your environment.

# Install MeCab
!apt install mecab libmecab-dev mecab-ipadic-utf8
!pip install mecab-python3

# Install mecab-ipadic-NEologd
!apt install git make curl xz-utils file
!git clone --depth 1 https://github.com/neologd/mecab-ipadic-neologd.git
!echo yes | mecab-ipadic-neologd/bin/install-mecab-ipadic-neologd -n -a

# Ref: https://qiita.com/Fulltea/items/90f6ebe6dcceaf64eaef
# Ref: https://qiita.com/SUZUKI_Masaya/items/685000d569452585210c

# Make mecabrc available at /usr/local/etc/mecabrc (workaround for mecab-python3)
!ln -s /etc/mecabrc /usr/local/etc/mecabrc
# Ref: https://qiita.com/Naritoshi/items/8f55d7d5cce9ce414395

# Install the sentiment analysis libraries
!pip install asari oseti pymlask
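
Before moving on, it can help to confirm that the installs succeeded. The following is a minimal sketch and not part of the original notebook: `missing_modules` is a hypothetical helper, and the module names are the import names used later in this article (`asari`, `oseti`, `mlask`) plus `MeCab`, the import name of mecab-python3.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if find_spec(name) is None]


# Import names of the packages installed above (pymlask is imported as `mlask`).
required = ["MeCab", "asari", "oseti", "mlask"]
print("Missing modules:", missing_modules(required))
```

An empty list means every module can be located; any name that is printed points to an install step that should be re-run.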

Data preparation

The input text for sentiment analysis is taken from "Puppet use" by Hans Christian Andersen (translated by Genkuro Yazaki), published on Aozora Bunko.

# Double quotes are used because one sentence contains an apostrophe
list_text = [
    "This person must be the happiest person in the world.",
    "The playhouse was wonderful and the audience was wonderful.",
    "If it was in the Middle Ages, it would probably have been burned at the stake.",
    "When it came to everyone's annoyance, it was as if flies were buzzing in the bottle.",
    "If we humans can come up with these things, we should be able to live longer before they are buried in the earth."
]

asari

# Simple operation check
from asari.api import Sonar
sonar = Sonar()
res = sonar.ping(text="Too many ads ♡")
res
{'classes': [{'class_name': 'negative', 'confidence': 0.9086981552962491},
  {'class_name': 'positive', 'confidence': 0.0913018447037509}],
 'text': 'Too many ads ♡',
 'top_class': 'negative'}

list(map(sonar.ping, list_text))
[{'classes': [{'class_name': 'negative', 'confidence': 0.10382535749585702},
   {'class_name': 'positive', 'confidence': 0.896174642504143}],
  'text': 'This person must be the happiest person in the world.',
  'top_class': 'positive'},
 {'classes': [{'class_name': 'negative', 'confidence': 0.035517582235360945},
   {'class_name': 'positive', 'confidence': 0.964482417764639}],
  'text': 'The playhouse was wonderful and the audience was wonderful.',
  'top_class': 'positive'},
 {'classes': [{'class_name': 'negative', 'confidence': 0.5815274190768989},
   {'class_name': 'positive', 'confidence': 0.41847258092310113}],
  'text': 'If it was in the Middle Ages, it would probably have been burned at the stake.',
  'top_class': 'negative'},
 {'classes': [{'class_name': 'negative', 'confidence': 0.2692695045573754},
   {'class_name': 'positive', 'confidence': 0.7307304954426246}],
  'text': "When it came to everyone's annoyance, it was as if flies were buzzing in the bottle.",
  'top_class': 'positive'},
 {'classes': [{'class_name': 'negative', 'confidence': 0.050528495655525495},
   {'class_name': 'positive', 'confidence': 0.9494715043444746}],
  'text': 'If we humans can come up with these things, we should be able to live longer before they are buried in the earth.',
  'top_class': 'positive'}]

Intuitively, the sentence "When it came to everyone's annoyance, it was as if flies were buzzing in the bottle." gives a negative impression, but it was judged positive.

The judgments for the other examples seem reasonable.
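
The nested result dictionaries are hard to scan, so a small helper can reduce each one to its label and confidence. This is a sketch, not part of asari: `summarize_asari` is a hypothetical name, and it assumes only the result shape shown in the operation check above (`classes`, `text`, `top_class`).

```python
def summarize_asari(result):
    """Return (text, top_class, confidence of top_class) for one ping() result."""
    top = result["top_class"]
    confidence = next(
        c["confidence"] for c in result["classes"] if c["class_name"] == top
    )
    return result["text"], top, round(confidence, 3)


# Sample result copied from the operation check above.
sample = {
    "classes": [
        {"class_name": "negative", "confidence": 0.9086981552962491},
        {"class_name": "positive", "confidence": 0.0913018447037509},
    ],
    "text": "Too many ads ♡",
    "top_class": "negative",
}
print(summarize_asari(sample))  # ('Too many ads ♡', 'negative', 0.909)
```

In the notebook, the same helper could be applied to every sentence with `[summarize_asari(r) for r in map(sonar.ping, list_text)]`. Looking at the confidence next to the label makes borderline cases, such as the 0.58 negative judgment for the third sentence, easier to spot.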

oseti

# Simple operation check
import oseti

analyzer = oseti.Analyzer()
# Double quotes are used because the sentence contains an apostrophe
analyzer.analyze("I'm waiting in heaven.")
[1.0]
list(map(analyzer.analyze, list_text))
[[0.0], [1.0], [0], [0], [1.0]]

Only the second sentence, "The playhouse was wonderful and the audience was wonderful.", and the fifth sentence, "If we humans can come up with these things, we should be able to live longer before they are buried in the earth.", were judged positive (+1). The other sentences were judged neutral.

As expected, my impression is that the dictionary-based approach is weak against words that are not in the dictionary.
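
To illustrate why a dictionary-based approach falls back to neutral, here is a toy sketch. It is not oseti's implementation: the lexicon entries, the whitespace tokenizer, and the scoring rule (the mean of the matched polarities) are all assumptions made for illustration, whereas oseti works on Japanese text with MeCab and a Japanese polarity dictionary.

```python
# Toy polarity lexicon (hypothetical entries; a real dictionary is far larger).
LEXICON = {"wonderful": 1, "happiest": 1, "annoyance": -1}


def polarity(sentence, lexicon=LEXICON):
    """Score a sentence in [-1, 1]; return 0 when no word is in the lexicon."""
    words = [w.strip(".,!?").lower() for w in sentence.split()]
    hits = [lexicon[w] for w in words if w in lexicon]
    if not hits:
        return 0  # nothing matched, so the method can only report neutral
    return sum(hits) / len(hits)


print(polarity("The playhouse was wonderful and the audience was wonderful."))  # 1.0
print(polarity("It would probably have been burned at the stake."))  # 0
```

The second sentence contains no lexicon word, so it scores 0 regardless of how negative it reads to a human.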

pymlask

The author of the package is the same as oseti.

# Simple operation check
import mlask
emotion_analyzer = mlask.MLAsk()
# Double quotes are used because the sentence contains an apostrophe
emotion_analyzer.analyze("I don't hate him!(;´Д`)")
# => {'text': 'I don't hate him!(;´Д`)',
#     'emotion': defaultdict(<class 'list'>,{'yorokobi': ['Hate*CVS'], 'suki': ['Hate*CVS']}),
#     'orientation': 'POSITIVE',
#     'activation': 'NEUTRAL',
#     'emoticon': ['(;´Д`)'],
#     'intension': 2,
#     'intensifier': {'exclamation': ['!'], 'emotikony': ['´Д`', 'Д`', '´Д', '(;´Д`)']},
#     'representative': ('yorokobi', ['Hate*CVS'])
#     }
{'activation': 'NEUTRAL',
 'emoticon': ['(;´Д`)'],
 'emotion': defaultdict(list, {'suki': ['Hate*CVS'], 'yorokobi': ['Hate*CVS']}),
 'intensifier': {'emotikony': ['´Д`', 'Д`', '´Д', '(;´Д`)'], 'exclamation': ['!']},
 'intension': 2,
 'orientation': 'POSITIVE',
 'representative': ('yorokobi', ['Hate*CVS']),
 'text': "I don't hate him!(;´Д`)"}

# While we are at it, try the NEologd dictionary as well

# Find where mecab-ipadic-neologd is installed
import subprocess

cmd = 'echo `mecab-config --dicdir`"/mecab-ipadic-neologd"'
# strip() removes the trailing newline that echo appends to the path
path = (subprocess.Popen(cmd, stdout=subprocess.PIPE,
                         shell=True).communicate()[0]).decode('utf-8').strip()

emotion_analyzer = mlask.MLAsk('-d {0}'.format(path))  # Use another dictionary

list(map(emotion_analyzer.analyze, list_text))
[{'activation': 'NEUTRAL',
  'emoticon': None,
  'emotion': defaultdict(list, {'yorokobi': ['happiness']}),
  'intensifier': {},
  'intension': 0,
  'orientation': 'POSITIVE',
  'representative': ('yorokobi', ['happiness']),
  'text': 'This person must be the happiest person in the world.'},
 {'emotion': None,
  'text': 'The playhouse was wonderful and the audience was wonderful.'},
 {'emotion': None,
  'text': 'If it was in the Middle Ages, it would probably have been burned at the stake.'},
 {'emotion': None,
  'text': "When it came to everyone's annoyance, it was as if flies were buzzing in the bottle."},
 {'emotion': None,
  'text': 'If we humans can come up with these things, we should be able to live longer before they are buried in the earth.'}]

This method also returns a positive judgment when the sentence contains a word that is in the dictionary (here, "happiness"), but it cannot make any judgment when no dictionary word is present.

My impression is that the results are not very good.
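
Because results without a dictionary match contain only `emotion: None` and `text`, code that reads `orientation` directly would fail on them. The sketch below shows one way to handle both shapes. `summarize_mlask` is a hypothetical helper, and the two sample dictionaries are trimmed versions of the output above.

```python
def summarize_mlask(result):
    """Return (orientation, emotion labels) for one analyze() result.

    When no dictionary expression matched, 'emotion' is None and the
    other keys are absent, so fall back to (None, []).
    """
    emotion = result.get("emotion")
    if not emotion:
        return None, []
    return result.get("orientation"), sorted(emotion)


# Two results shaped like the output above.
matched = {
    "text": "This person must be the happiest person in the world.",
    "emotion": {"yorokobi": ["happiness"]},
    "orientation": "POSITIVE",
}
unmatched = {
    "text": "The playhouse was wonderful and the audience was wonderful.",
    "emotion": None,
}

print(summarize_mlask(matched))    # ('POSITIVE', ['yorokobi'])
print(summarize_mlask(unmatched))  # (None, [])
```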

Summary

I tried several tools that make it easy to run sentiment analysis on Japanese sentences.

Thank you to the authors for publishing these tools.

If you want to do serious sentiment analysis and get more reasonable results, you will probably need to add processing tailored to the category of text you are targeting, or use neural network techniques (in which case creating the dataset is the hard part).
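
To compare the three tools at a glance, the results reported in the sections above can be put side by side. The sketch below hard-codes those reported outputs rather than re-running the tools. The label names "neutral" and "no judgment" and the helper `oseti_label` are assumptions made for this illustration.

```python
# Outputs reported in the sections above, one entry per sentence in list_text.
asari_top = ["positive", "positive", "negative", "positive", "positive"]
oseti_scores = [[0.0], [1.0], [0], [0], [1.0]]
mlask_orientation = ["POSITIVE", None, None, None, None]


def oseti_label(scores):
    """Map a list of oseti scores to a label using the sign of their mean."""
    mean = sum(scores) / len(scores) if scores else 0
    if mean > 0:
        return "positive"
    if mean < 0:
        return "negative"
    return "neutral"


rows = [
    (i + 1, a, oseti_label(o), (m or "no judgment").lower())
    for i, (a, o, m) in enumerate(zip(asari_top, oseti_scores, mlask_orientation))
]

print(f"{'#':<3}{'asari':<10}{'oseti':<10}{'ML-Ask':<12}")
for n, a, o, m in rows:
    print(f"{n:<3}{a:<10}{o:<10}{m:<12}")
```

Laid out this way, the three tools agree on no sentence: the first is positive for asari and ML-Ask but neutral for oseti, and ML-Ask makes no judgment on the other four.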

References

Commentary and summary articles

- [[27 posted] Dataset summary that can be used for sentiment analysis of sentences, facial expressions, and voice | Lionbridge AI](https://lionbridge.ai/ja/datasets/15-free-sentiment-analysis-datasets-for-machine-learning/)
  - Links to resources and polarity dictionaries, etc.
- [Natural language processing] How to proceed with sentiment analysis & points that are easy to get hooked on - Qiita
- Story of making and packaging Japanese Sentiment Analyzer - Ahogrammer
- Sentiment analysis of corporate word-of-mouth data of job change meetings using deep learning - Qiita
- I tried to analyze the emotions of the whole novel "Weathering with You" ☔️ - Qiita
- Sentiment Analysis library oseti for Python using Japanese evaluation polarity dictionary has been released - Qiita
- Sentiment analysis of text with ML-Ask - Qiita

Japanese datasets for sentiment analysis

- SNOW D18: Japanese Emotional Expression Dictionary - Nagaoka University of Technology Natural Language Processing Laboratory
  - Approximately 2,000 expressions are recorded, and each expression is labeled with 48 emotion categories that the laboratory defined independently.
