[PYTHON] Precautions when using TextBlob sentiment analysis

This is a very basic topic, but I've found some caveats when using a library that can perform sentiment analysis.

About TextBlob sentiment analysis

TextBlob is a Python library whose .sentiment property can analyze the Polarity and Subjectivity of simple sentences. Polarity indicates whether the text, or the speaker's attitude, is positive or negative. Subjectivity indicates how subjective the text is, that is, how much it expresses the speaker's own stance rather than objective fact. There is an analysis report here, so please take a look.
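As a minimal sketch of the call described above, assuming TextBlob is installed (pip install textblob): the helper name analyze is only illustrative and is not part of the library. According to the TextBlob documentation, polarity falls in the range [-1.0, 1.0] and subjectivity in the range [0.0, 1.0].

```python
def analyze(text):
    """Return (polarity, subjectivity) from TextBlob's default analyzer.

    `analyze` is a hypothetical helper name, not part of TextBlob.
    """
    # Third-party dependency (pip install textblob). It is imported inside
    # the function so that merely defining the helper does not require it.
    from textblob import TextBlob

    # .sentiment is a property that returns a namedtuple
    # Sentiment(polarity, subjectivity); it is not called with ().
    sentiment = TextBlob(text).sentiment
    return sentiment.polarity, sentiment.subjectivity
```

Calling analyze("We must be ethical.") returns the two scores as a plain tuple, which is convenient when scoring many sentences in a loop.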

TextBlob sentiment analysis is a black box

You can use the .sentiment property to analyze the tendency of books or the stream of consciousness of fictional characters, but the problem is that TextBlob's sentiment analysis is a black box.

The official documentation describes implementations that use the Pattern text mining module and dataset from the Computational Linguistics and Psycholinguistics research center (CLiPS) at the University of Antwerp, and the Naive Bayes classifier from the NLTK library. In other words, unless you know how these two sources work, you cannot answer the question "Why did this polarity / subjectivity value come out?" Going forward, it will be necessary to verify what kind of classification results the .sentiment property actually returns.
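One way to look a little further into the box, sketched here under the assumption that the installed TextBlob version provides the sentiment_assessments property of the default Pattern-based analyzer: it returns the scores together with the words that contributed to them. The helper names summarize and explain are hypothetical.

```python
def summarize(assessments):
    """Flatten assessment tuples into (phrase, polarity, subjectivity) rows.

    Each assessment is assumed to start with (words, polarity, subjectivity);
    any extra fields (such as a label) are ignored.
    """
    rows = []
    for words, polarity, subjectivity, *_ in assessments:
        rows.append((" ".join(words), polarity, subjectivity))
    return rows


def explain(text):
    """Return overall polarity, overall subjectivity, and the scored phrases."""
    # Third-party dependency (pip install textblob), imported on first use.
    from textblob import TextBlob

    result = TextBlob(text).sentiment_assessments
    return result.polarity, result.subjectivity, summarize(result.assessments)
```

Words that do not appear in the returned rows presumably did not contribute to the score, which makes this a quick check of which parts of a sentence the analyzer actually reacted to.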

Sentiment analysis that considers context is not possible

Most importantly, TextBlob's .sentiment property does not allow context-sensitive sentiment analysis. As part of the verification, I wrote a text of 10 sentences containing ethical imperatives in the style of the Ten Commandments of Moses, and compared it with a text that states the exact opposite.

We must be ethical. We must have the independence of will. We must be based on the concept of duty. We must think universally. We must not tell lies. We must not kill ourselves. We must cultivate our talent very arbitrary. We must be kind to each other. We must preserve our own lives. We must secure our happiness.

[Image: E_pos2.png]

We must not be ethical. We must not have the independence of will. We must not be based on the concept of duty. We must not think universally. We must tell lies. We must kill ourselves. We must not cultivate our talent very arbitrary. We must not be kind to each other. We must not preserve our own lives. We must not secure our happiness.

[Image: E_neg2.png]

The result is that both texts have almost the same polarity and subjectivity. I had predicted that personal pronouns that include the speaker, such as “We”, and modal auxiliary verbs, such as “must”, would increase Subjectivity, but apparently this is not the case. Also note that adding "not" to completely reverse the meaning of a sentence does not change the polarity or subjectivity at all. In particular,

  • “We must not be ethical.”

Even a sentence like this is regarded as a "positive and subjective" text.
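The comparison can be repeated sentence by sentence with a sketch like the following. It uses three of the sentence pairs from the texts above; the names PAIRS, textblob_scorer and compare are hypothetical, and the scorer is passed in as an argument so that another library can be swapped in later.

```python
# Three (original, reversed) sentence pairs taken from the two test texts.
PAIRS = [
    ("We must be ethical.", "We must not be ethical."),
    ("We must not tell lies.", "We must tell lies."),
    ("We must be kind to each other.", "We must not be kind to each other."),
]


def textblob_scorer(text):
    """Score one sentence with TextBlob's default analyzer."""
    # Third-party dependency (pip install textblob), imported on first use.
    from textblob import TextBlob

    sentiment = TextBlob(text).sentiment
    return sentiment.polarity, sentiment.subjectivity


def compare(pairs, scorer):
    """Score both sentences of each pair and record the absolute differences."""
    report = []
    for original, reversed_text in pairs:
        p1, s1 = scorer(original)
        p2, s2 = scorer(reversed_text)
        report.append({
            "original": original,
            "reversed": reversed_text,
            "polarity_diff": abs(p1 - p2),
            "subjectivity_diff": abs(s1 - s2),
        })
    return report


# Example call (requires TextBlob):
#     for row in compare(PAIRS, textblob_scorer):
#         print(row)
```

A polarity_diff close to 0.0 for a pair would match the behaviour described above, where reversing the meaning leaves the score unchanged.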

Correct use of TextBlob sentiment analysis

TextBlob's .sentiment property can only capture an abstract impression, such as "Are the words used positive overall?" It turns out to be unsuitable for building a specific profile, such as "Is the person who wrote this really subjective and positive?" To use it for the latter purpose, you need either another library that can analyze sentiment from context, or a parsing step that grasps the meaning of each sentence to some extent and uses it for labeling. My current goal is to use the characteristics of ethical imperatives in machine learning datasets, so I would like to write another report once I find a good solution.
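To give a rough idea of the labeling approach, here is a deliberately crude stand-in for real parsing: a regular expression, not a syntactic parser, that marks each imperative as an obligation or a prohibition. The function name label_imperative and the label strings are hypothetical, and the rule only covers the "must" / "must not" pattern used in the test texts.

```python
import re

# Toy rule: "must", optionally followed by the separate word "not".
_MODAL = re.compile(r"\bmust(\s+not)?\b", re.IGNORECASE)


def label_imperative(sentence):
    """Return 'obligation', 'prohibition', or 'other' for one sentence."""
    match = _MODAL.search(sentence)
    if match is None:
        return "other"
    # Group 1 is set only when "not" directly follows "must".
    return "prohibition" if match.group(1) else "obligation"
```

A label like this could be stored next to the polarity and subjectivity scores as an extra feature. It does not handle contractions such as "mustn't" or other negations such as "never", so a real parser would still be needed for anything beyond this pattern.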
