Introduction

In this post, I analyze about one year of English tweet data using VADER, a sentiment analysis method proposed at ICWSM-14. For VADER, I referred to the article "Sentiment Analysis has arrived at NLTK". Thank you.

VADER

VADER is implemented in nltk, Python's natural language processing package. Let's try it.


In [1]: from nltk.sentiment.vader import SentimentIntensityAnalyzer

In [2]: analyzer = SentimentIntensityAnalyzer()

In [3]: analyzer.polarity_scores("I am happy!!!")
Out[3]: {'compound': 0.6784, 'neg': 0.0, 'neu': 0.179, 'pos': 0.821}

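The output contains "compound", "neg" (negative), "neu" (neutral), and "pos" (positive). "neg", "neu", and "pos" each take a value from 0 to 1, while "compound" is a normalized overall score that ranges from -1 to 1.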
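To make the scores easier to read, here is a minimal sketch that turns a VADER result into a coarse label. In VADER the "compound" score runs from -1 (most negative) to 1 (most positive). The helper name `label_from_compound` and the cutoff of 0.05 are assumptions for illustration (0.05 is a commonly used convention, not something used in this analysis).

```python
def label_from_compound(scores, threshold=0.05):
    """Map a VADER score dict to 'positive', 'negative', or 'neutral'.

    `scores` is a dict like the one returned by polarity_scores().
    `threshold` is an assumed cutoff, not a value from this article.
    """
    compound = scores["compound"]
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# The result for "I am happy!!!" shown above
example = {"compound": 0.6784, "neg": 0.0, "neu": 0.179, "pos": 0.821}
print(label_from_compound(example))
```

Note that the rest of this post does not use such labels; it works with the raw "pos" value directly.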

Datasets and experiments

The dataset consists of English tweets collected from the Twitter Streaming API between 2014/10/31 and 2015/10/28 (I got them from my senior!). There were 1089358 tweets per day. Sentiment analysis was performed on each tweet, and the "pos" values were averaged for each day. Finally, the resulting daily series was standardized so that its mean is 0 and its standard deviation is 1.
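The daily averaging and standardization steps could be sketched roughly as follows. This is not the code actually used for the experiment: the function names, the `(date, text)` record format, and the choice of the population standard deviation are all assumptions. The scoring function is passed in as an argument so that the sketch runs without nltk.

```python
from collections import defaultdict
from statistics import mean, pstdev


def daily_mean_pos(records, score_fn):
    """Average the 'pos' score per day.

    records:  iterable of (date_string, tweet_text) pairs (assumed format)
    score_fn: callable mapping text to a dict with a 'pos' key,
              e.g. SentimentIntensityAnalyzer().polarity_scores
    """
    by_day = defaultdict(list)
    for day, text in records:
        by_day[day].append(score_fn(text)["pos"])
    return {day: mean(values) for day, values in sorted(by_day.items())}


def standardize(series):
    """Rescale a {day: value} series to mean 0 and standard deviation 1.

    Uses the population standard deviation (an assumption; the article
    does not say which variant was used).
    """
    values = list(series.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("cannot standardize a constant series")
    return {day: (value - mu) / sigma for day, value in series.items()}
```

With nltk installed, passing `analyzer.polarity_scores` as `score_fn` connects this to the VADER analyzer created earlier.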

Related research (?)

In the paper "Twitter mood predicts the stock market", the following results were obtained using the polarity analysis tool OpinionFinder and GPOMS, which analyzes six types of mood factors. I expect the "pos" values computed here to be close to the OpinionFinder result and the GPOMS Happy result.

Results

The results are plotted as a time series. Note the several peaks that stand out well above the rest.

Thanksgiving (2014/11/27)

Christmas and New Year holidays

Valentine's day
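Peaks like these can also be picked out without reading the graph. Below is a minimal sketch that lists the days whose standardized "pos" value exceeds a cutoff. The function name `days_above` and the cutoff of 2.0 are hypothetical; the peaks in this post were identified by eye.

```python
def days_above(zscores, threshold=2.0):
    """Return (day, value) pairs whose standardized value exceeds `threshold`.

    zscores:   dict mapping an ISO date string (YYYY-MM-DD) to a z-score
    threshold: assumed cutoff in standard deviations
    ISO date strings sort chronologically, so the result is in date order.
    """
    return [(day, z) for day, z in sorted(zscores.items()) if z > threshold]
```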

Conclusion

Everyone is positive when there is a fun event!! I also wanted to analyze the parts where the "neg" value is large, but in the end I could not work out the cause.
