This time, I analyzed about one year of **English** tweet data using VADER, a sentiment analysis method proposed at ICWSM-14. For VADER, I referred to the article "Sentiment Analysis has arrived at NLTK". Thank you.
VADER
VADER is implemented in nltk, Python's natural language processing package. Let's try it.
In [1]: from nltk.sentiment.vader import SentimentIntensityAnalyzer
In [2]: analyzer = SentimentIntensityAnalyzer()
In [3]: analyzer.polarity_scores("I am happy!!!")
Out[3]: {'compound': 0.6784, 'neg': 0.0, 'neu': 0.179, 'pos': 0.821}
Four values are output: "neg" (negative), "neu" (neutral), and "pos" (positive) each range from 0 to 1, while "compound" ranges from -1 to 1.
The data are English tweets obtained from the Twitter Streaming API from 2014/10/31 to 2015/10/28 (I got them from my senior!). There were 1089358 tweets per day. Sentiment analysis was performed on each tweet, and the "pos" values were averaged for each day. Finally, the daily averages were standardized to have mean 0 and standard deviation 1.
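The daily averaging and standardization described above can be sketched roughly as follows. This is only a minimal illustration, not the code actually used for the analysis: the function name `daily_standardized_pos`, the `(day, text)` record layout, and the choice of the population standard deviation are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean, pstdev


def daily_standardized_pos(records, score_fn):
    """Average the 'pos' score per day, then z-score the daily averages.

    records  : iterable of (day, text) pairs (hypothetical layout)
    score_fn : callable mapping a text to a dict that has a 'pos' key
    """
    # Collect the "pos" score of every tweet under its day.
    pos_by_day = defaultdict(list)
    for day, text in records:
        pos_by_day[day].append(score_fn(text)["pos"])

    # One average "pos" value per day, in chronological (sorted) order.
    days = sorted(pos_by_day)
    daily_avg = [mean(pos_by_day[d]) for d in days]

    # Standardize so the series has mean 0 and standard deviation 1.
    # Assumption: population standard deviation (pstdev) is used here.
    mu = mean(daily_avg)
    sigma = pstdev(daily_avg)
    if sigma == 0:
        raise ValueError("daily averages are constant; cannot standardize")
    return {d: (avg - mu) / sigma for d, avg in zip(days, daily_avg)}
```

With nltk available, `score_fn` could be `SentimentIntensityAnalyzer().polarity_scores`, and `records` would be the tweets paired with their posting dates.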
In the paper "Twitter mood predicts the stock market", results were obtained using the polarity analysis tool Opinion Finder and GPOMS, which analyzes 6 types of emotion factors. I expect the "pos" value here to be close to the Happy result of Opinion Finder and GPOMS.
The results are plotted as a time series. Note that the series spikes upward at several points.
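One simple way to pick out such spikes without reading them off the graph is to list the days whose standardized value exceeds a threshold. The sketch below assumes the standardized daily values are held in a dict keyed by date; the function name `days_above` and the threshold of 2.0 are hypothetical choices for illustration, not values from the analysis.

```python
def days_above(z_by_day, threshold=2.0):
    """Return the days whose standardized value is above the threshold.

    z_by_day  : dict mapping a day to its standardized "pos" value
    threshold : cut-off in standard deviations (arbitrary example value)
    """
    return sorted(day for day, z in z_by_day.items() if z > threshold)
```

Lowering the threshold lists more days, so it is worth trying a few values and comparing the result with the graph.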
Everyone is positive when there are fun events!! I also wanted to analyze the parts where the "neg" value is large, but in the end I could not work out the cause.