[PYTHON] Sentiment analysis of large-scale tweet data by NLTK

Introduction

This time, I analyzed about one year of English tweet data using VADER, a sentiment analysis method proposed at ICWSM-14. For VADER, I referred to the article "Sentiment Analysis has arrived at NLTK". Thank you.

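VADER

VADER is implemented in nltk, Python's natural language processing package, so let's try it. The analyzer needs the vader_lexicon resource, which can be fetched once with nltk.download('vader_lexicon').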


In [1]: from nltk.sentiment.vader import SentimentIntensityAnalyzer

In [2]: analyzer = SentimentIntensityAnalyzer()

In [3]: analyzer.polarity_scores("I am happy!!!")
Out[3]: {'compound': 0.6784, 'neg': 0.0, 'neu': 0.179, 'pos': 0.821}

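The output contains four scores. "neg" (negative), "neu" (neutral), and "pos" (positive) each range from 0 to 1 and together sum to about 1, while "compound" is a normalized overall score that ranges from -1 to 1.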

Datasets and experiments

The data are English tweets collected from the Twitter Streaming API between 2014/10/31 and 2015/10/28 (I got them from my senior!). There were 1089358 tweets per day. I ran sentiment analysis on each tweet and averaged the "pos" value for each day. Finally, the daily series was standardized so that its mean is 0 and its standard deviation is 1.
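The daily averaging and standardization described above could be sketched roughly as follows. This is only a minimal illustration, not the code actually used for the experiment. The (date, text) input layout, the function names, and the use of the population standard deviation are all assumptions, and the toy scorer merely stands in for VADER so that the example runs without the lexicon.

```python
from collections import defaultdict
from datetime import date
from statistics import mean, pstdev


def daily_mean_pos(tweets, score_fn):
    """Average the "pos" score of all tweets posted on the same day.

    tweets   -- iterable of (date, text) pairs (hypothetical input layout)
    score_fn -- callable returning a dict with a "pos" key, for example
                SentimentIntensityAnalyzer().polarity_scores
    """
    by_day = defaultdict(list)
    for day, text in tweets:
        by_day[day].append(score_fn(text)["pos"])
    return {day: mean(scores) for day, scores in sorted(by_day.items())}


def standardize(daily):
    """Rescale daily values to mean 0 and standard deviation 1 (z-scores).

    Assumption: the population standard deviation is used.
    """
    values = list(daily.values())
    mu, sigma = mean(values), pstdev(values)
    return {day: (value - mu) / sigma for day, value in daily.items()}


def toy_score(text):
    # Stand-in scorer for illustration only; this is NOT VADER.
    return {"pos": 1.0 if "happy" in text else 0.0}


# Made-up sample tweets, not taken from the real dataset.
sample = [
    (date(2014, 11, 26), "just another day"),
    (date(2014, 11, 26), "so happy"),
    (date(2014, 11, 27), "happy thanksgiving"),
    (date(2014, 11, 28), "back to work"),
]
daily = daily_mean_pos(sample, toy_score)
z = standardize(daily)
```

To use the real analyzer, pass analyzer.polarity_scores as score_fn in place of toy_score.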

Related research (?)

In the paper "Twitter mood predicts the stock market", the following results were obtained using the polarity analysis tool OpinionFinder and GPOMS, which analyzes six types of mood factors.

bollen.png

I expect the "pos" value here to be close to the OpinionFinder result and the Happy result of GPOMS.

Results

The results are plotted as a time series.

figure_1.png

Note that several spikes stand out above the rest of the series.
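Spikes like these can also be listed programmatically rather than read off the graph. The sketch below assumes the standardized daily values are held in a dict keyed by date. The function name peak_days and the threshold of 2.0 standard deviations are hypothetical choices for illustration, and the example values are made up rather than taken from the results here.

```python
from datetime import date


def peak_days(z_scores, threshold=2.0):
    """Return (day, value) pairs whose standardized score exceeds threshold.

    z_scores  -- dict mapping a date to its standardized daily "pos" value
    threshold -- hypothetical cut-off in standard deviations (an assumption)
    """
    peaks = [(day, z) for day, z in z_scores.items() if z > threshold]
    # Highest spikes first.
    return sorted(peaks, key=lambda pair: pair[1], reverse=True)


# Made-up values for illustration; NOT the results reported in this article.
example = {
    date(2014, 11, 26): 0.3,
    date(2014, 11, 27): 2.8,
    date(2014, 12, 25): 3.5,
    date(2015, 1, 5): -0.4,
}
spikes = peak_days(example)
```

Lowering the threshold returns more candidate days, so the cut-off would need tuning against the actual series.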

Thanksgiving (2014/11/27)

figure_1.png

Christmas and New Year holidays

figure_1.png

Valentine's day

figure_1.png

Conclusion

Everyone is more positive when there is a fun event! I also wanted to analyze the days where the "neg" value is large, but in the end I could not work out the cause.
