[PYTHON] Stock Price Forecast 1 Chapter 1

Aidemy 2020/10/30

Introduction

Hello, this is Yope! I am a liberal arts student, but I became interested in the possibilities of AI, so I enrolled at the AI-specialized school "Aidemy" to study. I want to share the knowledge I gained there, so I am summarizing it on Qiita. I am very happy that so many people read my previous summary article. Thank you! This time I will post my notes on stock price forecasting, part 1. I hope you find them useful.

What to learn this time
・ Check the flow of stock price forecasting
・ ① Get tweets
・ ② Sentiment analysis (negative / positive analysis)

Stock price forecast flow

There are two broad approaches, technical analysis and fundamental analysis, and this time we perform technical analysis. The flow is as follows (this chapter covers ① and ②):
① Use the Twitter API to get the past tweets of a certain account.
② Perform sentiment analysis (negative / positive analysis) of each day's tweets with a polarity dictionary.
③ Get time series data of the Nikkei Stock Average.
④ Create a model that predicts from the daily sentiment whether the stock price will rise or fall on the next day.

① Get tweets

・ First, register for the Twitter API (the registration procedure is omitted here).
・ After registering, obtain the "Consumer Key", "Consumer Secret", "Access Token", and "Access Token Secret", and use them to get tweets.

Get tweets by keyword

・ Code: ![Screenshot 2020-10-17 17.17.36.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/698700/126526e1-9450-c2db-5e75-dfbda9b2f3cd.png)

・ In the code above, 'python' in the res line is the keyword to search for.
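The original code is only available as a screenshot, so here is a minimal sketch of a keyword search. It assumes the Twitter API v1.1 endpoint search/tweets.json, which was current when this article was written (access rules have since changed), and the requests_oauthlib package for OAuth 1.0a. The names make_session and search_tweets are hypothetical.

```python
# Twitter API v1.1 keyword-search endpoint (assumption: current in 2020; access rules have since changed)
SEARCH_URL = "https://api.twitter.com/1.1/search/tweets.json"


def make_session(consumer_key, consumer_secret, access_token, access_token_secret):
    """Create an OAuth 1.0a session from the four keys obtained at registration."""
    # Imported here so search_tweets below also works with any session-like object
    from requests_oauthlib import OAuth1Session
    return OAuth1Session(consumer_key, consumer_secret, access_token, access_token_secret)


def search_tweets(session, keyword, count=100):
    """Return the texts of recent tweets that match keyword (hypothetical helper)."""
    res = session.get(SEARCH_URL, params={"q": keyword, "count": count})
    res.raise_for_status()  # raise on HTTP errors such as invalid keys
    return [status["text"] for status in res.json()["statuses"]]
```

Usage would look like `texts = search_tweets(make_session(CK, CS, AT, ATS), "python")`, where CK, CS, AT and ATS stand for your own keys.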

Get tweets by account

・ Code: ![Screenshot 2020-10-17 17.24.56.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/698700/eb60f385-4acc-774c-b11e-7d4bb2cbeb06.png)

・ In the code above, '@nikkei_bizdaily' in the tweets line is the account whose tweets are acquired (@nikkei_bizdaily is the account of the "Nikkei Sangyo Shimbun").
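A minimal sketch of fetching one account's tweets, assuming the Twitter API v1.1 endpoint statuses/user_timeline.json and an authenticated session such as an OAuth1Session from the requests_oauthlib package. The name get_user_tweets is hypothetical.

```python
# Twitter API v1.1 user-timeline endpoint (assumption: current in 2020)
TIMELINE_URL = "https://api.twitter.com/1.1/statuses/user_timeline.json"


def get_user_tweets(session, screen_name, count=200):
    """Return (created_at, text) pairs for an account's recent tweets (hypothetical helper)."""
    params = {
        "screen_name": screen_name.lstrip("@"),  # the parameter takes the name without '@'
        "count": count,                          # v1.1 returns at most 200 tweets per request
    }
    res = session.get(TIMELINE_URL, params=params)
    res.raise_for_status()
    # The endpoint returns a JSON list of tweet objects
    return [(tweet["created_at"], tweet["text"]) for tweet in res.json()]
```

Keeping created_at alongside the text makes it possible to group the tweets by day later.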

② Sentiment analysis of tweets (negative / positive analysis)

・ Sentiment analysis (negative / positive analysis) uses natural language processing to judge whether a text has a positive or a negative meaning.
・ The criteria for this judgment are contained in a "polarity dictionary": the text is analyzed by looking up each of its words in the dictionary.
・ This time we use the "Word Emotion Polarity Correspondence Table" as the polarity dictionary. It assigns each word a value from -1 to 1 (the PN value), determined with reference to the "Iwanami Japanese Dictionary" (Iwanami Shoten). A larger value indicates a more positive meaning, and a smaller value a more negative one.

Flow of tweet sentiment analysis

I. Get tweets and convert them to a DataFrame (see step ① of the stock price forecast).
II. Import the polarity dictionary and convert it to a DataFrame.
III. Perform morphological analysis with MeCab.
IV. Get the PN value from the polarity dictionary for each morphologically analyzed word and add it to that word's data.
V. Calculate the average PN value for each tweet.
VI. Standardize the values and display the change in the PN value on a graph.
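For step I, a minimal sketch of turning (created_at, text) pairs into a DataFrame indexed by date. It assumes the created_at string format of the Twitter API v1.1, such as "Sat Oct 17 08:00:00 +0000 2020"; the name tweets_to_dataframe is hypothetical.

```python
import pandas as pd


def tweets_to_dataframe(tweets):
    """Build a date-indexed DataFrame from (created_at, text) pairs (hypothetical helper)."""
    df = pd.DataFrame(tweets, columns=["created_at", "text"])
    # Parse the v1.1 timestamp format into timezone-aware datetimes (UTC offset via %z)
    df["created_at"] = pd.to_datetime(df["created_at"], format="%a %b %d %H:%M:%S %z %Y")
    # Use the timestamp as the index and put the tweets in chronological order
    return df.set_index("created_at").sort_index()
```

A DatetimeIndex is what later allows the per-tweet values to be averaged by day.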

II. Import polarity dictionary / DataFrame conversion

・ Import the polarity dictionary with pd.read_csv(). For the "Word Emotion Polarity Correspondence Table", specify names=('Word','Reading','POS','PN') so that the word, reading, part of speech, and PN value columns can be read.
・ Store the word column as a list in word_list and the PN value column in pn_list, then build a dictionary from the two (pn_dict).

・ Code: [Screenshot 2020-10-19 21.13.28.png]
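Since the screenshot is not available, here is a minimal sketch of step II. It assumes the table is a colon-separated file (word:reading:POS:PN), as the commonly used pn_ja.dic is, encoded in Shift_JIS; adjust sep and encoding to your copy. The name load_pn_dict is hypothetical.

```python
import pandas as pd


def load_pn_dict(path_or_buffer, encoding="shift_jis"):
    """Read the polarity dictionary and return word_list, pn_list and pn_dict (hypothetical helper)."""
    # Assumption: one entry per line in the form word:reading:POS:PN
    pn_df = pd.read_csv(path_or_buffer, sep=":", encoding=encoding,
                        names=("Word", "Reading", "POS", "PN"))
    word_list = list(pn_df["Word"])   # the words
    pn_list = list(pn_df["PN"])       # their PN values (-1 to 1)
    pn_dict = dict(zip(word_list, pn_list))  # word -> PN value
    return word_list, pn_list, pn_dict
```

If the same word appears more than once (for example with different parts of speech), dict() keeps the last PN value for it.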

III. Morphological analysis with MeCab

・ A tweet cannot be looked up in the polarity dictionary as it is, so it is first divided into words by morphological analysis. For this we define and use the function "get_diclist".
・ Morphological analysis is performed with MeCab. Its output has one word per line, so split it into a list of lines.
・ The last two lines (the "EOS" marker and an empty string) are unnecessary, so delete them.
・ Each remaining line is separated by a tab and commas, so split it completely with re.split() and add the result to the list word by word (dic_list).

・ Code: [Screenshot 2020-10-19 22.27.42.png]
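A minimal sketch of get_diclist, assuming MeCab's IPADIC-style output, where each line is the surface form, a tab, then comma-separated features with the base form at position 6 (other dictionaries such as UniDic order the features differently). The parsing is split into parse_mecab_output, a hypothetical helper, so it can be tried without MeCab installed.

```python
def parse_mecab_output(parsed):
    """Turn MeCab output into a list of dicts, one per word (hypothetical helper)."""
    # The output ends with "EOS\n", so the last two items after splitting are "EOS" and ""
    lines = parsed.split("\n")[:-2]
    diclist = []
    for line in lines:
        surface, feature = line.split("\t", 1)  # surface form, then the feature string
        features = feature.split(",")
        diclist.append({
            "Surface": surface,
            "POS1": features[0],      # part of speech
            "POS2": features[1],      # part-of-speech subcategory
            "BaseForm": features[6],  # base (dictionary) form, IPADIC layout assumed
        })
    return diclist


def get_diclist(text):
    """Morphologically analyze text with MeCab and return one dict per word."""
    import MeCab  # requires the MeCab Python binding and a dictionary
    return parse_mecab_output(MeCab.Tagger().parse(text))
```

Splitting on the tab first, instead of a single re.split() over tabs and commas, keeps a surface form that itself contains a comma intact.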

IV. Get the PN value from the polarity dictionary for each morphologically analyzed word and add it to that word's data

・ Once morphological analysis has made the words available for lookup, add the PN value from the polarity dictionary to each word's dict data. For this we define and use the function "add_pnvalue".
・ Look up the base form of each word ('BaseForm'), obtained with the get_diclist function from the previous section, in the dictionary. If it is in the dictionary, take its PN value; if not, set it to 'notfound'. Append each word to an initially empty list called diclist_new.

・ Code: ![Screenshot 2020-10-20 12.16.09.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/698700/2930203a-1917-18c9-bee1-d40ef901380e.png)
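A minimal sketch of add_pnvalue as described above. As an assumption to keep the function self-contained, pn_dict is passed in as an argument, and each word's dict is copied rather than modified in place.

```python
def add_pnvalue(diclist_old, pn_dict):
    """Attach a PN value (or 'notfound') to each word dict."""
    diclist_new = []
    for word in diclist_old:
        base = word["BaseForm"]
        # Use the dictionary's PN value if the base form is listed, otherwise mark it
        pn = float(pn_dict[base]) if base in pn_dict else "notfound"
        diclist_new.append({**word, "PN": pn})  # copy of the word's data plus its PN value
    return diclist_new
```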

V. Calculate the average PN value for each tweet

・ From the list diclist_new returned above, find the average of the PN values.
・ Words marked 'notfound' in the previous section are excluded from the average. If no PN values remain in pn_list, the average is taken to be 0, because otherwise an error would occur.

・ Code: ![Screenshot 2020-10-20 11.13.32.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/698700/fa7c1dac-8bb0-7e65-d938-8e76b71e6e9d.png)
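A minimal sketch of the averaging step; the function name get_mean is hypothetical. It skips 'notfound' words and falls back to 0 when nothing is left, which avoids dividing by zero.

```python
def get_mean(diclist):
    """Average PN value of one tweet, ignoring words not in the dictionary."""
    pn_list = [word["PN"] for word in diclist if word["PN"] != "notfound"]
    if not pn_list:
        return 0.0  # no known words: treat the tweet as neutral
    return sum(pn_list) / len(pn_list)
```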

VI. Perform standardization and display changes in the PN value on a graph

・ Finally, graph the PN values to visualize them. If they are left as they are, however, the graph shifts depending on whether the polarity dictionary as a whole contains more positive or more negative words, so standardize the values to correct for this.
・ Create the graph with plt.plot(). The vertical axis is the average of the PN values and the horizontal axis is the date. The 'text' column holds information other than the PN value and is not needed for the graph, so delete it.

・ (Review) Standardization method (from Unsupervised Learning 3): (data - mean) ÷ standard deviation, i.e. X = (X - X.mean(axis=0)) / X.std(axis=0)

・ Code (standardizing means_list, the average PN values of the tweets in df_tweets): [Screenshot 2020-10-20 12.32.24.png]
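Since the screenshot is not available, here is a minimal sketch of step VI. It assumes df_tweets has a DatetimeIndex and a 'text' column, and that means_list holds one mean PN value per tweet; averaging per calendar day with resample("D") is an assumption based on the "daily" analysis in the flow above. The names daily_standardized_pn and plot_pn are hypothetical.

```python
import pandas as pd


def daily_standardized_pn(df_tweets, means_list):
    """Average PN values per day and standardize them (hypothetical helper)."""
    # Keep only the PN values; this is equivalent to deleting the 'text' column
    pn = pd.Series(means_list, index=df_tweets.index, name="pn")
    # One value per calendar day; days without tweets are dropped
    daily = pn.resample("D").mean().dropna()
    # (value - mean) / standard deviation; ddof=0 matches numpy's X.std()
    return (daily - daily.mean()) / daily.std(ddof=0)


def plot_pn(z):
    """Plot the standardized daily PN values against the date."""
    import matplotlib.pyplot as plt  # imported here so the function above works without matplotlib
    plt.plot(z.index, z.values)
    plt.xlabel("date")
    plt.ylabel("average of pn values (standardized)")
    plt.show()
```

For example, daily averages of 0.3 and -0.2 have mean 0.05 and standard deviation 0.25, so they standardize to 1 and -1.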

・ Graph: ![Screenshot 2020-10-20 11.57.53.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/698700/80c50519-c43f-7dbc-4860-5899f7e19aa3.png)

Summary

・ Stock price forecasting follows the flow "get tweets", "sentiment analysis (negative / positive analysis)", "get time series data of stock prices", and "create a stock price forecast model".
・ You can get tweets after registering for the Twitter API.
・ For how to perform the sentiment analysis itself, refer to "Negative / Positive Analysis".

That's all for this time. Thank you for reading to the end.
