Python: Stock Price Forecast Part 1

Get Tweets

Overview

There are two main approaches to analyzing stocks for investment: technical analysis and fundamental analysis. This time we will deal with technical analysis.

We will predict the Nikkei Stock Average using Twitter. First, here is the general flow.

1. Use the Twitter API to get the past tweets of an account.
2. Analyze the sentiment of each day's tweets using a polarity dictionary.
3. Get time series data for the Nikkei Stock Average.
4. Use machine learning to predict whether the stock price rises or falls the next day from the daily sentiment.

Access token

You will need an access token to get tweets from Twitter. It plays a role similar to the ID and password of a user account.

It consists of two character strings: the "Access Token Key" and the "Access Token Secret".
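
Rather than pasting these strings into the source file, one option is to read them from environment variables. The following is a minimal sketch of that idea. The variable names `TWITTER_CK`, `TWITTER_CS`, `TWITTER_AT`, and `TWITTER_AS` and the helper `load_credentials` are assumptions made for illustration; they are not names required by Twitter or by any library.

```python
import os

# Hypothetical environment variable names; choose whatever suits your setup.
REQUIRED_KEYS = ("TWITTER_CK", "TWITTER_CS", "TWITTER_AT", "TWITTER_AS")


def load_credentials(env=None):
    """Return the four credential strings as a tuple, in REQUIRED_KEYS order.

    Raises KeyError naming every variable that is missing or empty.
    """
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise KeyError("missing credentials: " + ", ".join(missing))
    return tuple(env[key] for key in REQUIRED_KEYS)
```

With the variables set, the result can be unpacked as `CK, CS, AT, AS = load_credentials()`.
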
The code below gets tweets that contain a certain word (here, "python"). Besides the access token, it also needs the Consumer Key and Consumer Secret.
import json

from requests_oauthlib import OAuth1Session

CK = ''  # Enter your Consumer Key
CS = ''  # Enter your Consumer Secret
AT = ''  # Enter your Access Token
AS = ''  # Enter your Access Token Secret

# Create a session that signs every request with the four credentials
session = OAuth1Session(CK, CS, AT, AS)

# Search for tweets containing "python" (up to 100 per request)
url = 'https://api.twitter.com/1.1/search/tweets.json'
res = session.get(url, params={'q': 'python', 'count': 100})
res_text = json.loads(res.text)
for tweet in res_text['statuses']:
    print('-----')
    print(tweet['created_at'])
    print(tweet['text'])
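
The loop above assumes that the response contains a `statuses` key. When a request fails, for example because of invalid credentials, the body is expected to carry an `errors` list instead; treat that shape as an assumption and check it against the response you actually receive. The sketch below separates the parsing into a helper, `extract_tweets`, which is a hypothetical name, and runs it on hand-written sample bodies rather than real API output.

```python
import json


def extract_tweets(body):
    """Parse a search response body and return (created_at, text) pairs."""
    payload = json.loads(body)
    if 'errors' in payload:
        messages = [str(err.get('message', err)) for err in payload['errors']]
        raise RuntimeError('Twitter API error: ' + '; '.join(messages))
    return [(t['created_at'], t['text']) for t in payload.get('statuses', [])]


# Hand-written sample bodies for illustration (not real API output).
ok_body = json.dumps({'statuses': [
    {'created_at': 'Mon Jan 06 09:00:00 +0000 2020', 'text': 'hello python'},
]})
error_body = json.dumps({'errors': [{'code': 32, 'message': 'Could not authenticate you.'}]})

print(extract_tweets(ok_body))
```

In the script above, `extract_tweets(res.text)` would take the place of the `json.loads` call and the `statuses` lookup.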

The following gets tweets that contain "Artificial intelligence".

import json

from requests_oauthlib import OAuth1Session

CK = ''  # Enter your Consumer Key
CS = ''  # Enter your Consumer Secret
AT = ''  # Enter your Access Token
AS = ''  # Enter your Access Token Secret

# Create a session that signs every request with the four credentials
session = OAuth1Session(CK, CS, AT, AS)

# Search for tweets containing "Artificial intelligence" (up to 100 per request)
url = 'https://api.twitter.com/1.1/search/tweets.json'
res = session.get(url, params={'q': 'Artificial intelligence', 'count': 100})
res_text = json.loads(res.text)
for tweet in res_text['statuses']:
    print('-----')
    print(tweet['created_at'])
    print(tweet['text'])

Get account tweets

Next, let's get the tweets of the Nikkei Sangyo Shimbun account (@nikkei_bizdaily).

import tweepy
import csv

consumer_key = ""     # Enter the Consumer Key obtained with your own account
consumer_secret = ""  # Enter the Consumer Secret obtained with your own account
access_key = ""       # Enter the Access Token obtained with your own account
access_secret = ""    # Enter the Access Token Secret obtained with your own account

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_key, access_secret)
api = tweepy.API(auth)

# Get tweets: each row is [id, created_at, text, favorite count, retweet count]
tweet_data = []

tweets = tweepy.Cursor(api.user_timeline, screen_name="@nikkei_bizdaily", exclude_replies=True)
for tweet in tweets.items():
    tweet_data.append([tweet.id, tweet.created_at, tweet.text.replace('\n', ''), tweet.favorite_count, tweet.retweet_count])
tweet_data

Save as csv data

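# Save the tweets as tweets.csv in the data folder.
# Note: the header labels below do not follow the column order of tweet_data
# (id, created_at, text, fav, RT); the reading step later discards this header
# row and assigns its own column names.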
with open('./6050_stock_price_prediction_data/tweets.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f, lineterminator='\n')
    writer.writerow(["id", "text", "created_at", "fav", "RT"])
    writer.writerows(tweet_data)
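
To see how the rows and the header line up in the file, the same `csv.writer` calls can be run against an in-memory buffer. This is a small sketch: the two rows are made-up placeholders, and the header here is written in the same order as the row fields, which is a choice made for this example.

```python
import csv
import io

# Made-up rows in the order [id, created_at, text, fav, RT].
rows = [
    [1, '2020-01-06 09:00:00', 'first tweet', 3, 1],
    [2, '2020-01-07 10:30:00', 'second, with a comma', 0, 0],
]

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator='\n')
writer.writerow(['id', 'created_at', 'text', 'fav', 'RT'])
writer.writerows(rows)

# Read the buffer back; csv.reader returns every field as a string.
buffer.seek(0)
reader = csv.reader(buffer)
header = next(reader)
records = [dict(zip(header, fields)) for fields in reader]
print(records[1]['text'])
```

Text that contains a comma is quoted by the writer, so it comes back as a single field when the file is read.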

Sentiment analysis

Sentiment analysis part 1 (morphological analysis)

Sentiment analysis is a natural language processing technique that judges whether a text has a positive or a negative meaning.

It is widely used in marketing and customer support, for example by analyzing the sentiment of product reviews.

The main mechanism is to judge whether each word that appears in a sentence has a positive, negative, or neutral meaning.

The criterion for this judgment is a polarity dictionary, in which morphemes are defined in advance as positive or negative.

Sentiment analysis is performed by looking up each word of the document in the polarity dictionary. Let's first perform morphological analysis using MeCab.

import MeCab
import re

# Create a MeCab instance. If no argument is specified, the IPA dictionary is used.
m = MeCab.Tagger('')

# A function that morphologically analyzes text and returns a list of dictionaries
def get_diclist(text):
    parsed = m.parse(text)      # Analysis result (a single string that includes line breaks)
    lines = parsed.split('\n')  # Split the result into lines (one word per line)
    lines = lines[0:-2]         # The last two lines (the EOS marker and an empty string) are unnecessary
    diclist = []
    for word in lines:
        fields = re.split('\t|,', word)  # Each line is separated by a tab and commas
        d = {'Surface': fields[0], 'POS1': fields[1], 'POS2': fields[2], 'BaseForm': fields[7]}
        diclist.append(d)
    return diclist

When "It will be sunny tomorrow." is passed as the argument, the result is as follows.

get_diclist("It will be sunny tomorrow.")
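
The parsing step inside get_diclist can also be tried without installing MeCab by feeding it text in the same tab-and-comma layout. The sketch below does that. The sample string is hand-written to follow the layout the function expects from the IPA dictionary (surface form, a tab, then comma-separated features with the base form in the seventh feature); it is an illustration, not captured MeCab output, and `parse_mecab_output` is a hypothetical helper name.

```python
import re

# Hand-written text in the tab/comma layout that get_diclist expects;
# it is an illustration, not captured MeCab output.
SAMPLE_PARSED = (
    "明日\t名詞,副詞可能,*,*,*,*,明日,アシタ,アシタ\n"
    "晴れる\t動詞,自立,*,*,一段,基本形,晴れる,ハレル,ハレル\n"
    "EOS\n"
)


def parse_mecab_output(parsed):
    """Turn MeCab-style output text into a list of dictionaries."""
    diclist = []
    for line in parsed.split('\n'):
        if line in ('', 'EOS'):  # skip the end marker and the trailing blank
            continue
        fields = re.split('\t|,', line)
        diclist.append({'Surface': fields[0], 'POS1': fields[1],
                        'POS2': fields[2], 'BaseForm': fields[7]})
    return diclist


tokens = parse_mecab_output(SAMPLE_PARSED)
print(tokens[1]['BaseForm'])
```

Because every line is split on commas, a surface form that itself contains a comma would shift the fields; the sketch shares that limitation with the original function.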


Sentiment analysis part 2 (polarity dictionary)

This time, we will use the Word Emotion Polarity Correspondence Table as the polarity dictionary.

It assigns each word a real number from -1 to +1, with reference to the "Iwanami Japanese Dictionary (Iwanami Shoten)".

The closer the value is to -1, the more negative the word; the closer it is to +1, the more positive.

Next, read the polarity dictionary and create lists and a dictionary from it.

import pandas as pd

# Read the polarity dictionary: each row holds a word, its reading, its part of speech, and its PN value
pn_df = pd.read_csv('./6050_stock_price_prediction_data/pn_ja.csv', encoding='utf-8', names=('Word', 'Reading', 'POS', 'PN'))

# Store the Word and PN columns in word_list and pn_list as lists
word_list = list(pn_df['Word'])
pn_list = list(pn_df['PN'])

# Create pn_dict, a dictionary that maps each word in word_list to its value in pn_list
pn_dict = dict(zip(word_list, pn_list))
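
The same steps can be checked on a few rows held in memory before running them on the real file. In this sketch the rows and their PN values are made-up placeholders in the Word, Reading, POS, PN layout; they are not taken from the actual dictionary.

```python
import io

import pandas as pd

# Made-up rows in the Word, Reading, POS, PN layout; the values are placeholders.
csv_text = (
    "good,good,adjective,0.9\n"
    "bad,bad,adjective,-0.8\n"
    "good,good,noun,0.5\n"
)

pn_df = pd.read_csv(io.StringIO(csv_text), names=('Word', 'Reading', 'POS', 'PN'))
pn_dict = dict(zip(pn_df['Word'], pn_df['PN']))

# A word that appears in more than one row keeps the PN value of its last row.
print(pn_dict['good'])
# A word that is not in the dictionary can be detected with `in` or `.get`.
print(pn_dict.get('rain'))
```

One consequence of building the dictionary with `dict(zip(...))` is that a word listed more than once ends up with a single PN value, the one from its last row.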

Sentiment analysis part 3 (PN value)

Implement the part that looks up each word in the polarity dictionary and returns its PN value.

We also pass get_diclist("It will be sunny tomorrow.") to the add_pnvalue function to see how it works, and then pass the result to the get_mean function to find the mean of the PN values.

import numpy as np


# Add the PN value from the polarity dictionary to each word dictionary
def add_pnvalue(diclist_old, pn_dict):
    diclist_new = []
    for word in diclist_old:
        base = word['BaseForm']        # Get the base (uninflected) form of the word
        if base in pn_dict:
            pn = float(pn_dict[base])
        else:
            pn = 'notfound'            # The word is not in the polarity dictionary
        word['PN'] = pn
        diclist_new.append(word)
    return diclist_new


# Find the mean PN value of one tweet, ignoring words that were not found
def get_mean(dictlist):
    pn_list = []
    for word in dictlist:
        pn = word['PN']
        if pn != 'notfound':
            pn_list.append(pn)
    if len(pn_list) > 0:
        pnmean = np.mean(pn_list)
    else:
        pnmean = 0
    return pnmean


dl_old = get_diclist("It will be sunny tomorrow.")
# Pass the result of get_diclist to add_pnvalue to see how it works.
dl_new = add_pnvalue(dl_old, pn_dict)
print(dl_new)

# Then pass it to get_mean to find the mean of the PN values.
pnmean = get_mean(dl_new)
print(pnmean)
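
The lookup and the averaging can also be tried without MeCab or the dictionary file by using hand-made input. The following is a compact sketch of the same idea, not the functions above: it marks unknown words with `None` instead of the string 'notfound', and the names `score_tokens` and `mean_pn` as well as all values are made up for illustration.

```python
from statistics import mean

# Toy polarity dictionary and token list; all values are made up for illustration.
toy_pn_dict = {'sunny': 0.5, 'rain': -0.25}
toy_tokens = [{'BaseForm': 'tomorrow'}, {'BaseForm': 'sunny'}, {'BaseForm': 'rain'}]


def score_tokens(tokens, pn_dict):
    """Return a copy of each token with its PN value, or None when the word is unknown."""
    return [dict(tok, PN=pn_dict.get(tok['BaseForm'])) for tok in tokens]


def mean_pn(tokens):
    """Average the PN values that were found; return 0 when none were found."""
    found = [tok['PN'] for tok in tokens if tok['PN'] is not None]
    return mean(found) if found else 0


scored = score_tokens(toy_tokens, toy_pn_dict)
print(mean_pn(scored))
```

Words missing from the dictionary are left out of the average rather than counted as 0, so a tweet with one strongly positive word and many unknown words still scores as strongly positive.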


Sentiment analysis part 4 (graphing)

Display the change in the PN value over time as a graph.

import matplotlib.pyplot as plt
%matplotlib inline

# Read tweets.csv: header=0 skips the header row in the file, and names= assigns the column names
df_tweets = pd.read_csv('./6050_stock_price_prediction_data/tweets.csv', header=0, names=['id', 'date', 'text', 'fav', 'RT'], index_col='date')
df_tweets.index = pd.to_datetime(df_tweets.index)
df_tweets = df_tweets[['text']].sort_index(ascending=True)

# Create an empty list called means_list and store the mean PN value of each tweet in it.
means_list = []
for tweet in df_tweets['text']:
    dl_old = get_diclist(tweet)
    dl_new = add_pnvalue(dl_old, pn_dict)
    pnmean = get_mean(dl_new)
    means_list.append(pnmean)
df_tweets['pn'] = means_list
# Average the PN values by day (resample no longer accepts the how= argument)
df_tweets = df_tweets[['pn']].resample('D').mean()

# Plot the date on the x-axis and the PN value on the y-axis.
x = df_tweets.index
y = df_tweets.pn
plt.plot(x, y)
plt.grid(True)

# Save df_tweets as df_tweets.csv.
df_tweets.to_csv('./6050_stock_price_prediction_data/df_tweets.csv')
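
The daily averaging step is easier to follow on a tiny example. The sketch below uses three made-up per-tweet PN values; the timestamps and numbers are placeholders chosen for illustration.

```python
import pandas as pd

# Made-up per-tweet PN values indexed by timestamp, for illustration only.
pn = pd.DataFrame(
    {'pn': [0.25, 0.5, -0.125]},
    index=pd.to_datetime(['2020-01-06 09:00', '2020-01-06 18:00', '2020-01-08 12:00']),
)

# One row per calendar day; a day without tweets becomes NaN.
daily = pn.resample('D').mean()
print(daily)
```

The two tweets on the first day are averaged into one value, and the day in between, which has no tweets, appears as NaN, so the plotted line has a gap there.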


Sentiment analysis part 5 (standardization)

Looking at the graph, negative values seem to dominate overall.

This is because the polarity dictionary contains many words with negative connotations. We standardize the values to adjust for this.

Standardize the PN values, then average them by date and plot the result.

# Standardize the pn column: subtract the mean and divide by the standard deviation
df_tweets['pn'] = (df_tweets['pn'] - df_tweets['pn'].mean()) / df_tweets['pn'].std()

# Average the PN values by date and plot them (resample no longer accepts the how= argument)
df_tweets = df_tweets[['pn']].resample('D').mean()
x = df_tweets.index
y = df_tweets.pn
plt.plot(x, y)
plt.grid(True)
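
As a quick check of what standardization does, the sketch below applies the same formula to a short series of made-up, mostly negative values. The numbers are placeholders chosen for illustration.

```python
import pandas as pd

# Made-up daily PN values that are all negative, for illustration only.
pn = pd.Series([-0.5, -0.25, -0.75, -0.5])

# Subtract the mean and divide by the sample standard deviation (pandas uses ddof=1).
pn_std = (pn - pn.mean()) / pn.std()

print(pn_std.mean())
print(pn_std.std())
```

After standardization the values are centered on 0 with a standard deviation of 1, so a day reads as positive or negative relative to the account's usual tone rather than relative to the dictionary's negative bias.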

