[PYTHON] Stock price forecast using machine learning (regression)

This continues from the previous posts (1, 2, [3](http://qiita.com/yai/items/1b7f8ef69f8f2343e3e9)). Classification alone isn't much fun, so this time I'd like to try a regression problem. I used the machine learning libraries TensorFlow and scikit-learn to predict the stock price directly. Previously, the next day's stock price was classified as "up or down"; this time we directly predict the price itself, in yen. The input data is reused from the earlier posts. Apologies, as always.

Overview

-- Try to predict the stock price using TensorFlow and scikit-learn.
-- Check the accuracy and whether the result is usable.

What we will do

"Use several days' worth of global stock indexes (Dow, Nikkei 225, DAX, etc.) to predict the next day's Nikkei 225" (regression)

Environment

scikit-learn
-- scikit-learn 0.17.1
-- Python 2.7
-- Windows 7

TensorFlow
-- TensorFlow 0.7
-- Ubuntu 14.04
-- Python 2.7
-- AWS EC2 micro instance

Implementation

Download the Nikkei, Dow, Hang Seng and German stock indexes from Quandl, then combine them into a single text file. (This was done by hand.)
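
The manual merge could also be scripted. The following is only a sketch under assumptions that the article does not state: each downloaded file is taken to be a CSV with a `Date` column followed by `Open`, `High`, `Low` and `Close`, only dates present in every file are kept, and rows are ordered newest first. The sample values are made up, and the code is written for Python 3.

```python
import csv
import io

PRICE_COLUMNS = ("Open", "High", "Low", "Close")  # assumed column names


def read_index_csv(text):
    """Parse CSV text into {date: [open, high, low, close]}."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["Date"]: [float(row[c]) for c in PRICE_COLUMNS] for row in reader}


def merge_by_date(tables):
    """Join several {date: values} tables on the dates they all share.

    Returns a list of (date, combined_values), newest date first.
    ISO dates (YYYY-MM-DD) sort correctly as plain strings.
    """
    common_dates = set(tables[0])
    for table in tables[1:]:
        common_dates &= set(table)
    merged = []
    for date in sorted(common_dates, reverse=True):
        combined = []
        for table in tables:
            combined.extend(table[date])
        merged.append((date, combined))
    return merged


# Made-up sample data: the second index has no row for 2016-03-23.
NIKKEI_CSV = (
    "Date,Open,High,Low,Close\n"
    "2016-03-24,3.0,4.0,2.0,3.5\n"
    "2016-03-23,2.0,3.0,1.0,2.5\n"
    "2016-03-22,1.0,2.0,0.5,1.5\n"
)
DOW_CSV = (
    "Date,Open,High,Low,Close\n"
    "2016-03-24,30.0,40.0,20.0,35.0\n"
    "2016-03-22,10.0,20.0,5.0,15.0\n"
)

merged = merge_by_date([read_index_csv(NIKKEI_CSV), read_index_csv(DOW_CSV)])
```

Dropping dates that are missing from any market is just one possible way to handle differing holidays; the article does not say how this was handled in the manual merge.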

Label

Use the next day's closing price as the ground truth (that is, predict the next day's closing price). However, when I fed the raw stock price in as the label, training diverged, so I changed the approach. The prediction target is made indirect: predict by what percentage the next day's closing price rises or falls relative to the previous day, then convert that back into a closing price. The inputs described below use the same rate of change, so it should be consistent... (・・;)

# JUDGE_DAY = 1; column index [3] holds the Nikkei 225 closing price.
y_array.append([(array_base[i][3] - array_base[i+JUDGE_DAY][3]) / array_base[i][3] * 100])
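
As a small worked example of this label, the sketch below computes it for a toy list of closing prices and converts it back into a price. It assumes the rows are ordered newest first (which is what the indexing above implies) and uses a flat list of made-up closes instead of the full `array_base` table. Note that the formula divides by the target day's close, so the exact inverse is `previous / (1 - label / 100)`; multiplying the previous close by `(1 + label / 100)` is a close approximation when daily changes are small.

```python
JUDGE_DAY = 1


def make_label(closes, i, judge_day=JUDGE_DAY):
    """Percent change of closes[i] relative to the older row closes[i + judge_day].

    Mirrors the article's formula: the denominator is the target day's close.
    """
    return (closes[i] - closes[i + judge_day]) / closes[i] * 100.0


def label_to_close(label_pct, previous_close):
    """Exact inverse of make_label: solve (c - prev) / c * 100 = label for c."""
    return previous_close / (1.0 - label_pct / 100.0)


# Made-up closing prices, newest first (assumed row order).
closes = [17000.0, 16900.0, 16800.0]

label = make_label(closes, 0)                  # a bit under 0.6 percent
recovered = label_to_close(label, closes[1])   # exact round trip
approx = closes[1] * (1.0 + label / 100.0)     # small-change approximation
```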

Input data

Instead of feeding in the stock prices as they are, the input is a list of how much (in %) each value rose or fell compared with the previous day. (Feeding in the raw prices did not work at all.)

tmp_array = []
for j in xrange(i+1, i + data_num + 1):
    for k in range(16):
        tmp_array.append((array_base[j][k] - array_base[j+1][k]) / array_base[j][k] * 100)
x_array.append(tmp_array)
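
To make the shape of one input row concrete, here is a self-contained sketch of the same windowing on a tiny made-up table. The function names and the two-column table are illustrative only; the article uses 16 columns, and `data_num` days of history per sample.

```python
def pct_change_row(rows, j, num_columns):
    """Percent change of every column of rows[j] relative to the older row rows[j + 1]."""
    return [(rows[j][k] - rows[j + 1][k]) / rows[j][k] * 100.0
            for k in range(num_columns)]


def make_features(rows, i, data_num, num_columns):
    """Concatenate the percent changes for the data_num days before target row i."""
    features = []
    for j in range(i + 1, i + data_num + 1):
        features.extend(pct_change_row(rows, j, num_columns))
    return features


# Made-up table with 2 columns, newest row first (assumed row order).
rows = [
    [110.0, 210.0],  # target day (row i): never used as an input
    [100.0, 200.0],
    [80.0, 250.0],
    [100.0, 200.0],
]

x = make_features(rows, i=0, data_num=2, num_columns=2)
# x has data_num * num_columns entries, roughly [20, -25, -25, 20]
```

Because the loop starts at `i + 1`, the target day's own values never leak into its input, and the oldest row is needed only as the reference for the last percent change.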

TensorFlow-specific notes

Graph

The TensorFlow network has two hidden layers, with 50 and 25 units respectively.

NUM_HIDDEN1 = 50
NUM_HIDDEN2 = 25
def inference(x_ph, keep_prob):

  with tf.name_scope('hidden1'):
    weights = tf.Variable(tf.truncated_normal([data_num * price_num, NUM_HIDDEN1], stddev=stddev), name='weights')
    biases = tf.Variable(tf.zeros([NUM_HIDDEN1]), name='biases')
    hidden1 = tf.nn.relu(tf.matmul(x_ph, weights) + biases)
  
  with tf.name_scope('hidden2'):
    weights = tf.Variable(tf.truncated_normal([NUM_HIDDEN1, NUM_HIDDEN2], stddev=stddev), name='weights')
    biases = tf.Variable(tf.zeros([NUM_HIDDEN2]), name='biases')
    hidden2 = tf.nn.relu(tf.matmul(hidden1, weights) + biases)
  
  #DropOut
  dropout = tf.nn.dropout(hidden2, keep_prob)
  
  with tf.name_scope('regression'):
    weights = tf.Variable(tf.truncated_normal([NUM_HIDDEN2, 1], stddev=stddev), name='weights')
    biases = tf.Variable(tf.zeros([1]), name='biases')
    y = tf.matmul(dropout, weights) + biases
  
  return y
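
To get a feel for the size of this network, the weights and biases of each layer can be counted by hand. The sketch below assumes `price_num = 16` (matching `range(16)` in the input code) and a hypothetical `data_num = 3`; the actual `data_num` is not stated in the article.

```python
NUM_HIDDEN1 = 50
NUM_HIDDEN2 = 25


def count_parameters(layer_sizes):
    """Total number of weights and biases in a fully connected network."""
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out  # weight matrix
        total += n_out         # bias vector
    return total


price_num = 16  # number of columns per day, as in range(16)
data_num = 3    # hypothetical: days of history per sample

layer_sizes = [data_num * price_num, NUM_HIDDEN1, NUM_HIDDEN2, 1]
num_parameters = count_parameters(layer_sizes)
print(num_parameters)
```

With these assumed values the layers are 48 -> 50 -> 25 -> 1. Comparing the parameter count with the number of training rows is one rough way to judge how easily such a network could overfit.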

Loss

The loss is computed with l2_loss(). I figured this would be fine, since the loss should be based on the difference between the numbers, but I am not sure it is correct. If you think it is wrong, comments are welcome.

def loss(y, target):

  return tf.reduce_mean(tf.nn.l2_loss((y - target)))
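
On the question of whether this loss is reasonable: `tf.nn.l2_loss(t)` returns a single scalar, `sum(t ** 2) / 2`, so wrapping it in `tf.reduce_mean` does not change the value. For a fixed batch size the result is proportional to the mean squared error, so it is minimized by the same predictions, but its magnitude grows with the batch size. The plain-Python sketch below illustrates that relationship with made-up numbers; it does not use TensorFlow itself.

```python
def l2_loss(values):
    """Same formula as tf.nn.l2_loss: sum(v ** 2) / 2, a single scalar."""
    return sum(v * v for v in values) / 2.0


def mean_squared_error(predictions, targets):
    """Average of the squared differences."""
    diffs = [p - t for p, t in zip(predictions, targets)]
    return sum(d * d for d in diffs) / len(diffs)


# Made-up predictions and targets (percent changes).
predictions = [0.5, -1.0, 2.0, 0.0]
targets = [0.0, -1.0, 1.0, 1.0]

diffs = [p - t for p, t in zip(predictions, targets)]
article_style_loss = l2_loss(diffs)
mse = mean_squared_error(predictions, targets)

# article_style_loss == mse * batch_size / 2
batch_size = len(targets)
```

If a loss that does not depend on the batch size is preferred, averaging the squared differences (mean squared error) is a common alternative.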

Optimization

Nothing special to mention here.

def optimize(loss):
  optimizer = tf.train.AdamOptimizer(learning_rate)
  train_step = optimizer.minimize(loss)
  return train_step

Training

Nothing special to mention here either.

def training(sess, train_step, loss, x_train_array, y_flg_train_array):
  
  summary_op = tf.merge_all_summaries()
  init = tf.initialize_all_variables()
  sess.run(init)
  
  summary_writer = tf.train.SummaryWriter(LOG_DIR, graph_def=sess.graph_def)
  
  for i in range(int(len(x_train_array) / bach_size)):
    batch_xs = getBachArray(x_train_array, i * bach_size, bach_size)
    batch_ys = getBachArray(y_flg_train_array, i * bach_size, bach_size)
    sess.run(train_step, feed_dict={x_ph: batch_xs, y_ph: batch_ys, keep_prob: 0.8})
    ce = sess.run(loss, feed_dict={x_ph: batch_xs, y_ph: batch_ys, keep_prob: 1.0})

    summary_str = sess.run(summary_op, feed_dict={x_ph: batch_xs, y_ph: batch_ys, keep_prob: 1.0})
    summary_writer.add_summary(summary_str, i)
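
The helper `getBachArray` is not shown in the article. A minimal sketch of what such a helper might do is given below; `get_batch` and `iterate_batches` are hypothetical names, and the slicing behavior is an assumption based on how the helper is called above.

```python
def get_batch(array, offset, length):
    """Return `length` consecutive rows starting at `offset`.

    Hypothetical stand-in for the article's getBachArray.
    """
    return array[offset:offset + length]


def iterate_batches(xs, ys, batch_size):
    """Yield (batch_xs, batch_ys) pairs for one pass over the data.

    As in the training loop above, a trailing partial batch is dropped.
    """
    for i in range(len(xs) // batch_size):
        start = i * batch_size
        yield get_batch(xs, start, batch_size), get_batch(ys, start, batch_size)


# Made-up data: 10 samples, batch size 3 -> 3 full batches, 1 sample left over.
xs = [[float(n)] for n in range(10)]
ys = [[float(n) * 2.0] for n in range(10)]
batches = list(iterate_batches(xs, ys, 3))
```

Read this way, the training loop makes a single pass over the training data, with one optimizer step per batch.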

Evaluation

For accuracy, I take the difference between the predicted and the actual rate of change of the stock price and output the mean of the absolute values. In other words, it simply reports the mean error.

accuracy = tf.reduce_mean(tf.abs(y - y_ph))
print "accuracy"
print(sess.run(accuracy, feed_dict={x_ph: test_batch_xs, y_ph: test_batch_ys, keep_prob: 1.0}))

scikit-learn-specific notes

Regression algorithm

There are many algorithms, but I'm not sure which one is best, so I picked three and used them with their default arguments.

# SGDRegressor
clf = linear_model.SGDRegressor()
testClf(clf, x_train_array, y_train_array, x_test_array, y_test_array)

# DecisionTreeRegressor
clf = tree.DecisionTreeRegressor()
testClf(clf, x_train_array, y_train_array, x_test_array, y_test_array)

# SVM
clf = svm.SVR()
testClf(clf, x_train_array, y_train_array, x_test_array, y_test_array)

Training, evaluation

Training just calls fit(). When I called score(), some of the scores came out negative and I was not sure how to interpret them (please let me know if you do), so, as with TensorFlow, I take the predicted rate of change and average the absolute difference from the actual rate of change (in short, the mean error).

def testClf(clf, x_train_array, y_flg_train_array, x_test_array, y_flg_test_array):

    print clf
    clf.fit(x_train_array, y_flg_train_array)
    result = clf.predict(x_test_array)
    print clf.score(x_test_array, y_flg_test_array)
    print np.mean(np.abs(np.array(result) - np.array(y_flg_test_array)))
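
On the negative scores: for scikit-learn regressors, `score()` returns the coefficient of determination R², which is 1.0 for perfect predictions, 0.0 for a model that always predicts the mean of the test targets, and negative for a model that does worse than that. The plain-Python sketch below reproduces both R² and the mean absolute error on made-up numbers; the function names are illustrative and are not scikit-learn's.

```python
def r_squared(predictions, targets):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_target = sum(targets) / len(targets)
    ss_res = sum((t - p) ** 2 for p, t in zip(predictions, targets))
    ss_tot = sum((t - mean_target) ** 2 for t in targets)
    return 1.0 - ss_res / ss_tot


def mean_absolute_error(predictions, targets):
    """Average absolute difference, the 'mean error' used in this article."""
    return sum(abs(p - t) for p, t in zip(predictions, targets)) / len(targets)


# Made-up percent changes.
targets = [1.0, -1.0, 0.5, -0.5]
always_mean = [0.0, 0.0, 0.0, 0.0]    # predicts the mean of the targets
wrong_sign = [-1.0, 1.0, -0.5, 0.5]   # gets every direction wrong

r2_mean = r_squared(always_mean, targets)     # 0.0
r2_wrong = r_squared(wrong_sign, targets)     # negative
mae_wrong = mean_absolute_error(wrong_sign, targets)
```

A negative score therefore means the model's predictions fit the test data worse than a constant prediction would.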

Results

TensorFlow

1.00044

scikit-learn

SGDRegressor: 0.943171296872
DecisionTreeRegressor: 1.3551351662
SVM: 0.945361479916

So the error was about 1% in every case. A 1% error in a stock price forecast... is completely unusable (cough, cough).

Actual prediction

Since we've come this far, let's actually produce a predicted value. The data at hand goes up to 2016/03/24, so we predict the closing price of the Nikkei Stock Average on 2016/03/25. For scikit-learn, the SVM model is used.

TensorFlow


p = sess.run(y, feed_dict={x_ph: data, keep_prob: 1.0})
price = ((p[0][0] / 100.) + 1.) * 16892.33
print price

scikit-learn


p = clf.predict(data)
price = ((p[0] / 100.) + 1.) * 16892.33
print price

Results

TensorFlow


16804.3398821

scikit-learn


16822.6013292

So, the actual closing price on 3/25 was... 17,002.75. **...Well, that's how it goes.**
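
To put numbers on the miss, the following sketch works only from the figures quoted in this article: the 2016/03/24 close used in the conversion (16892.33), the two predicted prices, and the actual 2016/03/25 close. The helper name `percent_diff` is illustrative.

```python
previous_close = 16892.33  # Nikkei 225 close on 2016/03/24 (from the article)
actual_close = 17002.75    # actual close on 2016/03/25 (from the article)
predicted = {
    "TensorFlow": 16804.3398821,
    "scikit-learn": 16822.6013292,
}


def percent_diff(value, reference):
    """Signed difference of value from reference, in percent of reference."""
    return (value - reference) / reference * 100.0


actual_move = percent_diff(actual_close, previous_close)

report = {}
for name, price in sorted(predicted.items()):
    report[name] = {
        # The day-over-day move each model predicted.
        "predicted_move": percent_diff(price, previous_close),
        # How far the predicted price was from the real one.
        "error_vs_actual": percent_diff(price, actual_close),
        # Whether the predicted direction (up or down) matched reality.
        "direction_correct": (price > previous_close) == (actual_close > previous_close),
    }
    print(name, report[name])
```

Both models predicted a fall of roughly half a percent, while the index actually rose by about 0.65%. The predicted prices were about 1.1% to 1.2% below the actual close, which is in line with the average error measured above, and neither model got the direction right.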

Discussion

-- Machine learning does not seem to be something where you just feed in data and the machine works out the best answer by itself. Humans also need to think about how to frame the problem so that it is easier for the machine to reason about and answer.

Impressions

-- Regression problems are fun.
-- Errors are easy to spot, because the model does not give lazy answers such as "everything is the first category" the way classification did.
-- The code differs a little from the classification version, but development is easy because about 80% of the code can be reused.
