[PYTHON] Stock price forecast using machine learning (scikit-learn)

I have been studying scikit-learn, so I used it to predict stock prices. I chose the same task as in my previous posts for two reasons: I want to compare the results with TensorFlow, and obtaining and processing new input data is troublesome. Please forgive me. By the way, other people have already done the same thing. I have only studied scikit-learn (and the theory around it) for about a week, so there are probably many mistakes. Corrections and suggestions are welcome.

What is scikit-learn?

It is apparently pronounced "sy-kit learn". It is a machine learning library that comes with a wide range of algorithms and is relatively easy to use. You may be able to do the same things with TensorFlow, but scikit-learn code is easier to write.

Advantages

- Various algorithms are available.
- It works on Windows. (This is important.)

Disadvantages

- Deep learning is not possible.

Goals

- Try out scikit-learn.
- See how its usability, accuracy, and speed compare with TensorFlow.

What to do

"Use several days' worth of global stock indexes (Dow, Nikkei 225, DAX, etc.) to predict whether the Nikkei 225 will rise or fall the next day (2 choices)" (same as last time)

Environment

- scikit-learn 0.17.1
- Python 2.7
- Windows 7

Implementation

Preparation

The data from last time is used as it is: the Nikkei, Dow, Hang Seng, and German stock indexes downloaded from Quandl, combined into a single text file.

Labels

In scikit-learn, labels seem to be given as int values rather than in flag format (such as [0,0,1]), so I used 0 for a rise and 1 for a fall.

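# Rows appear to run from newest to oldest, so row i+1 is the day before row i.
if array_base[i][3] > (array_base[i+1][3]):
    y_flg_array.append(0)   # rose compared with the previous day
    up += 1
else:
    y_flg_array.append(1)   # fell (or was unchanged)
    down += 1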

Across the whole sample, the split is: up 50.5%, down 49.5%.
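The labelling rule can be tried on a toy series. This is a minimal illustration, not the article's actual script: `closes` is made-up data ordered newest first (the ordering implied by the comparison with `i+1`), and `make_labels` is a hypothetical helper name.

```python
def make_labels(closes):
    """Return 0 (rise) or 1 (fall) per day, given closes ordered newest first.

    closes[i + 1] is the day before closes[i], so the oldest entry gets no label.
    """
    labels = []
    for i in range(len(closes) - 1):
        # Same rule as the article: strictly higher than the previous day => 0.
        labels.append(0 if closes[i] > closes[i + 1] else 1)
    return labels


# Made-up closing values, newest first (an assumption for illustration only).
closes = [105.0, 103.0, 104.0, 104.0, 100.0]
labels = make_labels(closes)

# Share of "up" and "down" days, like the 50.5% / 49.5% split in the article.
up_ratio = labels.count(0) / len(labels)
down_ratio = labels.count(1) / len(labels)
print(labels, up_ratio, down_ratio)
```

Note that with this rule an unchanged day is counted as a fall, because only a strict increase maps to 0.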

Input data

Based on the improvement points from last time, instead of feeding in the stock prices as they are, I give a list of how much (%) each value rose or fell compared with the previous day.

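# Build the input for sample i from the data_num days before it (rows i+1 .. i+data_num).
tmp_array = []
for j in xrange(i+1, i + data_num + 1):
    # 16 columns per day; each becomes a percentage change against the previous day (row j+1).
    for k in range(16):
        tmp_array.append((array_base[j][k] - array_base[j+1][k]) / array_base[j][k] * 100)
x_array.append(tmp_array)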
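A self-contained sketch of the same windowing idea follows. The names `pct_changes` and `make_features`, the two-column toy table, and `data_num = 2` are assumptions for illustration; the article's data has 16 columns. Rows are assumed to be newest first, and the change is divided by the newer day's value, as in the article's code (a conventional daily return would divide by the older day's value instead).

```python
def pct_changes(rows, j):
    """Percentage change of every column between day j and the previous day j + 1."""
    return [(rows[j][k] - rows[j + 1][k]) / rows[j][k] * 100
            for k in range(len(rows[j]))]


def make_features(rows, data_num):
    """Build one flat feature vector per predictable day i.

    The vector for day i uses only the data_num days before it
    (j = i + 1 .. i + data_num), so day i never appears in its own input.
    """
    features = []
    # Day j also needs row j + 1, so the oldest usable i is len(rows) - data_num - 2.
    for i in range(len(rows) - data_num - 1):
        vector = []
        for j in range(i + 1, i + data_num + 1):
            vector.extend(pct_changes(rows, j))
        features.append(vector)
    return features


# Made-up table with two columns per day, newest first (illustration only).
rows = [
    [110.0, 50.0],
    [100.0, 40.0],
    [80.0, 50.0],
    [100.0, 25.0],
    [50.0, 20.0],
]
features = make_features(rows, data_num=2)
for vector in features:
    print(vector)
```

Each vector has `data_num` times the number of columns entries, which would be `data_num * 16` for the article's data.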

Classification algorithm

scikit-learn offers many algorithms, but honestly I'm not sure which one is best, so I decided to try three: stochastic gradient descent, a decision tree, and a support vector machine. By the way, I have no idea how these three differ. (^ _ ^;)

from sklearn import linear_model, tree, svm

# SGDClassifier
clf = linear_model.SGDClassifier()
testClf(clf, x_train_array, y_flg_train_array, x_test_array, y_flg_test_array)

# Decision Tree
clf = tree.DecisionTreeClassifier()
testClf(clf, x_train_array, y_flg_train_array, x_test_array, y_flg_test_array)

# SVM
clf = svm.SVC()
testClf(clf, x_train_array, y_flg_train_array, x_test_array, y_flg_test_array)

Training, evaluation

Training and evaluation are wrapped in a function. Training is just a call to fit() and evaluation is a call to score(), so it is very easy.

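def testClf(clf, x_train_array, y_flg_train_array, x_test_array, y_flg_test_array):
    # Show the classifier and its parameters (Python 2 print statement).
    print clf
    # Train on the training data, then print the accuracy on the test data.
    clf.fit(x_train_array, y_flg_train_array)
    print clf.score(x_test_array, y_flg_test_array)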
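For readers who want to run the comparison without the article's data file, here is a sketch on synthetic data. It assumes Python 3 and a recent scikit-learn rather than the 0.17.1 / Python 2.7 setup used in this article. `make_classification` stands in for the stock data, and `fit_and_score` is a hypothetical rewrite of `testClf` that returns the score instead of only printing it.

```python
from sklearn import linear_model, svm, tree
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split


def fit_and_score(clf, x_train, y_train, x_test, y_test):
    """Fit clf on the training split and return its accuracy on the test split."""
    clf.fit(x_train, y_train)
    return clf.score(x_test, y_test)


# Synthetic stand-in for the real features and up/down labels (an assumption).
x, y = make_classification(n_samples=500, n_features=16, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.25, random_state=0)

classifiers = {
    "SGDClassifier": linear_model.SGDClassifier(random_state=0),
    "DecisionTreeClassifier": tree.DecisionTreeClassifier(random_state=0),
    "SVC": svm.SVC(),
}

# Same fit-then-score routine for every classifier, as in the article.
scores = {name: fit_and_score(clf, x_train, y_train, x_test, y_test)
          for name, clf in classifiers.items()}
for name, score in scores.items():
    print(name, score)
```

The numbers printed here say nothing about stock prediction; the point is only that every classifier goes through the same `fit` and `score` calls.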

Results, Part 1

SGDClassifier : 0.56591099916
DecisionTreeClassifier : 0.544080604534
SVM : 0.612090680101

With TensorFlow the accuracy was about 63%, so these results are somewhat lower but in a similar range. Only the SVM was slow to run.

Parameter adjustment

Above, each classifier was instantiated without any arguments, but accuracy can apparently be improved by tuning the parameters. scikit-learn also has a feature for brute-forcing parameters (grid search), which is convenient. I tried it with the SVM, which gave the best results.

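from sklearn import svm, grid_search  # grid_search is the module name in scikit-learn 0.17

clf = svm.SVC()
# Try each kernel and keep the one with the best cross-validated score.
grid = grid_search.GridSearchCV(estimator=clf, param_grid={'kernel': ['rbf','linear','poly','sigmoid']})
grid.fit(x_train_array, y_flg_train_array)
testClf(grid.best_estimator_, x_train_array, y_flg_train_array, x_test_array, y_flg_test_array)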

Above, four SVM kernels ('rbf', 'linear', 'poly', and 'sigmoid') are tried, and the model with the best parameter is trained and tested again. (Is the extra training unnecessary at this point?) As an aside, of course, I don't really understand what a kernel means. (^ _ ^;)
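Whether the extra training is needed can be checked directly. The sketch below assumes a recent scikit-learn, where `GridSearchCV` lives in `sklearn.model_selection` (the `sklearn.grid_search` module was removed in later releases), and uses synthetic data in place of the stock data. With the default `refit=True`, the search refits the best parameter setting on the whole training set, so `best_estimator_` can be scored without calling `fit` again.

```python
from sklearn import svm
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the article's features and labels (an assumption).
x, y = make_classification(n_samples=300, n_features=16, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.25, random_state=0)

param_grid = {"kernel": ["rbf", "linear", "poly", "sigmoid"]}
grid = GridSearchCV(estimator=svm.SVC(), param_grid=param_grid, cv=3)
grid.fit(x_train, y_train)

best_kernel = grid.best_params_["kernel"]
# best_estimator_ is already fitted (refit=True is the default),
# so it is scored here without another call to fit().
test_accuracy = grid.best_estimator_.score(x_test, y_test)
print(best_kernel, grid.best_score_, test_accuracy)
```

Calling `fit` again on `best_estimator_`, as the article's code does through `testClf`, repeats work that the search has already done.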

Results, Part 2

0.638958858102

The best result came with the linear kernel, with a slight gain in accuracy. About 64% ... that exceeds the deep learning result ... (though perhaps within the margin of error).
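Whether a one-point gap is within the margin of error can be estimated with a binomial standard error. The sketch below is a rough check only: the test-set size is not stated in this article, so `n_test` is a placeholder assumption, and `accuracy_interval` is a hypothetical helper based on the normal approximation.

```python
import math


def accuracy_interval(accuracy, n_test, z=1.96):
    """Approximate 95% confidence interval for an accuracy measured on n_test samples."""
    std_err = math.sqrt(accuracy * (1.0 - accuracy) / n_test)
    return accuracy - z * std_err, accuracy + z * std_err


n_test = 1000  # placeholder: the article does not state the test-set size
svm_accuracy = 0.638958858102  # grid-search SVM result from the article
tensorflow_accuracy = 0.63     # "about 63%" as quoted in the article

svm_low, svm_high = accuracy_interval(svm_accuracy, n_test)
print(svm_low, svm_high)
# True means the TensorFlow figure lies inside the SVM interval.
print(svm_low <= tensorflow_accuracy <= svm_high)
```

Under that assumed test-set size the interval extends roughly three points to either side of the measured accuracy, so a one-point difference could not be told apart from noise. A different `n_test` changes the width.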

Discussion

- As expected, feeding in the rate of change works better than the raw stock price. (I tried the raw price, but it did not work.)
- Deep learning is very popular, but other methods can hold their own too.

Impressions

- It is fun that things run relatively easily even if you don't understand the algorithms at all.
- Grid search (the feature that brute-forces parameters) takes some time. If you want to try many parameters, you need a machine with adequate specs. (Is this the "curse of dimensionality"?)
- Unrelated, but I used Eclipse for this development (until now I had used a text editor). It is super easy.
- There is too little information on scikit-learn in Japanese. Could someone translate the official tutorials into Japanese ...

References

- Official Tutorial
- Official API Reference
- Predict the future with machine learning: Predict the future stock price with the decision tree of scikit-learn
