[PYTHON] I tried to predict whether Gurunavi's closing stock price would rise or fall using TensorFlow (work in progress)

Introduction

- Following on from last time, I'm back. I haven't learned my lesson, but I've reworked it quite a bit.
- I have no foundational knowledge of deep learning. Please note that I built this purely out of a desire to get something running.
- The development period was about one week.
- I don't really know Python, and this is my first time with TensorFlow. That's the level here.
- I have no knowledge of stocks.
- The accuracy is not high at all.
- Before even worrying about accuracy, I didn't understand how to structure the internals, remade it over and over, and at one point burned out and went off just to study.
- Gurunavi.
- Since this is still in progress, my understanding may be incomplete (or wrong).
- Once it's finished properly, I may tidy up these pages.

Based on

- TensorFlow predicts whether the closing price of the Nikkei average will rise or fall from the previous day (1)

Description

Predict whether Gurunavi's closing stock price will rise or fall relative to the previous day.
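
Concretely, the script below converts closing prices into daily log returns and labels each day by the sign of the return. Here is a minimal sketch of just that labeling step (my own illustration with made-up prices, not part of the original script):

import numpy as np
import pandas as pd

# Three made-up closing prices
close = pd.Series([3000.0, 3060.0, 3015.0])

# Log return vs. the previous day; the first row is NaN and gets dropped
log_return = np.log(close / close.shift()).dropna()

# One-hot flags: a rise (>= 0) gives [1, 0], a fall gives [0, 1]
labels = pd.DataFrame({
    "Positive": (log_return >= 0).astype(int),
    "Negative": (log_return < 0).astype(int),
})
print(labels)  # day 1 rose -> 1/0, day 2 fell -> 0/1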

Environment

The environment from last time is used as-is.

In addition, matplotlib is installed to display graphs (it has nothing to do with the prediction itself).

pip install matplotlib

Data collection

The stock price data was obtained from the following source: Stock Investment Memo / Stock Price Database.

I then organized the downloads into three files, one per stock, merging the per-year files into each. Finally, I adjusted the header row.

Since the file itself can't be posted, here is an example of the layout.

Example) Gurunavi.csv
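
(The original screenshot is lost; the rows below are made-up values in the format the script expects: a Date index column plus a closing-price column named by the close_column flag.)

Date,Open,High,Low,Close,Volume
2017-01-04,3000,3050,2980,3020,50000
2017-01-05,3020,3080,3010,3060,47000
2017-01-06,3060,3065,3000,3015,52000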

Result

Source code

stock_price_prediction.py


#!/usr/local/bin/python
# -*- coding: utf-8 -*-

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow as tf

flags = tf.app.flags
FLAGS = flags.FLAGS

# Directory containing the CSV data
flags.DEFINE_string('csv_dir', '{Directory path}', 'Directory to put the csv data.')
# Directory for training output (TensorBoard logs)
flags.DEFINE_string('train_dir', '{Directory path}', 'Directory to put the training data.')
# Name of the closing-price column
flags.DEFINE_string('close_column', 'Close', 'Close column name.')

# Build the learning model (a single-layer neural network: softmax regression)
def inference(num_predictors, num_classes, stock_placeholder):

    # Initialize weights from a truncated normal distribution (stddev 0.0001)
    def weight_variable(shape):
        initial = tf.truncated_normal(shape, stddev=0.0001)

        return tf.Variable(initial)

    # Initialize biases to ones
    def bias_variable(shape):
        initial = tf.ones(shape)

        return tf.Variable(initial)

    with tf.name_scope('fc1') as scope:
        weights = weight_variable([num_predictors, num_classes])
        biases = bias_variable([num_classes])

    # Normalize with the softmax function:
    # convert the raw network output into a probability for each label
    with tf.name_scope('softmax') as scope:
        model = tf.nn.softmax(tf.matmul(stock_placeholder, weights) + biases)

    # Return the (approximate) probability of each label
    return model

# Compute the "error" between the prediction and the correct answer
def loss(logits, labels):

    # Cross-entropy
    cross_entropy = -tf.reduce_sum(labels * tf.log(logits))

    # Log it for TensorBoard
    tf.summary.scalar("cross_entropy", cross_entropy)

    # Return the error value (cross_entropy)
    return cross_entropy

# Train the model with backpropagation, based on the error (loss)
def training(labels_placeholder, model):

    # This recomputes the same cross-entropy as loss(); the optimizer minimizes it
    cost = -tf.reduce_sum(labels_placeholder * tf.log(model))
    training_step = tf.train.AdamOptimizer(learning_rate=0.0001).minimize(cost)

    return training_step

# Compute the accuracy of the predictions produced by inference()
def accuracy(model, labels_placeholder):

    # Element-wise comparison of the predicted class (model) and the actual class (labels)
    # Example result: [1, 1, 0, 1, 0], where 1 means correct
    correct_prediction = tf.equal(
        tf.argmax(model, 1),
        tf.argmax(labels_placeholder, 1)
    )

    # Cast the booleans to float and take the mean (reduce_mean) over all examples
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
    # Log it for TensorBoard
    tf.summary.scalar("accuracy", accuracy)

    return accuracy

if __name__ == '__main__':
    stocks = [
        'Gurunavi',
        'Recruit',
        'Kakakucom',
    ]

    closing_data = pd.DataFrame()

    for stock in stocks:
        # Load the CSV into a pandas DataFrame
        data = pd.read_csv(FLAGS.csv_dir + stock + '.csv', index_col='Date').sort_index()

        # Normalize the closing price as the log of the ratio to the previous day:
        # positive if the price rose from the previous day, negative if it fell
        closing_data[stock] = np.log(data[FLAGS.close_column] / data[FLAGS.close_column].shift())

    # Drop rows with missing values
    closing_data = closing_data.dropna()

    # Convert the log returns into one-hot flags
    # Positive (rose or stayed flat)
    closing_data["Gurunavi_Positive"] = 0
    closing_data.loc[closing_data["Gurunavi"] >= 0, "Gurunavi_Positive"] = 1
    # Negative (fell)
    closing_data["Gurunavi_Negative"] = 0
    closing_data.loc[closing_data["Gurunavi"] < 0, "Gurunavi_Negative"] = 1

    training_data = pd.DataFrame(
        # Column names are the two labels plus "<stock>_<days back>" for each predictor
        columns=["Gurunavi_Positive", "Gurunavi_Negative"] + [s + "_1" for s in stocks[1:]]
    )

    for i in range(7, len(closing_data)):
        data = {}

        # Use today's Gurunavi flags as the positive/negative labels
        data["Gurunavi_Positive"] = closing_data["Gurunavi_Positive"].iloc[i]
        data["Gurunavi_Negative"] = closing_data["Gurunavi_Negative"].iloc[i]

        # Use the previous day's returns of the other stocks as predictors
        for col in stocks[1:]:
            data[col + "_1"] = closing_data[col].iloc[i - 1]

        training_data = training_data.append(data, ignore_index=True)

    # Predictor data: the previous-day returns (everything after the two label columns)
    predictors_tf = training_data[training_data.columns[2:]]
    # Label data: Gurunavi_Positive, Gurunavi_Negative
    classes_tf = training_data[training_data.columns[:2]]

    #Split training and test data.
    training_set_size = int(len(training_data) * 0.8)
    test_set_size = len(training_data) - training_set_size

    training_predictors_tf = predictors_tf[:training_set_size]
    training_classes_tf = classes_tf[:training_set_size]
    test_predictors_tf = predictors_tf[training_set_size:]
    test_classes_tf = classes_tf[training_set_size:]

    #Define variables for the number of predictors and the number of classes to remove the magic number from your code.
    num_predictors = len(training_predictors_tf.columns)
    num_classes = len(training_classes_tf.columns)

    # Build the graph (this is also what gets written out to TensorBoard)
    with tf.Graph().as_default():
        # Placeholder for the input features (previous-day returns)
        stock_placeholder = tf.placeholder("float", [None, num_predictors])
        # Placeholder for the correct labels (positive/negative flags)
        labels_placeholder = tf.placeholder("float", [None, num_classes])

        # Build the model with inference()
        model = inference(num_predictors, num_classes, stock_placeholder)
        # Compute the loss with loss() (used here for its TensorBoard summary;
        # training() recomputes the same cross-entropy internally)
        loss_value = loss(model, labels_placeholder)
        # Build the training op that adjusts the model parameters with training()
        training_step = training(labels_placeholder, model)
        # Accuracy calculation (rebinds the name accuracy to the resulting op)
        accuracy = accuracy(model, labels_placeholder)

        # Prepare to save checkpoints
        saver = tf.train.Saver()
        # Create a Session (TensorFlow computations must run inside a Session)
        sess = tf.Session()
        # Initialize the variables (done when the Session starts)
        sess.run(tf.global_variables_initializer())
        # Merge all the TensorBoard summaries declared above
        summary_op = tf.summary.merge_all()
        # Write the TensorBoard logs to the path given by train_dir
        summary_writer = tf.summary.FileWriter(FLAGS.train_dir, sess.graph)

        for step in range(1, 10000):
            sess.run(
                training_step,
                feed_dict={
                    stock_placeholder: training_predictors_tf.values,
                    labels_placeholder: training_classes_tf.values.reshape(len(training_classes_tf.values), 2)
                }
            )

            if step % 100 == 0:

                train_accuracy = sess.run(
                    accuracy,
                    feed_dict={
                        stock_placeholder: training_predictors_tf.values,
                        labels_placeholder: training_classes_tf.values.reshape(len(training_classes_tf.values), 2)
                    }
                )

                print "step %d, training accuracy %g"%(step, train_accuracy)

                #Add a value to be displayed on the TensorBoard after each step
                summary_str = sess.run(
                    summary_op,
                    feed_dict={
                        stock_placeholder: training_predictors_tf.values,
                        labels_placeholder: training_classes_tf.values.reshape(len(training_classes_tf.values), 2)
                    }
                )
                summary_writer.add_summary(summary_str, step)

        # After training, report the accuracy on the test data
        print("test accuracy %g" % sess.run(
            accuracy,
            feed_dict={
                stock_placeholder: test_predictors_tf.values,
                labels_placeholder: test_classes_tf.values.reshape(len(test_classes_tf.values), 2)
            }
        ))
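
Note that this script targets the TensorFlow 1.x API (tf.app.flags, placeholders, Session), matching the 2017 environment in the log below, and will not run unchanged on TensorFlow 2.x. As a rough sketch (my addition, not part of the original), the usual compatibility shim is to swap the import at the top; on recent pandas, DataFrame.append is also gone and would need pd.concat instead.

# TensorFlow 2.x compatibility shim: replaces "import tensorflow as tf" above
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()  # restores 1.x-style placeholders and Sessions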

Execution result

python stock_price_prediction.py                                                                        
2017-06-06 15:26:17.251792: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-06-06 15:26:17.251816: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-06-06 15:26:17.251824: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-06-06 15:26:17.251831: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-06-06 15:26:17.251838: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
step 100, training accuracy 0.504931
step 200, training accuracy 0.504931
step 300, training accuracy 0.504931
step 400, training accuracy 0.504931
step 500, training accuracy 0.504931
step 600, training accuracy 0.504931
step 700, training accuracy 0.502959
step 800, training accuracy 0.504931
step 900, training accuracy 0.510848
step 1000, training accuracy 0.510848
step 1100, training accuracy 0.510848
step 1200, training accuracy 0.512821
step 1300, training accuracy 0.510848
step 1400, training accuracy 0.508876
step 1500, training accuracy 0.510848
step 1600, training accuracy 0.510848
step 1700, training accuracy 0.510848
step 1800, training accuracy 0.512821
step 1900, training accuracy 0.510848
step 2000, training accuracy 0.508876
step 2100, training accuracy 0.502959
step 2200, training accuracy 0.499014
step 2300, training accuracy 0.500986
step 2400, training accuracy 0.502959
step 2500, training accuracy 0.504931
step 2600, training accuracy 0.506903
step 2700, training accuracy 0.506903
step 2800, training accuracy 0.514793
step 2900, training accuracy 0.512821
step 3000, training accuracy 0.508876
step 3100, training accuracy 0.504931
step 3200, training accuracy 0.508876
step 3300, training accuracy 0.506903
step 3400, training accuracy 0.510848
step 3500, training accuracy 0.510848
step 3600, training accuracy 0.512821
step 3700, training accuracy 0.512821
step 3800, training accuracy 0.508876
step 3900, training accuracy 0.510848
step 4000, training accuracy 0.512821
step 4100, training accuracy 0.510848
step 4200, training accuracy 0.510848
step 4300, training accuracy 0.512821
step 4400, training accuracy 0.512821
step 4500, training accuracy 0.512821
step 4600, training accuracy 0.512821
step 4700, training accuracy 0.514793
step 4800, training accuracy 0.512821
step 4900, training accuracy 0.512821
step 5000, training accuracy 0.514793
step 5100, training accuracy 0.514793
step 5200, training accuracy 0.514793
step 5300, training accuracy 0.512821
step 5400, training accuracy 0.514793
step 5500, training accuracy 0.514793
step 5600, training accuracy 0.518738
step 5700, training accuracy 0.516765
step 5800, training accuracy 0.518738
step 5900, training accuracy 0.518738
step 6000, training accuracy 0.516765
step 6100, training accuracy 0.514793
step 6200, training accuracy 0.518738
step 6300, training accuracy 0.52071
step 6400, training accuracy 0.518738
step 6500, training accuracy 0.52071
step 6600, training accuracy 0.522682
step 6700, training accuracy 0.522682
step 6800, training accuracy 0.522682
step 6900, training accuracy 0.52071
step 7000, training accuracy 0.52071
step 7100, training accuracy 0.518738
step 7200, training accuracy 0.514793
step 7300, training accuracy 0.516765
step 7400, training accuracy 0.516765
step 7500, training accuracy 0.514793
step 7600, training accuracy 0.512821
step 7700, training accuracy 0.512821
step 7800, training accuracy 0.514793
step 7900, training accuracy 0.514793
step 8000, training accuracy 0.518738
step 8100, training accuracy 0.516765
step 8200, training accuracy 0.516765
step 8300, training accuracy 0.514793
step 8400, training accuracy 0.516765
step 8500, training accuracy 0.518738
step 8600, training accuracy 0.516765
step 8700, training accuracy 0.516765
step 8800, training accuracy 0.516765
step 8900, training accuracy 0.516765
step 9000, training accuracy 0.516765
step 9100, training accuracy 0.516765
step 9200, training accuracy 0.516765
step 9300, training accuracy 0.516765
step 9400, training accuracy 0.516765
step 9500, training accuracy 0.514793
step 9600, training accuracy 0.512821
step 9700, training accuracy 0.512821
step 9800, training accuracy 0.512821
step 9900, training accuracy 0.512821
test accuracy 0.464567

TensorBoard

I wanted to include the TensorBoard graphs here, but they don't seem to be rendering in a usable state, so I'm leaving them out.
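
For reference, the summaries written to train_dir can be opened with the standard TensorBoard CLI (substitute the actual log directory):

tensorboard --logdir {Directory path}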

Supplement

- An accuracy of 0.464567 means the up/down call is right only about 46% of the time, worse than a coin flip, so it is completely unreliable (a quick baseline check is sketched below).
- Until recently the training accuracy was stuck at a single constant value; it has finally started to move.
- To improve accuracy: there may be too little data, the model may need tuning, there may be too few input features, or my understanding may simply be wrong.
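
Training accuracy hovering around 0.50-0.52 suggests the model barely beats the class prior. As a quick sanity check (my own sketch, reusing the closing_data DataFrame from the script above), printing the label balance shows what a trivial always-up guess would score:

# Fraction of days on which Gurunavi's close rose (or stayed flat);
# if this is ~0.5, an accuracy near 0.5 means the model learned almost nothing
print(closing_data["Gurunavi_Positive"].mean())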

When you want to draw a graph

# Draw a line chart from a pd.DataFrame (e.g. the closing_data of log returns built above)
data.plot(figsize=(10, 5), linewidth=0.5)
# Show the chart
plt.show()
