I resumed learning Python on December 3rd last year. While wondering what I could build for practice, I came up with the idea of making an AI for stock price forecasting, so I decided to try it. The result is an AI that can (possibly) predict, with 64% accuracy, whether a stock price will rise or fall by more than 1%. Project 1 describes the lead-up, and Project 2 is the main subject.

Project 1

The goal was to use linear regression to judge visually whether the current stock market is in a bubble. I also wanted to embed the results in a website so that anyone could generate images for their favorite ETFs or company stocks, over any period, to help judge whether the current price is a bubble.
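A minimal sketch of the idea, not the code from the LRgenerator repository: fit a straight line to the prices by ordinary least squares and measure how far the latest close sits above or below that line. Fitting the logarithm of the price (so that steady percentage growth becomes a straight line) is an assumption made here, and the names fit_line and deviation_from_trend are hypothetical.

```python
import math


def fit_line(ys):
    """Ordinary least squares fit of ys against x = 0, 1, ..., n-1.

    Returns (slope, intercept). Needs at least two points.
    """
    n = len(ys)
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def deviation_from_trend(closes):
    """Relative gap between the last close and the fitted trend at that point.

    Assumption: the trend is fitted to log prices, not raw prices.
    A positive value means the last close is above the trend line.
    """
    log_prices = [math.log(p) for p in closes]
    slope, intercept = fit_line(log_prices)
    trend_last = math.exp(intercept + slope * (len(closes) - 1))
    return closes[-1] / trend_last - 1.0
```

A large positive deviation is only a hint that the price is running ahead of its long-term trend. Where to draw the line for "bubble" is a judgment call that this sketch does not make.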

In the end, it seemed to work with Django in my local environment, but I could not actually deploy it. I think I did not know enough about databases and Django. For now, here is the code: https://github.com/LulutasoAI/LRQiita/blob/master/LRgenerator

Project 2

Predict future stock prices by image analysis.

Attempt 1

A large amount of stock price data is converted into candlestick chart images, one per two-month window. For each chart, the price and date at the end of the chart are recorded, and the chart is put into one of two classes depending on whether the price is higher two months later. The idea is to predict whether stock prices will go up or down in the same way an image classifier distinguishes dogs from cats. The dataset is about 500 charts of the Nikkei Stock Average, the S&P 500, Apple Inc., and Toyota Motor Corporation from 2001 to the present. Feel free to use it for training: https://github.com/LulutasoAI/Datasetpricechart1

The data is labeled like this:
Category 0: the price goes up after the image
Category 1: the price drops after the image
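The labeling rule can be sketched as follows. This is an illustration under stated assumptions, not the code that produced the dataset: closes is a plain list of closing prices, windows do not overlap, window and horizon are counted in trading days, and an unchanged price is put in Category 1. The function name label_windows is hypothetical.

```python
def label_windows(closes, window, horizon):
    """Return (start_index, label) pairs, one per non-overlapping window.

    label 0: the close `horizon` steps after the window's last close is higher
    label 1: it is lower or unchanged (assumption: ties count as "drops")
    """
    samples = []
    # Stop early enough that the future price still exists in the list.
    for start in range(0, len(closes) - window - horizon + 1, window):
        last_index = start + window - 1
        last_close = closes[last_index]
        future_close = closes[last_index + horizon]
        samples.append((start, 0 if future_close > last_close else 1))
    return samples
```

Each start_index identifies the slice closes[start_index:start_index + window] that would be drawn as one chart image.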

Result

The validation accuracy did not change during training. Accuracy was 58% for the time being, and 55.8% when I tried a test set. Hmm, that is a marginal result.

Attempt 2

I tried to predict by classifying the candlestick charts into four categories. The dataset has four groups: charts whose price rose by 4.8% or more in the two months after the image, charts whose price fell by 4.8% or more, charts whose price rose slightly (4.7% or less), and charts whose price fell slightly. I split the data this way and trained on it. The result is an AI that can guess which group an image belongs to with 36% accuracy. I thought about what this 36% means, but I did not understand it well. With four categories, it looks much better than the 25% expected from uniform random guessing, but the four categories in this dataset are not balanced: "down" 13.36%, "drastically-down" 23.22%, "drastically-up" 38.23%, and "up" 25.17%. So it is not that simple. As an extreme case, a model that predicts "drastically-up" for every image would reach an accuracy of around 38%, because the training and test sets are drawn at random from this dataset. In the end, I was not sure whether 36% was good or bad.
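One way to put the 36% in context is to compute two simple baselines from the class shares quoted above. The sketch below uses only those four percentages. The function names are hypothetical, and "proportional guessing" (picking each class at random with probability equal to its share) is a comparison added here for illustration.

```python
# Class shares quoted in the text (they sum to 99.98% because of rounding).
class_share = {
    "down": 0.1336,
    "drastically-down": 0.2322,
    "drastically-up": 0.3823,
    "up": 0.2517,
}


def majority_baseline(shares):
    """Accuracy of always predicting the most common class."""
    return max(shares.values())


def proportional_guess_baseline(shares):
    """Expected accuracy when each class is guessed with probability equal to its share."""
    return sum(p * p for p in shares.values())


print(f"majority baseline:     {majority_baseline(class_share):.2%}")
print(f"proportional guessing: {proportional_guess_baseline(class_share):.2%}")
```

With these shares, the majority baseline is 38.23% and proportional guessing comes to about 28.1%. An accuracy of 36% therefore sits above random guessing but below what a model that always answers "drastically-up" would score.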

Attempt 3

I wanted to make it simpler, so I trained using only the two groups in the above dataset whose prices moved by 4.8% or more. The result was overfitting. I think there is still room to explore in this attempt. Adjusting the Keras model and its layers seems likely to bring some improvement.

Attempt 4

As before, I built the dataset by simply splitting the charts into two classes depending on whether the price rose or fell in the two months after the image, but this time I trained on a much larger amount of data. I collected stock price data from 2001-01-01 to the present for the following tickers:

^N225 Nikkei 225
AAPL Apple Inc.
^GSPC S&P 500
TM Toyota Motor Corporation
GOOGL Alphabet Inc Class A
BA Boeing Company
AMZN Amazon.com Inc.
PYPL PayPal
TWTR Twitter, Inc.
V Visa Inc.

The result was pretty good.

Attempt 5

With the above labeling, a chart that could just as easily have been classified as "down" may end up classified as "up" because of a tiny difference (for example, 2 cents). I thought that charts whose price barely changed might confuse the model during training, so I decided to exclude charts with a price change of less than 1% from the dataset. I then trained on that dataset.
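The exclusion rule can be sketched as a labeling function that returns no label when the move is too small. This is an illustration, not the code used for the dataset: the function name label_with_dead_zone is hypothetical, and the change is assumed to be measured relative to the last close shown in the chart.

```python
def label_with_dead_zone(last_close, future_close, threshold=0.01):
    """Return 0 (up), 1 (down), or None when the move is smaller than threshold.

    Assumption: the move is (future_close - last_close) / last_close.
    Samples that get None would simply be left out of the dataset.
    """
    change = (future_close - last_close) / last_close
    if abs(change) < threshold:
        return None
    return 0 if change > 0 else 1


# Hypothetical (last_close, future_close) pairs for illustration only.
pairs = [(100.0, 100.02), (100.0, 103.0), (100.0, 95.0)]
labels = [label_with_dead_zone(a, b) for a, b in pairs]
kept = [label for label in labels if label is not None]
```

Here the 2-cent move is dropped, while the 3% rise and the 5% fall are kept as Category 0 and Category 1.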

At this point I realized that the problem might be with the Keras model or its layers rather than with the dataset.

Attempt 6

I tried to make use of everything I had learned. The dataset is large: 3753 candlestick charts from 2001 onward, excluding images whose price moved by less than 1%. I also took care to avoid overfitting.

And here is the result with one particular model, after 150 epochs.

After 250 epochs: not bad.

But wait a minute. On reflection, the charts from 2007 to 2009, around the Lehman shock, may be noise. So I came up with the idea of training only on charts from 2010 onward. That leaves only 1875 charts, but it was worth a try. I spent the whole day adjusting the model by trial and error.
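Restricting the dataset by date can be sketched as a simple filter. The sample layout below, a list of (chart_end_date, label) tuples, is a hypothetical structure chosen for illustration, and filtering on the chart's end date rather than its start date is also an assumption.

```python
from datetime import date


def charts_from(samples, first_day=date(2010, 1, 1)):
    """Keep samples whose chart end date is on or after first_day."""
    return [sample for sample in samples if sample[0] >= first_day]


# Hypothetical samples: (chart_end_date, label)
samples = [
    (date(2008, 9, 30), 1),
    (date(2009, 12, 31), 0),
    (date(2010, 1, 4), 0),
    (date(2015, 6, 30), 1),
]
recent = charts_from(samples)
```

Only the last two samples survive the filter, since the first two end before 2010-01-01.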

After 500 epochs

Accuracy on the test set is now 64%. That is where things stand as of today. I would like to write another article after studying models and layers a little more.

Conclusion

By training a model in Python on candlestick charts, I found that it is difficult even for a computer to predict the future of the stock market. On the other hand, 64% accuracy also suggests that there is something in the charts that makes prediction possible.

I would like to continue my research as long as my funds (living expenses) last. Once I have a good AI, I want to embed it in a website so that anyone can use it. Last but not least, I have also written about this in English, so please take a look: https://lulitech.nihoninanutshell.com

[PYTHON] Continue to make stock price forecast AI for 10 hours a day 1st month