[PYTHON] Stock price forecasting with machine learning: Numerai Signals

Introduction

The previous article is here. This article is intended for readers who have already participated in the Numerai Tournament, and it assumes that prior knowledge.

The Numerai Tournament, covered in the previous article, is a competition of predictive performance on a dataset prepared in advance by the operators. Numerai Signals, by contrast, is a more practical and open-ended arena. Participants must prepare their own datasets for forecasting, and even the choice of universe is up to them. It is still in beta, but we believe participants will eventually enjoy the same substantial benefits as in the Tournament. Below, we explain the specifications of Signals, the incentives to participate, and points to keep in mind when building Signals.

Numerai Signals specifications

Signals overview

The Signals documentation is here (https://docs.numer.ai/numerai-signals/signals-overview). In Signals, participants submit forecasts of stock price rises and falls in markets around the world. As shown in the sample below, each forecast is submitted as a numerical value between 0 and 1 (analogous to a probability in ordinary machine learning). The participants' ultimate goal is to have a data-driven hedge fund like Numerai "buy" their Signal. Participants tap data sources around the world, find features rich in alpha, and extract from them Signals with high predictive performance and originality. In doing so, they become part of the hedge fund's brain. What an exciting idea. The Signals specifications are explained below.

(Figure: sample submission, 01.png)
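The snippet below is a minimal sketch of turning raw model scores into the 0-to-1 values described above. It rank-normalizes the scores and writes a two-column CSV. The column names `bloomberg_ticker` and `signal`, the example tickers, and the rank transform are all assumptions made for illustration. Check the official sample file for the exact format.

```python
import csv
import io


def rank_to_unit_interval(scores):
    """Map raw scores to values strictly inside (0, 1) by rank.

    The lowest score gets the smallest value and the highest score the largest.
    Ties are broken by ticker name (a simplifying assumption).
    """
    ordered = sorted(scores, key=lambda t: (scores[t], t))
    n = len(ordered)
    return {ticker: (i + 0.5) / n for i, ticker in enumerate(ordered)}


def to_submission_csv(signals):
    """Write a two-column CSV. The column names are assumed, not confirmed."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["bloomberg_ticker", "signal"])
    for ticker, value in sorted(signals.items()):
        writer.writerow([ticker, f"{value:.6f}"])
    return buf.getvalue()


# Hypothetical raw scores from some model (higher = more likely to rise).
raw = {"AAPL US": 1.3, "7203 JP": -0.4, "005930 KS": 0.2, "VOD LN": 0.9}
signals = rank_to_unit_interval(raw)
print(to_submission_csv(signals))
```

Ranking rather than squashing the raw scores keeps every value inside (0, 1), and it makes the output insensitive to the scale of the model's raw scores.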

Target assets

Numerai Signals targets stocks in markets around the world, currently about 5,200 in total. The list changes daily, but most stocks carry over. Only illiquid stocks are replaced. The latest list is available here (https://numerai-quant-public-data.s3-us-west-2.amazonaws.com/example_predictions/latest.csv).

For reference, we tabulated how many stocks belong to each market. The largest is the US market, with more than 2,000 stocks. It is followed by the Japanese, Korean, and London markets.

(Figure: number of stocks per market, 02.png)
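A tabulation like this can be reproduced in a few lines. The sketch below assumes the ticker list uses Bloomberg-style tickers whose last token is a market code (for example `US`, `JP`, `KS`, `LN`). The ticker list is a hypothetical stand-in for the downloaded `latest.csv`.

```python
from collections import Counter


def count_by_market(tickers):
    """Count tickers by the market code after the last space, e.g. 'AAPL US' -> 'US'.

    Assumes Bloomberg-style tickers of the form '<symbol> <market code>'.
    """
    return Counter(t.rsplit(" ", 1)[-1] for t in tickers)


# Hypothetical sample of tickers.
tickers = ["AAPL US", "MSFT US", "7203 JP", "6758 JP", "005930 KS", "VOD LN", "IBM US"]
for market, n in count_by_market(tickers).most_common():
    print(market, n)
```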

Participants do not have to submit forecasts for all of these stocks. You can build your own universe from the more than 5,000 available stocks, as long as it contains at least 100 stocks. This gives a great deal of freedom, but it also makes choosing the universe an important decision.
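Here is a minimal sketch of restricting predictions to a self-chosen universe while enforcing the 100-stock minimum stated above. Selecting the universe by market code is only one possible rule, chosen here for illustration. The Bloomberg-style ticker format and the tickers themselves are assumptions.

```python
MIN_UNIVERSE_SIZE = 100  # minimum number of stocks per submission, as stated in the text


def build_universe(predictions, allowed_markets):
    """Keep only predictions whose market code is in allowed_markets.

    Assumes Bloomberg-style tickers '<symbol> <market code>'.
    Raises ValueError if fewer than MIN_UNIVERSE_SIZE stocks remain.
    """
    universe = {
        ticker: value
        for ticker, value in predictions.items()
        if ticker.rsplit(" ", 1)[-1] in allowed_markets
    }
    if len(universe) < MIN_UNIVERSE_SIZE:
        raise ValueError(
            f"universe has {len(universe)} stocks; at least {MIN_UNIVERSE_SIZE} are required"
        )
    return universe


# Hypothetical predictions: 150 US stocks and 80 Japanese stocks.
preds = {f"US{i:04d} US": 0.5 for i in range(150)}
preds.update({f"{i:04d} JP": 0.5 for i in range(80)})
print(len(build_universe(preds, {"US"})))        # 150
print(len(build_universe(preds, {"US", "JP"})))  # 230
```

A Japan-only universe would contain just 80 stocks here, so `build_universe(preds, {"JP"})` raises `ValueError` instead of producing a submission that does not meet the minimum.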

About data acquisition

Participants must collect the data needed to forecast these stocks themselves. Numerai Signals is a platform for users who have already built their own forecasting systems and have access to market data. Quandl is the official data source Numerai uses to evaluate participants' predictive performance. Other data sources include Quantopian and Alpaca. Numerai's forum has a thread listing inexpensive data sources, which is worth consulting. I currently use Yahoo Finance.

About Submission

Participants can submit predictions at any time. The latest submission remains in effect until the next one. In other words, participants can rebalance their portfolio simply by submitting new forecasts. A weekly forecasting model, for example, can be submitted once a week. The leaderboard evaluation described later uses a daily time frame based on closing prices (Close to Close).
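Because evaluation is Close to Close, the quantity a daily model is ultimately trying to predict can be computed from consecutive closing prices. This is a minimal sketch, assuming a plain list of closes in date order. The prices are made up.

```python
def close_to_close_returns(closes):
    """Daily close-to-close returns: r_t = close_t / close_{t-1} - 1.

    closes must be in chronological order; the result has one fewer element.
    """
    return [curr / prev - 1 for prev, curr in zip(closes, closes[1:])]


# Hypothetical closing prices for three consecutive trading days.
closes = [100.0, 102.0, 99.96]
print(close_to_close_returns(closes))
```

When building a training set, the return from today's close to the next close is the target paired with features available at today's close.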

One thing to keep in mind is that a submission takes effect at the first market close after it is made. In other words, to predict the next day, you must submit before the market closes. Forecasts are evaluated daily. A submission remains valid for 60 days, so the next submission (rebalance) must be made within 60 days.

(Figure: submission timing, 03.png)
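The timing rules above can be sketched as simple date arithmetic. The following assumptions are made for illustration only:
- The 60 days are calendar days.
- Exchange holidays and time zones are ignored.
- The close time passed in is a hypothetical local close.

```python
from datetime import datetime, time, timedelta

VALID_DAYS = 60  # submissions stay valid for 60 days (assumed to be calendar days)


def effective_close(submitted_at, close_time):
    """First market close at which a submission takes effect.

    A submission made at or after the close only counts from the next trading day's close.
    Weekends are skipped; exchange holidays are ignored (simplifying assumption).
    """
    day = submitted_at.date()
    if submitted_at.time() >= close_time:
        day += timedelta(days=1)
    while day.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        day += timedelta(days=1)
    return datetime.combine(day, close_time)


def resubmit_deadline(submitted_at):
    """Time by which a new submission is needed before the current one expires."""
    return submitted_at + timedelta(days=VALID_DAYS)


close = time(16, 0)  # hypothetical local market close
friday_evening = datetime(2020, 6, 26, 17, 0)
print(effective_close(friday_evening, close))  # 2020-06-29 16:00:00 (the following Monday)
print(resubmit_deadline(friday_evening))       # 2020-08-25 17:00:00
```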

Leaderboard

For what it's worth, there is a leaderboard. Ranking is currently based on the Sharpe ratio of daily returns.

(Figure: leaderboard, 05.png)

The Sharpe ratio is calculated as follows. Here, $r_{long}$ is the daily return of the stocks in the top half of the forecasts, and $r_{short}$ is the daily return of the stocks in the bottom half. When the number of stocks is odd, the middle stock is included in the long side. This formula may be unfamiliar, but its result is almost the same as $ave(r_{long}) - ave(r_{short})$. $Sharpe$ is the daily Sharpe ratio annualized. $r_{daily}$ includes every result since the user's first submission. A high-variance prediction submitted as a trial will therefore stay in your record permanently, so it may be better to start submitting only once your model is reasonably settled.

$$
r_{daily} = \frac{1 + ave(r_{long})}{1 + ave(r_{short})} - 1
$$

$$
Sharpe = \sqrt{365} \times \frac{ave(r_{daily})}{stdev(r_{daily})}
$$
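The two formulas translate directly into code. In the sketch below, the long side is the top half by prediction, and an odd middle stock goes long. The text does not say which standard deviation is used, so the sample standard deviation is an assumption. The tickers and returns are made up.

```python
import math
import statistics


def daily_long_short_return(predictions, returns):
    """r_daily for one day: (1 + ave(r_long)) / (1 + ave(r_short)) - 1.

    predictions and returns map ticker -> value for the same day.
    The top half by prediction is long; with an odd count the middle stock is long.
    """
    ranked = sorted(predictions, key=predictions.get, reverse=True)
    n_long = math.ceil(len(ranked) / 2)
    avg_long = statistics.mean(returns[t] for t in ranked[:n_long])
    avg_short = statistics.mean(returns[t] for t in ranked[n_long:])
    return (1 + avg_long) / (1 + avg_short) - 1


def annualized_sharpe(daily_returns):
    """sqrt(365) * mean / stdev, using the sample standard deviation (assumption)."""
    return math.sqrt(365) * statistics.mean(daily_returns) / statistics.stdev(daily_returns)


# Hypothetical one-day example with three stocks: A and B go long, C goes short.
preds = {"A": 0.9, "B": 0.6, "C": 0.1}
rets = {"A": 0.02, "B": 0.00, "C": -0.01}
print(round(daily_long_short_return(preds, rets), 6))  # 0.020202
print(annualized_sharpe([0.01, 0.03]))  # about 27.02
```

The example shows why the formula is "almost the same" as the simple spread. Here $ave(r_{long}) - ave(r_{short}) = 0.02$, while $r_{daily} \approx 0.0202$.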

In addition, at least 20 days of results are normally required to appear on the leaderboard. Because Signals is in beta, however, submissions currently appear on the leaderboard right away. Strictly speaking, they appear about 3 business days later, once the results have been aggregated.

Reward system

Rewards and penalties for staking

At the time of writing (June 23), the staking specifications have not been decided. Users will need to stake NMR to back their predictions. They can of course participate without staking, but they are then excluded from rewards and evaluation. This is a natural requirement, since some users submit experimental predictions or hope for a lucky hit. Staking is expected to be implemented in the near future, and this chapter will be updated once the specifications are decided.

Purchase of Signals

Separately from stake rewards, Numerai may purchase Signals. However, the purchase criteria and amounts are not disclosed. All we know is that Numerai wants Signals that are robust, stable over the long term, and weakly correlated with common factors. Originality matters more than predictive performance, and purchases are, naturally, not based on leaderboard ranking.
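Numerai does not disclose how it measures correlation with common factors. One simple way to check this yourself is the rank (Spearman) correlation between your signal and a factor exposure such as momentum, computed across stocks on the same date. This is offered as an assumption, not as Numerai's method. The factor values below are hypothetical.

```python
import math


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical cross-section of five stocks on one date.
signal = [0.9, 0.7, 0.4, 0.2, 0.1]
momentum = [0.5, 0.1, 0.3, -0.2, 0.0]  # hypothetical factor exposures
print(round(spearman(signal, momentum), 3))  # 0.8
```

A value this high would mean the signal largely restates the momentum factor. The preference described above is for values near zero.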

Incentive to participate in Signals

This chapter will be updated once the staking specifications are decided. Based on experience with the Tournament, however, we expect Signals to offer the following incentives. They are surprisingly large benefits for system traders. I had been looking for a way to manage individual US stocks, and I plan to make the most of them.

- If rewards are paid and collected based on participants' Sharpe ratios, the payout may be effectively more leveraged than actual stock returns. This could dramatically improve the capital efficiency of a market-neutral strategy.
- As in the Tournament, participants in Numerai Signals do not actually buy stocks but bet on the forecast itself. Execution costs for building the portfolio can therefore be ignored, and the ideal return can be earned.
- You can bet on a diversified portfolio that an individual investor could not realistically build, which leads to more stable investment performance.
- As in the Tournament, Numerai Signals does not require you to submit your predictive model, so all of the participants' intellectual property stays protected.

Points to keep in mind when building Signals

Universe selection

We consider universe selection the most important aspect of Signals. For the same predictive power, the wider the universe, the better the Sharpe ratio. Assuming constant predictive power, the Sharpe ratio grows in proportion to the square root of the number of stocks in the universe. The figure below shows an idealized simulation. On the other hand, taking in too many stocks may bring in extremely volatile names. The patterns visible in a smaller universe may also disappear, degrading overall predictive performance. One option is therefore to limit the universe to a narrower set of stocks with stable price movements, which can improve explanatory power within the universe.

(Figure: idealized simulation result, 06.png)
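The square-root relationship can be checked with a small Monte Carlo sketch. This is not the simulation behind the figure. It assumes the following:
- Stock returns are independent Gaussians.
- Predictions have a fixed correlation `ic` with returns (a hypothetical parameter).
- The daily return is approximated by $ave(r_{long}) - ave(r_{short})$.

```python
import math
import random
import statistics


def long_short_spread(preds, rets):
    """ave(r_long) - ave(r_short): long the top half by prediction, short the rest."""
    order = sorted(range(len(preds)), key=lambda i: preds[i], reverse=True)
    half = math.ceil(len(order) / 2)
    top = [rets[i] for i in order[:half]]
    bottom = [rets[i] for i in order[half:]]
    return sum(top) / len(top) - sum(bottom) / len(bottom)


def simulated_sharpe(n_stocks, n_days=1000, ic=0.1, vol=0.02, seed=0):
    """Annualized Sharpe of a long-short book whose predictions have correlation ic with returns."""
    rng = random.Random(seed)
    noise_scale = math.sqrt(1 - ic ** 2)
    spreads = []
    for _ in range(n_days):
        z = [rng.gauss(0.0, 1.0) for _ in range(n_stocks)]
        rets = [vol * zi for zi in z]
        # Each prediction = ic * (standardized return) + independent noise.
        preds = [ic * zi + noise_scale * rng.gauss(0.0, 1.0) for zi in z]
        spreads.append(long_short_spread(preds, rets))
    return math.sqrt(365) * statistics.mean(spreads) / statistics.stdev(spreads)


for n in (100, 200, 400):
    print(n, round(simulated_sharpe(n), 1))
```

Quadrupling the number of stocks should roughly double the Sharpe ratio. The absolute values are unrealistically high because the simulation assumes independent stocks and a constant, known predictive correlation.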

I think treating the universe as a hyperparameter and tuning it is the key to separating strong performance from weak. To tune it thoroughly, you also need the ability to build an accurate database covering as many stocks as possible.
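Treating the universe as a hyperparameter can be sketched as a grid search over one knob. Here that knob is a volatility cutoff, reflecting the idea above of keeping stocks with stable price movements. The volatilities are hypothetical and given in basis points. The scoring function is a made-up stand-in: in practice `evaluate` would be your own backtest, for example the Sharpe ratio from the leaderboard formula.

```python
import math


def low_vol_universe(vols, max_vol):
    """Tickers whose volatility is at or below max_vol (the universe 'hyperparameter')."""
    return {t for t, v in vols.items() if v <= max_vol}


def tune_universe(vols, cutoffs, evaluate, min_size=100):
    """Try each volatility cutoff and return (best_cutoff, best_score).

    evaluate(universe) is a user-supplied scoring function, e.g. a backtested Sharpe ratio.
    Cutoffs that leave fewer than min_size stocks are skipped.
    """
    best = None
    for cutoff in cutoffs:
        universe = low_vol_universe(vols, cutoff)
        if len(universe) < min_size:
            continue
        score = evaluate(universe)
        if best is None or score > best[1]:
            best = (cutoff, score)
    return best


# Hypothetical daily volatilities in basis points for 300 tickers: 100, 101, ..., 399.
vols = {f"T{i:03d}": 100 + i for i in range(300)}


def toy_score(universe):
    """Made-up stand-in for a backtest: rewards breadth, penalizes average volatility."""
    avg_vol = sum(vols[t] for t in universe) / len(universe)
    return math.sqrt(len(universe)) * (350 - avg_vol)


print(tune_universe(vols, [150, 200, 300, 400], toy_score))  # best cutoff is 300
```

The cutoff of 150 is skipped because it leaves only 51 stocks, below the 100-stock minimum.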

Optimization of time frame (rebalancing frequency)

Basically, the shorter the rebalancing interval, the higher the predictive power and the more stable the results. This is because the predictive power of financial data decays substantially over time. Below is an example of profit and loss curves with daily, weekly, and monthly rebalancing.

(Figure: profit and loss curves for daily, weekly, and monthly rebalancing, 07.png)
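To see why time decay favors frequent rebalancing, here is a toy calculation. It is not derived from the figure. It assumes the following:
- A forecast's information coefficient (IC) decays geometrically with its age.
- The parameters `ic0` and `daily_decay` take made-up values.
- Weeks and months have 5 and 21 trading days.

Holding a forecast for h days then earns the average IC over ages 0 to h-1.

```python
def average_ic(holding_days, ic0=0.05, daily_decay=0.9):
    """Average information coefficient over a holding period.

    Assumes IC(age) = ic0 * daily_decay ** age, where age 0 is the first day
    the forecast is in effect.
    """
    return sum(ic0 * daily_decay ** age for age in range(holding_days)) / holding_days


for label, days in [("daily", 1), ("weekly", 5), ("monthly", 21)]:
    print(f"{label:8s} {average_ic(days):.4f}")
```

With these assumed parameters, weekly rebalancing keeps about 82% of the fresh-forecast IC and monthly rebalancing about 42%.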

In the real world, however, execution costs often make frequent rebalancing impractical. Numerai Signals currently imposes no penalty on submission frequency. Still, frequently changing forecasts are undesirable for a fund managing a large amount of capital. Measures such as collecting a fixed amount of stake per submission may therefore be introduced. Users should optimize their own forecast time frame with these circumstances in mind.

Conclusion

In this article, I explained the specifications of Numerai Signals, the incentives to participate, and points to keep in mind when building Signals. Numerai Signals offers a great deal of freedom and leaves most of the strategy to participants, who can take on the challenge with everything they know. And if, in the end, a hedge fund comes to rely on you, that would be a great honor. The time has come for hedge funds to rely on individual investors for their investment decisions. Exciting, isn't it?

Now, let's look for a superb Signal that contains plenty of alpha.
