[PYTHON] Let's visualize the trading volume of TSE stocks: jpxlab sample

In this article, we will try to visualize the trading volume of TSE stocks using jpxlab, a processing and analysis tool for TSE high-frequency stock data (FLEX Historical). See here for sample notebooks.

Metadata preparation

FLEX Historical also contains information that identifies the stock, but this sample uses a file published by the Tokyo Stock Exchange that provides more detailed metadata.

This file contains various information about each issue. It is useful for mapping stock codes to company names and for classifying stocks into categories by sector, TOPIX classification, etc.
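As a rough illustration of this kind of mapping, the sketch below joins a small metadata table onto per-stock volume data with pandas. The column names (`code`, `name`, `sector17`), the sector labels, and all volume figures are assumptions made up for this example; the actual TSE file uses its own column headers.

```python
import pandas as pd

# Hypothetical metadata table; the real TSE file has its own column headers.
meta = pd.DataFrame({
    "code": ["7203", "7270"],
    "name": ["Toyota", "Subaru"],
    "sector17": ["Automobiles", "Automobiles"],  # illustrative label only
})

# Made-up daily volumes keyed by stock code ("0000" is a dummy code).
volume = pd.DataFrame({
    "code": ["7203", "7270", "0000"],
    "volume": [1000, 500, 10],
})

# A left join keeps every volume row; codes missing from the metadata get NaN.
labeled = volume.merge(meta, on="code", how="left")
print(labeled)
```

Using a left join makes unmatched codes visible as missing values instead of silently dropping them, which is handy when checking that the metadata covers every stock in the historical data.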

Visualize trading volume by sector

To use the historical data in combination with the metadata provided by this file, the sample notebook jpxlab demo - sector volume.ipynb is a good starting point.

This notebook uses sector information to show how stocks can be grouped. The figure below shows the total trading volume for each sector in September 2019.

Although it is a simple visualization, you can confirm at a glance that trading volume in the trading companies and wholesale sector (17-industry classification) increased on September 5, 2019, and that trading volume in the banks sector (17-industry classification) increased on September 10, 2019. As illustrated in the notebook, a minor change also lets you restrict the plot to the top 100 stocks by trading volume in September.
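The aggregation described above can be sketched with pandas as follows. This is not the notebook's code: the long-format layout, the column names, and the numbers are all assumptions for illustration, and `top_n` is set to 2 only because the toy data has three stocks (the article uses 100).

```python
import pandas as pd

# Made-up long-format data: one row per (date, stock). Column names are assumptions.
df = pd.DataFrame({
    "date": pd.to_datetime(["2019-09-05"] * 3 + ["2019-09-10"] * 3),
    "code": ["A", "B", "C", "A", "B", "C"],
    "sector17": ["Banks", "Banks", "Retail", "Banks", "Banks", "Retail"],
    "volume": [100, 50, 30, 300, 10, 40],
})

# Sum volume per sector and day: rows are dates, columns are sectors.
by_sector = df.pivot_table(index="date", columns="sector17",
                           values="volume", aggfunc="sum")

# Keep only the top-N stocks by total volume over the whole period.
top_n = 2
top_codes = df.groupby("code")["volume"].sum().nlargest(top_n).index
by_sector_top = (df[df["code"].isin(top_codes)]
                 .pivot_table(index="date", columns="sector17",
                              values="volume", aggfunc="sum"))
print(by_sector)
print(by_sector_top)
```

With matplotlib installed, `by_sector.plot()` would draw one line per sector, which is one simple way to spot the days on which a sector's volume jumps.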

Forecast daily turnover

One of the most interesting topics in market data analysis is probably "forecasting". Various approaches such as statistical analysis and machine learning can be considered, but it is a little difficult to build a complicated model of stock prices with only one month's data. Therefore, this time I will only introduce the methodology and start with a simple data analysis.

First, let's plot the autocorrelation of the daily trading volume of Toyota (7203). The light blue area is the confidence interval, and a value that extends beyond it indicates a significant autocorrelation. Autocorrelation is the correlation of a series with a delayed copy of itself, so a high correlation coefficient here suggests that the series is periodic. Unfortunately, no significant correlation was observed in the above example.
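To make the idea concrete, here is a minimal sketch that computes the sample autocorrelation by hand with NumPy and compares it against an approximate 95% band of ±1.96/√n, the usual white-noise approximation. The volume series is made up, and plotting libraries may compute the shaded band slightly differently.

```python
import numpy as np

def autocorr(x, max_lag):
    """Sample autocorrelation of x for lags 0..max_lag."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    denom = np.sum(d * d)
    return np.array([np.sum(d[k:] * d[:len(x) - k]) / denom
                     for k in range(max_lag + 1)])

# Made-up daily volumes that alternate between low and high days.
volumes = [10, 12, 9, 13, 8, 14, 9, 12, 10, 13,
           8, 14, 9, 13, 10, 12, 9, 14, 8, 13]
acf = autocorr(volumes, max_lag=5)

# Approximate 95% band under the assumption of white noise.
bound = 1.96 / np.sqrt(len(volumes))
significant = [k for k in range(1, len(acf)) if abs(acf[k]) > bound]
print(acf.round(2), round(bound, 2), significant)
```

Because the toy series flips between low and high every day, its lag-1 autocorrelation is strongly negative and falls well outside the band, while a series with no such structure would mostly stay inside it.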

Next, let's select a stock with a high autocorrelation and see what the plot looks like. In the example, Subaru (7270) was selected from the top 100 stocks.

As the plot shows, lag 1 has a relatively high positive correlation. Because this example is based on a very limited sample, we set aside its practical usefulness here, but when more samples are available, data like this can often be predicted to some extent by an autoregressive model such as ARIMA. For reference, the plot of the original trading volume is as follows.
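As a hedged illustration of the autoregressive idea, the sketch below fits an AR(1) model, the simplest member of that family and not a full ARIMA model, by ordinary least squares with NumPy. The series is synthetic and generated with known parameters (`true_c`, `true_phi` are made up), so the fit can be checked against them; this is not the notebook's code.

```python
import numpy as np

def fit_ar1(x):
    """Least-squares fit of x[t] = c + phi * x[t-1] + noise."""
    x = np.asarray(x, dtype=float)
    prev, curr = x[:-1], x[1:]
    A = np.column_stack([np.ones_like(prev), prev])
    (c, phi), *_ = np.linalg.lstsq(A, curr, rcond=None)
    return c, phi

# Synthetic series with known parameters (assumed values for illustration).
rng = np.random.default_rng(0)
true_c, true_phi = 20.0, 0.8
x = [100.0]
for _ in range(499):
    x.append(true_c + true_phi * x[-1] + rng.normal(scale=1.0))

c, phi = fit_ar1(x)

# One-step-ahead forecast from the last observed value.
next_value = c + phi * x[-1]
print(round(phi, 3), round(next_value, 2))
```

A positive lag-1 autocorrelation corresponds to a positive `phi`, which is why a series like the one above is at least partly predictable from its previous value. With only about 20 trading days per month, an estimate like this would be very noisy.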

Clustering

Let's do another analysis. By clustering stocks according to the pattern of their trading volume over time, we will try to observe which stocks move together and how those trends change. Once again, however, the samples available for training are very limited, so I will introduce the methodology in the hope that you can use it in future experiments. The notebook is jpxlab demo - cluster analysis.ipynb.

Dimensionality reduction with t-SNE

The target of clustering is a vector. This time, we want to analyze the similarity of trading volume transition patterns for each issue, so it is necessary to vectorize the time-series data of trading volume transition in some way.

This step is important, however: the more dimensions a vector has, the harder it becomes to extract the essential features because of the curse of dimensionality. We therefore use dimensionality reduction to bring the vectors down to as few dimensions as possible.

The best-known dimensionality reduction methods are PCA and MDS, but here we use t-SNE, an algorithm that is also effective for data with non-linear relationships. The input is a set of snapshots of the raw trading volume series for each issue, extracted with a sliding window. These are reduced to two dimensions for visualization.
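The sliding-window step and the two-dimensional embedding can be sketched as follows with NumPy and scikit-learn. The volume matrix is random, and the window length, the perplexity, and the other parameter values are assumptions chosen for this toy example rather than the notebook's settings.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
n_stocks, n_days, window = 30, 20, 5  # assumed sizes for illustration

# Made-up raw volume matrix: rows are stocks, columns are trading days.
volume = rng.lognormal(mean=10, sigma=1, size=(n_stocks, n_days))

# One snapshot per window position: shape (n_windows, n_stocks, window).
snapshots = np.stack([volume[:, s:s + window]
                      for s in range(n_days - window + 1)])

# Embed the first snapshot: each stock's 5-day vector becomes a 2-D point.
tsne = TSNE(n_components=2, perplexity=5, init="pca", random_state=0)
emb = tsne.fit_transform(snapshots[0])
print(snapshots.shape, emb.shape)
```

Repeating the embedding for each window position would give one frame per position for an animation. Note that independent t-SNE runs are not aligned with each other, so some care is needed when comparing positions between frames.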

For details, please refer to the notebook; the result seems to be meaningful for this period. Although it is difficult to pick out large clusters visually, I think you can see that this is a very powerful tool that can do everything from clustering to animating how the clusters change over time with simple code.

In this way, we hope you have caught a glimpse of how jpxlab can serve as an entry point for various applications, such as the design and verification of predictive models.