[PYTHON] A story about clustering time series data of foreign exchange

Summary of this article

Development environment

Data preparation

Using USD/JPY data from January 2018 to April 2019, we took the entry points where the moving averages formed a golden cross on the 5-minute chart as sample data (2,482 samples).
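The article does not say which moving averages were used. As a minimal sketch, assuming a 5-bar short average and a 20-bar long average on closing prices (both hypothetical choices), a golden-cross entry is the bar where the short moving average moves from at or below the long one to above it:

```python
def moving_average(values, window):
    """Simple moving average; None until `window` values are available."""
    averages = []
    running_sum = 0.0
    for i, value in enumerate(values):
        running_sum += value
        if i >= window:
            running_sum -= values[i - window]  # drop the value leaving the window
        averages.append(running_sum / window if i >= window - 1 else None)
    return averages


def golden_cross_indices(closes, short_window=5, long_window=20):
    """Indices of bars where the short MA crosses from at/below to above the long MA.

    short_window=5 and long_window=20 are assumed, not taken from the article.
    """
    short_ma = moving_average(closes, short_window)
    long_ma = moving_average(closes, long_window)
    crosses = []
    for i in range(1, len(closes)):
        values = (short_ma[i - 1], long_ma[i - 1], short_ma[i], long_ma[i])
        if None in values:
            continue  # not enough history for both averages yet
        prev_short, prev_long, cur_short, cur_long = values
        if prev_short <= prev_long and cur_short > cur_long:
            crosses.append(i)
    return crosses
```

Each returned index is a candidate entry bar; the window of bars leading up to it would form one sample for clustering.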

Labeling

Labeling was done according to the following rules.

Result                        Label
Profit                           1
Loss                            -1
Settlement by holding time       0

For this analysis, we set the loss-cut and profit-taking lines so that the three outcomes each make up roughly one third of the samples.
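The labeling rule can be sketched as follows, assuming a long entry at each golden cross, exits judged on closing prices only, and hypothetical `take_profit`, `stop_loss` and `max_hold` parameters (the article does not give its actual values). `label_shares` reports the fraction of each label, which is what one would check while tuning the two lines toward roughly one third each.

```python
from collections import Counter


def label_trade(entry_price, future_closes, take_profit, stop_loss, max_hold):
    """Label a long entry: 1 = profit taking, -1 = loss cut, 0 = settled by holding time.

    take_profit and stop_loss are price distances from the entry and max_hold is
    the holding limit in bars; all three are hypothetical parameters.
    Only closing prices are checked, so intrabar highs/lows are ignored.
    """
    for price in future_closes[:max_hold]:
        if price >= entry_price + take_profit:
            return 1
        if price <= entry_price - stop_loss:
            return -1
    return 0  # neither line was reached within max_hold bars


def label_shares(labels):
    """Fraction of samples carrying each label (1, -1, 0)."""
    counts = Counter(labels)
    total = len(labels)
    return {label: counts[label] / total for label in (1, -1, 0)}
```

In practice one would sweep `take_profit` and `stop_loss` until `label_shares` over all entries returns roughly 0.33 for each label.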

Clustering

Expected result

As in the graph below, I hoped that "profit taking", "loss cut", and "settlement by holding time" would each be concentrated in its own cluster.

2002_1_hoped.png

With clusters like these, a trade that falls into cluster 2, for example, could be judged unfavorable and simply skipped.

Result

We clustered the samples with TimeSeriesKMeans from tslearn (a scikit-learn-compatible time-series library), plotted the percentage of each label within each cluster, and sorted the clusters by win rate.
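The article's TimeSeriesKMeans settings (metric, number of clusters, feature scaling) are not given. With tslearn installed, the call would look something like `TimeSeriesKMeans(n_clusters=3, metric="euclidean").fit_predict(X)` with `X` shaped `(n_samples, length, n_features)`, where `n_clusters=3` is only for illustration. With the Euclidean metric this essentially amounts to ordinary k-means on the flattened series, so the numpy-only sketch below shows that case. The function names, the farthest-first initialization and the iteration limit are my own assumptions, not tslearn's implementation.

```python
import numpy as np


def _assign(flat, centers):
    """Index of the nearest center (squared Euclidean distance) for each series."""
    dists = ((flat[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def euclidean_ts_kmeans(X, n_clusters, n_iter=100, seed=0):
    """k-means for time series shaped (n_series, length, n_features).

    Series are flattened, so distances are Euclidean over every time step and
    feature. Returns (labels, centers) with centers shaped like the series.
    """
    X = np.asarray(X, dtype=float)
    flat = X.reshape(len(X), -1)
    rng = np.random.default_rng(seed)
    # Farthest-first initialization: a random first center, then each next
    # center is the series farthest from its nearest existing center.
    centers = [flat[rng.integers(len(flat))]]
    for _ in range(1, n_clusters):
        nearest = np.min([((flat - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(flat[nearest.argmax()])
    centers = np.array(centers)

    for _ in range(n_iter):
        labels = _assign(flat, centers)
        new_centers = centers.copy()
        for k in range(n_clusters):
            members = flat[labels == k]
            if len(members):  # an empty cluster keeps its previous center
                new_centers[k] = members.mean(axis=0)
        converged = np.allclose(new_centers, centers)
        centers = new_centers
        if converged:
            break
    labels = _assign(flat, centers)
    return labels, centers.reshape((n_clusters,) + X.shape[1:])
```

With DTW (`metric="dtw"`) both the distance and the cluster centers are computed differently, so this sketch does not cover that case.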

2002_2_OHLC_and_RSI.png

Not good enough... The highest win rate was 45% and the lowest was 22%. Since the original labels are split into roughly equal thirds (33% each), the clusters do separate the outcomes a little, but I would like a cleaner split.
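The per-cluster percentages and the ranking by win rate can be computed as in the sketch below, assuming `cluster_ids` come from the clustering step and `trade_labels` use the 1 / -1 / 0 scheme from the labeling table. Comparing each cluster's win rate with the overall win rate (about 33% here) shows how much a cluster actually separates the outcomes.

```python
from collections import Counter, defaultdict


def cluster_label_table(cluster_ids, trade_labels):
    """Share of each label (1, -1, 0) per cluster, sorted by win rate.

    Returns a list of (cluster_id, shares, n_samples) tuples where shares maps
    each label to its fraction within that cluster; highest win rate first.
    """
    groups = defaultdict(list)
    for cluster_id, label in zip(cluster_ids, trade_labels):
        groups[cluster_id].append(label)
    rows = []
    for cluster_id, labels in groups.items():
        counts = Counter(labels)
        n = len(labels)
        shares = {label: counts[label] / n for label in (1, -1, 0)}
        rows.append((cluster_id, shares, n))
    rows.sort(key=lambda row: row[1][1], reverse=True)  # row[1][1] is the win rate
    return rows


def overall_win_rate(trade_labels):
    """Baseline win rate across all samples, for comparison with each cluster."""
    return sum(1 for label in trade_labels if label == 1) / len(trade_labels)
```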

Adding higher timeframes

To improve the result, we added the following higher-timeframe information to the features.
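The specific higher-timeframe indicators are not reproduced here. As one hypothetical example of attaching such a feature to 5-minute samples without look-ahead, the sketch below takes, for each 5-minute bar, the close of the most recent fully completed 1-hour bar (12 five-minute bars), assuming the series starts exactly on an hour boundary and has no gaps.

```python
def completed_higher_tf_closes(closes_5m, bars_per_higher=12):
    """Close of the most recent fully completed higher-timeframe bar, per 5-minute bar.

    bars_per_higher=12 groups 5-minute bars into 1-hour bars (an assumed choice).
    Assumes the series starts on a higher-timeframe boundary and has no gaps.
    Returns None until the first higher-timeframe bar has closed, so each value
    only uses information available at the close of that 5-minute bar.
    """
    features = []
    for i in range(len(closes_5m)):
        completed = (i + 1) // bars_per_higher  # higher bars closed by bar i
        if completed == 0:
            features.append(None)
        else:
            # The last 5-minute close inside the latest completed higher bar.
            features.append(closes_5m[completed * bars_per_higher - 1])
    return features
```

An indicator computed on these hourly closes could then be stacked alongside the 5-minute series as an extra feature channel before clustering.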

The results are shown below.

2002_3_with_long_term_indicator.png

The highest win rate was 63% and the lowest was 14%. Adding the higher-timeframe information improved the separation considerably, which reconfirmed for me that higher-timeframe information is useful. Even with this result it seems difficult to avoid losing trades altogether, but I personally think it could be used to adjust position size.
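One hypothetical way to turn cluster win rates into position sizes is to scale a base lot by the ratio of the cluster's win rate to the overall win rate, clipped to a range. The rule and its bounds below are illustrative assumptions, not something the article tested.

```python
def position_size(base_lots, cluster_win_rate, baseline_win_rate,
                  min_scale=0.0, max_scale=2.0):
    """Scale a base position by a cluster's win rate relative to the baseline.

    Hypothetical rule: scale = cluster_win_rate / baseline_win_rate, clipped to
    [min_scale, max_scale]. The bounds are illustrative, not tuned values.
    """
    scale = cluster_win_rate / baseline_win_rate
    scale = max(min_scale, min(max_scale, scale))
    return base_lots * scale
```

With a baseline of about 33%, the 63% cluster would get roughly 1.9 times the base size and the 14% cluster roughly 0.42 times.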

Thank you for reading the article.
