[PYTHON] Try to predict FX with LSTM using Keras + Tensorflow Part 3 (Try brute force parameters)

Introduction

In Try to predict FX with LSTM using Keras + Tensorflow Part 2 (Calculate with GPU), I wrote that I was finally ready to get started. The reason is that deep learning and FX involve a large number of parameters, and I expected it would take a considerable amount of time to work out which of them matter and what values they should take.

In other words, you cannot realistically search for them without a GPU.

So, now that I can use the GPU, I will try to find good values by brute-forcing the various parameters I have been wanting to explore.

Source

The source can be found at https://github.com/rakichiki/keras_fx, or you can clone it:

git clone https://github.com/rakichiki/keras_fx.git 

The notebook for this article is keras_fx_gpu_multi.ipynb. Get it and upload it to Jupyter.

Let me explain a little.

Parameter brute force

First, I decided which parameters I wanted to vary. They are listed below. (If you look closely, it is not all of them ...)

Brute force


# Imports assumed by this snippet; the helper functions used below
# (get_date, get_data, fit) are defined elsewhere in keras_fx_gpu_multi.ipynb.
import os
import time
import math
from datetime import datetime as dt

l_of_s_list                  = [20,25]
n_next_list                  = [5,7]
check_treshhold_list         = [0.50,0.60]
#activation_list              = ['sigmoid','tanh','linear']
activation_list              = ['tanh']
#loss_func_list               = ['mean_squared_error','mean_absolute_error','mean_squared_logarithmic_error']
loss_func_list               = ['mean_squared_error','mean_absolute_error']
#optimizer_func_list          = ['sgd','adadelta','adam','adamax']
optimizer_func_list          = ['adadelta','adam','adamax']
#validation_split_number_list = [0.1,0.05]
validation_split_number_list = [0.05]

currency_pair_list   = ['usdjpy']

# Create the output directories for result files and learning-curve graphs
if os.path.exists('result') == False:
    os.mkdir('result')
if os.path.exists('png') == False:
    os.mkdir('png')

# Timestamp used as the base name for the result file and the graph files
save_file_name = dt.today().strftime("%Y%m%d%H%M%S")

#fx data acquisition
start_day     = "20010101"
end_day       =  dt.today().strftime("%Y%m%d")

for currency_pair in currency_pair_list:
    (train_start_count, train_end_count,test_start_count, test_end_count,data) = \
        get_date(start_day, end_day, currency_pair)
    file_name = currency_pair + '_d.csv'

    for l_of_s in l_of_s_list:
        for n_next in n_next_list:
            for check_treshhold in check_treshhold_list:
                # Build the training/test data for this data-shaping parameter combination
                (chane_data,average_value,diff_value, up_down,check_percent) = \
                    get_data(l_of_s, n_next,check_treshhold, file_name,train_start_count,\
                             train_end_count,test_start_count, test_end_count,data)
                
                # Train and evaluate a model for every model-side hyperparameter combination
                for activation in activation_list:
                    for loss_func in loss_func_list:
                        for optimizer_func in optimizer_func_list:
                            for validation_split_number in validation_split_number_list:
                                print('--------------------------')
                                fit_starttime = time.time()
                                fit(l_of_s, n_next,check_treshhold,file_name,save_file_name,activation,loss_func,optimizer_func,\
                                    validation_split_number,train_start_count, train_end_count,test_start_count, test_end_count,\
                                    chane_data,average_value,diff_value,up_down,check_percent)
                                print(str(math.floor(time.time() - fit_starttime)) + "s")
                                print('')

I would like to brute-force all of these over the full range I am interested in, but since the run time grows exponentially, it is better to narrow the search down and investigate little by little. Even if the GPU makes things 10 times faster, if the amount of computation grows 1000 times you are back to square one (although it is already beyond what a CPU could handle).
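Before starting a run, it is worth counting how many combinations the lists above actually produce. A minimal sketch, using the lists defined in the block above:

import itertools

grid = [l_of_s_list, n_next_list, check_treshhold_list, activation_list,
        loss_func_list, optimizer_func_list, validation_split_number_list]
n_patterns = len(list(itertools.product(*grid)))
print(n_patterns)  # 2 * 2 * 2 * 1 * 2 * 3 * 1 = 48 combinations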

Also, because of the problem described later, it is better to go little by little rather than running everything from the start (I am the one who failed by running too much at the beginning).

Introduction of Early Stopping

Brute-forcing the parameters makes the amount of computation explode, and the speed-up from the GPU alone is not enough. Therefore, I introduce Early Stopping so that training does not keep looping over epochs unnecessarily.

EarlyStopping


from keras.callbacks import EarlyStopping

early_stopping = EarlyStopping(monitor='val_loss', patience=10, verbose=1)

# ... (omitted) ...

high_history = high_model.fit(X_high_train, y_high_train, batch_size=100, epochs=300, \
                   validation_split=validation_split_number, callbacks=[early_stopping])

Keras makes this easy. However, whether monitoring val_loss with patience=10 is the right stopping condition is another matter.
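If this rule turns out to stop too early or too late, EarlyStopping has a couple of other knobs. A minimal sketch with illustrative (not tuned) values:

# Illustrative only: ignore val_loss improvements smaller than 1e-4 and
# allow 20 epochs without improvement before stopping.
early_stopping = EarlyStopping(monitor='val_loss', min_delta=1e-4,
                               patience=20, verbose=1)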

I want to see the learning curve

Of course, you cannot tell whether the parameters are reasonable without looking at the learning curve. Adding it is not difficult.

Just keep the return value of fit() and graph it.

Learning curve



    # Learning
    high_history = high_model.fit(X_high_train, y_high_train, batch_size=100, epochs=300, \
                   validation_split=validation_split_number, callbacks=[early_stopping])

    # ... (omitted) ...

    # Plot the validation loss of the "high" model
    # (matplotlib.pyplot is assumed to be imported as plt)
    val_loss = high_history.history['val_loss']
    plt.rc('font',family='serif')
    fig = plt.figure()
    plt.plot(range(len(val_loss)), val_loss, label='val_loss', color='black')
    plt.xlabel('epochs')
    plt.savefig('png/' + save_file_name + '_high_' + \
                str(l_of_s) + '_' + str(n_next) + \
                '_' + str(check_treshhold) + '_' + file_name + \
                '_' + activation + '_' + loss_func + \
                '_' + optimizer_func + '_' + str(validation_split_number) + \
                '.png')
    plt.show()

One caveat: if you want to keep the graph, call plt.savefig() before plt.show(). I am not sure of the exact reason, but if you do it the other way around the saved image ends up empty, apparently because show() releases the current figure (I found this in an answer on a Q&A site somewhere).
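In other words, the order that works looks like the sketch below. Closing the figure afterwards also keeps matplotlib from holding on to every figure during a long run (the file name here is just a placeholder):

fig = plt.figure()
plt.plot(range(len(val_loss)), val_loss, label='val_loss', color='black')
plt.xlabel('epochs')
plt.savefig('png/example.png')   # save first ...
plt.show()                       # ... then display
plt.close(fig)                   # release the figure so it does not pile up in memory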

When things go well, a graph showing the transition of val_loss is displayed, like the one below.

loss.png

Of course, a nice-looking curve does not by itself mean the hit rate will be good. But this graph does tell you whether the model is learning at all.
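The snippet above only plots val_loss. Plotting the training loss next to it makes it easier to tell whether the model is underfitting or overfitting; a minimal sketch reusing the same high_history:

# Compare training loss and validation loss for the "high" model
loss     = high_history.history['loss']
val_loss = high_history.history['val_loss']

plt.figure()
plt.plot(range(len(loss)), loss, label='loss', color='gray')
plt.plot(range(len(val_loss)), val_loss, label='val_loss', color='black')
plt.xlabel('epochs')
plt.legend()
plt.show()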

Save the result to a file

The run is expected to take a very long time, and the PC may go down partway through. I am not the type who wants to leave a PC without ECC memory running for more than 10 hours while praying it does not crash.

So I save the analysis results to a file as they are produced, so that something survives even if the PC dies in the middle (though I give up if the storage itself fails).

File output


    # Append one summary block per pattern; the counts (up_ok_count and so on)
    # come from the evaluation inside fit(). Writing under result/ uses the
    # directory created at the start of the notebook.
    f = open('result/result_' + save_file_name + '.txt', 'a')
    f.write('l_of_s: ' + str(l_of_s) + ' n_next: ' + str(n_next) + \
            ' check_treshhold:' + str(check_treshhold) + ' file_name:' + file_name + \
            ' activation:' + activation + ' loss_func:' + loss_func + \
            ' optimizer_func:' + optimizer_func + ' validation_split_number:' + str(validation_split_number) + \
            '\n')
    f.write('UP: ' + str(up_ok_count) + ' - ' + str(up_ng_count) + ' - ' + str(up_ev_count) + '\n')
    f.write('DN: ' + str(down_ok_count) + ' - ' + str(down_ng_count) + ' - ' + str(down_ev_count) + '\n')
    f.close()

Would CSV have been better? Or JSON (I like JSON)? For now I just dump the progress as plain text. Actually, JSON is a poor fit here, because if the run dies partway through the file would be left incomplete.
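If I do switch to CSV later, appending one row per pattern keeps the output usable even if the run dies partway through, since a crash loses at most the pattern in progress. A sketch with a helper of my own (not part of the notebook):

import csv

def append_result_row(path, row):
    # Append a single row; the file stays valid even if the run is interrupted later.
    with open(path, 'a', newline='') as f:
        csv.writer(f).writerow(row)

append_result_row('result/result_' + save_file_name + '.csv',
                  [l_of_s, n_next, check_treshhold, activation, loss_func,
                   optimizer_func, validation_split_number,
                   up_ok_count, up_ng_count, down_ok_count, down_ng_count])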

As mentioned above, you may want to save the graphs as well.

Result (not good ...)

I ran a fair number of combinations. However, for the reason described later, the grid is fairly modest (please don't point out that there is only one activation function, for example).

The only currency pair is usdjpy. The results are below (cases that ran past the trading-judgment horizon are not counted in the hit rate). The total hit rate is (up hits + down hits) / (up hits + up misses + down hits + down misses); for example, the first row gives (55 + 114) / (55 + 34 + 114 + 81) = 59.5%.

Columns: days used for the trading judgment, days until the trade is evaluated, change-rate threshold for the judgment, activation function, loss function, optimizer, validation split, hits when predicting up, misses when predicting up, hits when predicting down, misses when predicting down, total hit rate (%).
20 5 0.5 tanh mse adadelta 0.05 55 34 114 81 59.5
20 5 0.5 tanh mse adam 0.05 24 22 66 46 57.0
20 5 0.5 tanh mse adamax 0.05 14 14 46 33 56.1
20 5 0.5 tanh mae adadelta 0.05 69 58 95 88 52.9
20 5 0.5 tanh mae adam 0.05 31 28 69 58 53.8
20 5 0.5 tanh mae adamax 0.05 29 26 84 69 54.3
20 5 0.6 tanh mse adadelta 0.05 72 53 129 98 57.1
20 5 0.6 tanh mse adam 0.05 64 52 111 97 54.0
20 5 0.6 tanh mse adamax 0.05 43 33 59 52 54.5
20 5 0.6 tanh mae adadelta 0.05 51 40 140 120 54.4
20 5 0.6 tanh mae adam 0.05 75 57 102 75 57.3
20 5 0.6 tanh mae adamax 0.05 45 39 107 93 53.5
20 7 0.5 tanh mse adadelta 0.05 11 12 84 81 50.5
20 7 0.5 tanh mse adam 0.05 7 5 45 35 56.5
20 7 0.5 tanh mse adamax 0.05 22 18 61 40 58.9
20 7 0.5 tanh mae adadelta 0.05 46 37 92 81 53.9
20 7 0.5 tanh mae adam 0.05 25 28 47 31 55.0
20 7 0.5 tanh mae adamax 0.05 20 28 75 62 51.4
20 7 0.6 tanh mse adadelta 0.05 23 16 39 39 53.0
20 7 0.6 tanh mse adam 0.05 24 21 77 67 53.4
20 7 0.6 tanh mse adamax 0.05 27 26 61 45 55.3
20 7 0.6 tanh mae adadelta 0.05 56 43 120 107 54.0
20 7 0.6 tanh mae adam 0.05 40 36 65 58 52.8
20 7 0.6 tanh mae adamax 0.05 49 41 60 54 53.4
25 5 0.5 tanh mse adadelta 0.05 54 32 86 60 60.3
25 5 0.5 tanh mse adam 0.05 25 21 59 41 57.5
25 5 0.5 tanh mse adamax 0.05 15 14 53 39 56.2
25 5 0.5 tanh mae adadelta 0.05 46 37 126 95 56.6
25 5 0.5 tanh mae adam 0.05 34 30 56 41 55.9
25 5 0.5 tanh mae adamax 0.05 25 24 69 47 57.0
25 5 0.6 tanh mse adadelta 0.05 23 21 108 94 53.3
25 5 0.6 tanh mse adam 0.05 19 20 58 51 52.0
25 5 0.6 tanh mse adamax 0.05 18 19 86 69 54.2
25 5 0.6 tanh mae adadelta 0.05 92 80 92 85 52.7
25 5 0.6 tanh mae adam 0.05 26 28 117 100 52.8
25 5 0.6 tanh mae adamax 0.05 32 31 126 102 54.3
25 7 0.5 tanh mse adadelta 0.05 32 18 110 95 55.7
25 7 0.5 tanh mse adam 0.05 16 16 37 19 60.2
25 7 0.5 tanh mse adamax 0.05 9 10 42 28 57.3
25 7 0.5 tanh mae adadelta 0.05 33 23 40 30 57.9
25 7 0.5 tanh mae adam 0.05 25 21 71 55 55.8
25 7 0.5 tanh mae adamax 0.05 36 29 55 38 57.6
25 7 0.6 tanh mse adadelta 0.05 43 35 104 92 53.6
25 7 0.6 tanh mse adam 0.05 23 23 63 58 51.5
25 7 0.6 tanh mse adamax 0.05 25 22 90 70 55.6
25 7 0.6 tanh mae adadelta 0.05 37 25 118 108 53.8
25 7 0.6 tanh mae adam 0.05 33 25 76 63 55.3
25 7 0.6 tanh mae adamax 0.05 40 25 74 59 57.6

The best was about 60%, the worst about 50%, and the average around 55%, so only a little better than a coin toss. Incidentally, it took about 2 hours to compute the 48 patterns (on a GeForce GTX 1070), roughly 2.5 minutes per pattern, so the 1000 or so patterns I originally had in mind would take on the order of 40 hours, and the time is expected to grow exponentially as more parameters are added. So at some point I will need to speed things up, and the hit rate itself also needs work, but before that, a bigger problem turned up.

A problem occurred

I was able to narrow down the parameters to some extent, but I also ran into a disappointing problem.

It is heavy memory consumption. Initially I planned to search around 1000 patterns, but partway through everything became very slow. After taking countermeasures, the state of the PC after running just under 48 patterns is shown below.

ss2.png

The PC itself is using 12 GB of memory, and the GPU 2 GB. I did not capture a screenshot for a single pattern, but in that case the GPU used less than 1 GB and the machine less than 4 GB.

If anything, it looks like a memory leak. This PC originally had only 8 GB of memory, but I hurriedly replaced it (the case is Mini-ITX and there are only two memory slots) and went up to 32 GB (16 GB would probably have been enough, but half-hearted investments do not give good results, so I went straight to 32 GB).

I do not know why memory keeps being consumed (or rather, not freed), but if you want to do more with this script, you need to take the memory usage into account when deciding how many patterns to run and for how long. I have not found a workaround so far.
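One mitigation that is commonly suggested for loops like this, although I have not verified it in this notebook, is to clear the Keras/TensorFlow session (and run the garbage collector) after each pattern so the old model graphs are released:

import gc
from keras import backend as K

# After evaluating one parameter combination, drop the model graph and
# force garbage collection so memory does not keep growing across the loop.
K.clear_session()
gc.collect()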

Finally

This series was actually only planned up to this point. From here on I will do my best to improve results that are only slightly better than a coin toss (or within the margin of error), but there is no guarantee that things will improve.

For this reason, I do not know how far I can go, but I think there are still plenty of things I could try.

That said, to be clear, what I have in mind is probably more than a single GPU can handle. In that case it may be necessary to install multiple GPUs or rent machines on AWS. I plan to think about the next steps with that in mind.
