We have dealt with interpolating ordinary time-series data before. This time, for backtesting purposes, we interpolate daily data to create 6-hour, 4-hour, 1-hour, and 10-minute data and check the accuracy.
・ Lagrange interpolation formula
・ Interpolation of Pandas data
・ 6 hours to 10 minutes data creation
・ See accuracy
【Reference】
・ [Interpolation] Interpolate from linear interpolation to quadratic interpolation and Lagrange interpolation ♬
First, the linear interpolation function passing through the two points (x0, y0) and (x1, y1) can be written as follows.
def interpolation(x0,y0,x1,y1,x):
    #Lagrange interpolation
    # y0*(x-x1)/(x0-x1)+y1*(x-x0)/(x1-x0)
    dn = (x0-x1)
    return y0*(x-x1)/dn + y1*(x0-x)/dn
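For reference, here is the general Lagrange formula for n + 1 points and its two-point case written out. The symbols L and L_1 are notation introduced here for illustration; the last form matches the code, where dn = x0 - x1.

```latex
\begin{align*}
L(x)   &= \sum_{j=0}^{n} y_j \prod_{\substack{k=0 \\ k \neq j}}^{n} \frac{x - x_k}{x_j - x_k} \\
L_1(x) &= y_0 \frac{x - x_1}{x_0 - x_1} + y_1 \frac{x - x_0}{x_1 - x_0}
        = \frac{y_0 (x - x_1) + y_1 (x_0 - x)}{x_0 - x_1}
\end{align*}
```

The second term changes sign because (x - x0)/(x1 - x0) equals (x0 - x)/(x0 - x1), which is why both terms in the code share the same denominator.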
The quadratic interpolation function that passes through the three points (x0, y0), (x1, y1), (x2, y2) can be calculated as follows.
def interpolation2(x0,y0,x1,y1,x2,y2,x):
    dn1 = (x0-x1)*(x0-x2)
    dn2 = (x1-x2)*(x1-x0)
    dn3 = (x2-x0)*(x2-x1)
    return y0*(x-x1)*(x-x2)/dn1+y1*(x-x2)*(x-x0)/dn2+y2*(x-x0)*(x-x1)/dn3
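As a quick sanity check, a quadratic interpolant through three points taken from a parabola should reproduce that parabola everywhere. The sketch below restates interpolation2 so that it runs on its own; the test curve `parabola` and the sample points are assumptions chosen only for this check.

```python
def interpolation2(x0, y0, x1, y1, x2, y2, x):
    """Quadratic Lagrange interpolation through three points."""
    dn1 = (x0 - x1) * (x0 - x2)
    dn2 = (x1 - x2) * (x1 - x0)
    dn3 = (x2 - x0) * (x2 - x1)
    return (y0 * (x - x1) * (x - x2) / dn1
            + y1 * (x - x2) * (x - x0) / dn2
            + y2 * (x - x0) * (x - x1) / dn3)


def parabola(x):
    # Hypothetical test curve, used only for this check.
    return x * x - 3 * x + 2


xs = [0, 1, 2]
ys = [parabola(x) for x in xs]
value = interpolation2(xs[0], ys[0], xs[1], ys[1], xs[2], ys[2], 0.5)
print(value, parabola(0.5))  # both are 0.75
```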
Combining these, the last interval is handled by the two-point (linear) function and every other interval by the three-point (quadratic) function. Here y holds the data values and pitch is the number of data points. A routine that divides each interval between adjacent data points into m = 10 parts is as follows.
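# np is numpy; y holds the data values and pitch = len(y)
m=10
sigxm=np.zeros(m*pitch-(m-1))
sigxm[0]=y[0]
sigxm[m*pitch-m]=y[pitch-1]
for i in range(1,m*pitch-m,1):
    k=i//m  # index of the data point at or just to the left of i
    if i%m==0:
        # grid point: copy the original value
        sigxm[i]=y[k]
    elif i > m*pitch-(2*m+1):
        # last interval: linear interpolation through two points
        sigxm[i] = interpolation(k,y[k],k+1,y[k+1],k+(i%m)/m)
    else:
        # other intervals: quadratic interpolation through three points
        sigxm[i] = interpolation2(k,y[k],k+1,y[k+1],k+2,y[k+2],k+(i%m)/m)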
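To see what this loop produces on a small input, here is a minimal self-contained sketch. It wraps the same logic in a function named `interpolate_series` (a hypothetical name, not from the original code) and uses plain Python lists so that it runs without NumPy. The sample values are the squares 1, 4, 9, 16, chosen only for illustration.

```python
def interpolation(x0, y0, x1, y1, x):
    """Linear Lagrange interpolation through two points."""
    dn = x0 - x1
    return y0 * (x - x1) / dn + y1 * (x0 - x) / dn


def interpolation2(x0, y0, x1, y1, x2, y2, x):
    """Quadratic Lagrange interpolation through three points."""
    dn1 = (x0 - x1) * (x0 - x2)
    dn2 = (x1 - x2) * (x1 - x0)
    dn3 = (x2 - x0) * (x2 - x1)
    return (y0 * (x - x1) * (x - x2) / dn1
            + y1 * (x - x2) * (x - x0) / dn2
            + y2 * (x - x0) * (x - x1) / dn3)


def interpolate_series(y, m):
    """Divide every interval of y into m parts (needs len(y) >= 2)."""
    pitch = len(y)
    n_out = m * (pitch - 1) + 1      # same as m*pitch-(m-1)
    out = [0.0] * n_out
    out[0] = y[0]
    out[-1] = y[pitch - 1]
    for i in range(1, n_out - 1):
        k, r = divmod(i, m)          # k: left data point, r: sub-step
        x = k + r / m
        if r == 0:
            out[i] = y[k]            # original data point
        elif k >= pitch - 2:
            # last interval: only two points are available
            out[i] = interpolation(k, y[k], k + 1, y[k + 1], x)
        else:
            out[i] = interpolation2(k, y[k], k + 1, y[k + 1],
                                    k + 2, y[k + 2], x)
    return out


result = interpolate_series([1.0, 4.0, 9.0, 16.0], 2)
print(result)  # [1.0, 2.25, 4.0, 6.25, 9.0, 12.5, 16.0]
```

The quadratic segments reproduce the underlying parabola exactly (2.25 and 6.25), while the last segment is linear and gives 12.5 instead of 12.25, which shows the effect of the end-point handling.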
Last time I mentioned that n-th order interpolation is also possible, but in practice the function above, which combines quadratic and linear interpolation, already gives a smooth curve, so I use it again this time. By replacing the variable y of this function with Pandas data, we create interpolated exchange-rate and stock data.
To state the conclusion first, the linear and quadratic interpolation functions do not change, and the m-division points are calculated in the same way as above. First, read the exchange-rate data with the DataReader of pandas_datareader.
df=DataReader.get_data_yahoo("{}".format("JPY=X"),start,end)
Then, if you replace y with df ["Close"] and run the same loop as shown below, Pandas data can be interpolated as well.
m=24
pitch=len(df)  # number of daily rows
close=df["Close"].to_numpy()  # positional access, independent of the date index
sigxm=np.zeros(m*pitch-(m-1))
sigxm[0]=close[0]
sigxm[m*pitch-m]=close[pitch-1]
for i in range(1,m*pitch-m,1):
    k=i//m  # index of the daily value at or just to the left of i
    if i%m==0:
        sigxm[i]=close[k]
    elif i > m*pitch-(2*m+1):
        sigxm[i] = interpolation(k,close[k],k+1,close[k+1],k+(i%m)/m)
    else:
        sigxm[i] = interpolation2(k,close[k],k+1,close[k+1],k+2,close[k+2],k+(i%m)/m)
dfsigxm = pd.DataFrame()
dfsigxm["sig"] = sigxm
date_df=dfsigxm['sig'].index.tolist()
plot_fig(date_df,dfsigxm,m,end)
The dfsigxm calculated above for each value of m is then plotted with the following plot_fig function.
def plot_fig(date_df,dfsig,m,end):
    # upper panel: whole series; lower panel: first 32 days enlarged
    fig, (ax1,ax2) = plt.subplots(2,1,figsize=(1.6180 * 8, 4*2),dpi=100)
    ax1.plot(date_df[0:],dfsig["sig"][0:])
    ax2.plot(date_df[0:32*m],dfsig["sig"][0:32*m])
    ax1.grid()
    ax2.grid()
    plt.pause(1)
    # the ./fx directory must already exist
    plt.savefig("./fx/{}_{}_{}_{}_.png".format(m,"interpolate","JPN=X",end))
    plt.close()
Setting m = 4 gives 6-hour data, m = 6 gives 4-hour data, m = 24 gives 1-hour data, and m = 24 * 6 gives 10-minute data. The results are as follows. Since m = 1 is the ordinary daily graph, its end points remain visible. The other graphs look almost the same, including the enlarged view in the lower panel, but the plot density is higher and, on close inspection, you can see that the numbers on the horizontal axis have increased.
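The relation between m, the time step, and the number of output points can be checked with a few lines of arithmetic. This is only a sketch: the helper names `step_minutes` and `output_length` and the row count of 100 are hypothetical, and it assumes that each daily row stands for a full 24 hours, ignoring the weekend gaps in real daily data.

```python
MINUTES_PER_DAY = 24 * 60


def step_minutes(m):
    """Time step in minutes when one day is divided into m parts."""
    return MINUTES_PER_DAY // m


def output_length(pitch, m):
    """Number of interpolated points for pitch daily rows."""
    return m * (pitch - 1) + 1   # same as m*pitch-(m-1)


pitch = 100  # hypothetical number of daily rows
for m in (4, 6, 24, 24 * 6):
    print(m, step_minutes(m), output_length(pitch, m))
# 4 360 397
# 6 240 595
# 24 60 2377
# 144 10 14257
```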
The simplest check would be a comparison with actual 10-minute bars, but I could not find a site that provides them.
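One rough substitute, which is not something done in this article, is a hold-out check that needs only the daily data: keep every second daily value, interpolate with m = 2, and compare the rebuilt midpoints with the values that were dropped. The sketch below uses a synthetic sine series as a stand-in for daily closes (not real market data), and `interpolate_series` is a hypothetical helper name.

```python
import math


def interpolation(x0, y0, x1, y1, x):
    """Linear Lagrange interpolation through two points."""
    dn = x0 - x1
    return y0 * (x - x1) / dn + y1 * (x0 - x) / dn


def interpolation2(x0, y0, x1, y1, x2, y2, x):
    """Quadratic Lagrange interpolation through three points."""
    dn1 = (x0 - x1) * (x0 - x2)
    dn2 = (x1 - x2) * (x1 - x0)
    dn3 = (x2 - x0) * (x2 - x1)
    return (y0 * (x - x1) * (x - x2) / dn1
            + y1 * (x - x2) * (x - x0) / dn2
            + y2 * (x - x0) * (x - x1) / dn3)


def interpolate_series(y, m):
    """Divide every interval of y into m parts (needs len(y) >= 2)."""
    pitch = len(y)
    out = [0.0] * (m * (pitch - 1) + 1)
    out[0] = y[0]
    out[-1] = y[pitch - 1]
    for i in range(1, len(out) - 1):
        k, r = divmod(i, m)
        x = k + r / m
        if r == 0:
            out[i] = y[k]
        elif k >= pitch - 2:
            out[i] = interpolation(k, y[k], k + 1, y[k + 1], x)
        else:
            out[i] = interpolation2(k, y[k], k + 1, y[k + 1],
                                    k + 2, y[k + 2], x)
    return out


# Synthetic stand-in for daily closes; replace with real values to test.
daily = [100 + 5 * math.sin(0.2 * k) for k in range(41)]
kept = daily[::2]                      # keep every second day
rebuilt = interpolate_series(kept, 2)  # back on the daily grid
errors = [abs(a - b) for a, b in zip(rebuilt, daily)]
max_err = max(errors)
print(len(rebuilt) == len(daily), max_err)
```

The kept days are reproduced exactly, so max_err measures only the interpolated midpoints. On real exchange-rate data the error would depend on how jagged the series is, so a small error on this smooth synthetic curve says nothing about intraday accuracy.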
・ Daily exchange-rate data was acquired with pandas and interpolated to create 6-hour, 4-hour, 1-hour, and 10-minute data.
・ A smooth data series was obtained.
・ Unfortunately, it was not possible to compare and verify the result against actual data.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pandas_datareader.data as DataReader
import datetime as dt
def plot_fig(date_df,dfsig,m,end):
    # upper panel: whole series; lower panel: first 32 days enlarged
    fig, (ax1,ax2) = plt.subplots(2,1,figsize=(1.6180 * 8, 4*2),dpi=100)
    ax1.plot(date_df[0:],dfsig["sig"][0:],"o-")
    ax2.plot(date_df[0:32*m],dfsig["sig"][0:32*m],"o-")
    ax1.grid()
    ax2.grid()
    plt.pause(1)
    # the ./fx directory must already exist
    plt.savefig("./fx/{}_{}_{}_{}_.png".format(m,"interpolate","JPN=X",end))
    plt.close()
def interpolation(x0,y0,x1,y1,x):
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
def interpolation2(x0,y0,x1,y1,x2,y2,x):
    dn1 = (x0-x1)*(x0-x2)
    dn2 = (x1-x2)*(x1-x0)
    dn3 = (x2-x0)*(x2-x1)
    return y0*(x-x1)*(x-x2)/dn1+y1*(x-x2)*(x-x0)/dn2+y2*(x-x0)*(x-x1)/dn3
def calc_interpolate(df,pitch,m):
    close=df["Close"].to_numpy()  # positional access, independent of the date index
    sigxm=np.zeros(m*pitch-(m-1))
    sigxm[0]=close[0]
    sigxm[m*pitch-m]=close[pitch-1]
    for i in range(1,m*pitch-m,1):
        k=i//m  # index of the daily value at or just to the left of i
        if i%m==0:
            # grid point: copy the original value
            sigxm[i]=close[k]
        elif i > m*pitch-(2*m+1):
            # last interval: linear interpolation
            sigxm[i] = interpolation(k,close[k],k+1,close[k+1],k+(i%m)/m)
        else:
            # other intervals: quadratic interpolation
            sigxm[i] = interpolation2(k,close[k],k+1,close[k+1],k+2,close[k+2],k+(i%m)/m)
    return sigxm
start = dt.date(2020,1,1)
end = dt.date(2020,6,15)
df=DataReader.get_data_yahoo("{}".format("JPY=X"),start,end)
m=1
dfsigxm = pd.DataFrame()
dfsigxm["sig"] = df["Close"]
print(dfsigxm)
date_df=dfsigxm['sig'].index.tolist()
plot_fig(date_df,dfsigxm,m,end)
m=4
pitch = len(df)
sigx2=calc_interpolate(df,pitch,m)
dfsigx2 = pd.DataFrame()
dfsigx2["sig"] = sigx2
print(dfsigx2)
date_df=dfsigx2['sig'].index.tolist()
plot_fig(date_df,dfsigx2,m,end)
m=6
sigxm=calc_interpolate(df,pitch,m)
dfsigxm = pd.DataFrame()
dfsigxm["sig"] = sigxm
print(dfsigxm)
date_df=dfsigxm['sig'].index.tolist()
plot_fig(date_df,dfsigxm,m,end)
m=24
sigxm=calc_interpolate(df,pitch,m)
dfsigxm = pd.DataFrame()
dfsigxm["sig"] = sigxm
print(dfsigxm)
date_df=dfsigxm['sig'].index.tolist()
plot_fig(date_df,dfsigxm,m,end)
m=24*6
sigxm=calc_interpolate(df,pitch,m)
dfsigxm = pd.DataFrame()
dfsigxm["sig"] = sigxm
print(dfsigxm)
date_df=dfsigxm['sig'].index.tolist()
plot_fig(date_df,dfsigxm,m,end)