[PYTHON] [Introduction to Pandas] I tried to increase exchange data by data interpolation ♬

I covered interpolation of ordinary time series data in an earlier post. This time, for backtesting purposes, I interpolated daily exchange rate data to create 6-hour, 4-hour, 1-hour, and 10-minute data, and tried to check its accuracy.

What I did

・ Lagrange interpolation formula
・ Interpolation of Pandas data
・ 6 hours to 10 minutes data creation
・ See accuracy

・ Lagrange interpolation formula

【reference】
・ [Interpolation] Interpolate from linear interpolation to quadratic interpolation and Lagrange interpolation ♬

First, the linear interpolation function that passes through the two points (x0, y0), (x1, y1) can be calculated as follows.

def interpolation(x0,y0,x1,y1,x):
    # Lagrange interpolation: y0*(x-x1)/(x0-x1) + y1*(x-x0)/(x1-x0)
    # Flipping the sign of the second term lets both terms share dn = x0-x1.
    dn = (x0-x1)
    return y0*(x-x1)/dn + y1*(x0-x)/dn

The quadratic interpolation function that passes through the three points (x0, y0), (x1, y1), (x2, y2) can be calculated as follows.

def interpolation2(x0,y0,x1,y1,x2,y2,x):
    dn1 = (x0-x1)*(x0-x2)
    dn2 = (x1-x2)*(x1-x0)
    dn3 = (x2-x0)*(x2-x1)
    return y0*(x-x1)*(x-x2)/dn1+y1*(x-x2)*(x-x0)/dn2+y2*(x-x0)*(x-x1)/dn3
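
As a quick sanity check of the two formulas above, the sketch below evaluates both functions at x = 0.5 using three sample points taken from y = x**2. The sample points are illustrative values chosen for this check and do not come from the exchange data. The quadratic version should reproduce a parabola exactly, while the linear version only gives the chord between the first two points.

```python
def interpolation(x0, y0, x1, y1, x):
    """Linear Lagrange interpolation through (x0, y0) and (x1, y1)."""
    return y0 * (x - x1) / (x0 - x1) + y1 * (x - x0) / (x1 - x0)


def interpolation2(x0, y0, x1, y1, x2, y2, x):
    """Quadratic Lagrange interpolation through three points."""
    dn1 = (x0 - x1) * (x0 - x2)
    dn2 = (x1 - x2) * (x1 - x0)
    dn3 = (x2 - x0) * (x2 - x1)
    return (y0 * (x - x1) * (x - x2) / dn1
            + y1 * (x - x2) * (x - x0) / dn2
            + y2 * (x - x0) * (x - x1) / dn3)


# Illustrative sample points on y = x**2 (an assumption made for this check).
(x0, y0), (x1, y1), (x2, y2) = (0, 0.0), (1, 1.0), (2, 4.0)

linear_mid = interpolation(x0, y0, x1, y1, 0.5)          # chord between the first two points
quad_mid = interpolation2(x0, y0, x1, y1, x2, y2, 0.5)   # parabola through all three points
print(linear_mid, quad_mid)
```

Both functions also return the original y value when x is one of the given x coordinates, which is the property the combined loop relies on.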

Combining these, the two-point (linear) function handles the final interval, where no third point is available, and the three-point (quadratic) function handles every other interval. The code that splits each interval between adjacent points into m = 10 parts is as follows. Here y holds the original values and pitch is the number of data points.

m=10  # number of parts each interval is split into
sigxm=np.zeros(m*pitch-(m-1))
sigxm[0]=y[0]
sigxm[m*pitch-m]=y[pitch-1]
for i in range(1,m*pitch-m,1):
    k=i//m       # index of the original point at the left end of the interval
    x=k+(i%m)/m  # position of the point being interpolated
    if i%m==0:
        # original data point: copy it unchanged
        sigxm[i]=y[k]
    elif i > m*pitch-(2*m+1):
        # final interval: linear interpolation
        sigxm[i] = interpolation(k,y[k],k+1,y[k+1],x)
    else:
        sigxm[i] = interpolation2(k,y[k],k+1,y[k+1],k+2,y[k+2],x)

Last time I mentioned that n-th order interpolation is also possible. In practice, however, the code above, which combines quadratic interpolation with linear interpolation at the end, already gives a smooth curve, so I adopt it again this time. By replacing the variable y with Pandas data, we will create interpolated data for exchange rates and stock prices.
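
To make the combined scheme easier to try out, here is a minimal self-contained sketch that wraps the same logic in a function and runs it on a tiny array. The name interpolate_series and the sample values are hypothetical and were chosen only for this illustration. It assumes at least two data points.

```python
import numpy as np


def interpolation(x0, y0, x1, y1, x):
    return y0 * (x - x1) / (x0 - x1) + y1 * (x - x0) / (x1 - x0)


def interpolation2(x0, y0, x1, y1, x2, y2, x):
    dn1 = (x0 - x1) * (x0 - x2)
    dn2 = (x1 - x2) * (x1 - x0)
    dn3 = (x2 - x0) * (x2 - x1)
    return (y0 * (x - x1) * (x - x2) / dn1
            + y1 * (x - x2) * (x - x0) / dn2
            + y2 * (x - x0) * (x - x1) / dn3)


def interpolate_series(y, m):
    """Split every interval of y into m parts (hypothetical helper name)."""
    pitch = len(y)
    out = np.zeros(m * pitch - (m - 1))
    for i in range(out.size):
        k, r = divmod(i, m)      # k: left original index, r: offset inside the interval
        x = k + r / m
        if r == 0:
            out[i] = y[k]        # original point, copied unchanged
        elif k + 2 <= pitch - 1:
            out[i] = interpolation2(k, y[k], k + 1, y[k + 1], k + 2, y[k + 2], x)
        else:
            out[i] = interpolation(k, y[k], k + 1, y[k + 1], x)  # final interval
    return out


# Made-up sample values on y = x**2, split with m = 2.
result = interpolate_series([0.0, 1.0, 4.0, 9.0], 2)
print(result)
```

With these inputs the quadratic intervals reproduce x**2 exactly (0.25 and 2.25 at the midpoints), while the final interval falls back to the straight line between 4 and 9 and yields 6.5.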

・ Interpolation of Pandas data

To state the conclusion first: the linear and quadratic interpolation functions are used unchanged, and the m-division points are computed with the code below. First, read the exchange rate data with pandas_datareader (start and end are the date range defined in the whole code at the end).

df=DataReader.get_data_yahoo("{}".format("JPY=X"),start,end)

Then, if you replace y with df["Close"] and compute each point in turn as shown below, the same calculation works on Pandas data.

m=24
pitch=len(df)
# NumPy array of closing prices, so the integer indices below are positional
close=df["Close"].to_numpy()
sigxm=np.zeros(m*pitch-(m-1))
sigxm[0]=close[0]
sigxm[m*pitch-m]=close[pitch-1]
for i in range(1,m*pitch-m,1):
    k=i//m       # index of the daily value at the left end of the interval
    x=k+(i%m)/m  # position of the point being interpolated
    if i%m==0:
        sigxm[i]=close[k]
    elif i > m*pitch-(2*m+1):
        # final interval: linear interpolation
        sigxm[i] = interpolation(k,close[k],k+1,close[k+1],x)
    else:
        sigxm[i] = interpolation2(k,close[k],k+1,close[k+1],k+2,close[k+2],x)

dfsigxm = pd.DataFrame()
dfsigxm["sig"] = sigxm
date_df=dfsigxm['sig'].index.tolist()
plot_fig(date_df,dfsigxm,m,end)

The dfsigxm computed for each value of m by the code above is then plotted with the following plot_fig(date_df, dfsig, m, end) function.

def plot_fig(date_df,dfsig,m,end):
    fig, (ax1,ax2) = plt.subplots(2,1,figsize=(1.6180 * 8, 4*2),dpi=100)
    # upper panel: the whole period
    ax1.plot(date_df[0:],dfsig["sig"][0:])
    # lower panel: enlarged view of the first 32 days
    ax2.plot(date_df[0:32*m],dfsig["sig"][0:32*m])
    ax1.grid()
    ax2.grid()
    plt.pause(1)
    # the ./fx directory must already exist
    plt.savefig("./fx/{}_{}_{}_{}_.png".format(m,"interpolate","JPY=X",end))
    plt.close()
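
For comparison only (this is not the method used in this article), pandas can also upsample a date-indexed series by itself with resample followed by interpolate. The sketch below uses three made-up daily values in place of df["Close"] and fills the 6-hour grid linearly, so it corresponds to the two-point formula rather than the quadratic one.

```python
import pandas as pd

# Three made-up daily closing values; a real run would use df["Close"] instead.
daily = pd.Series(
    [100.0, 102.0, 101.0],
    index=pd.date_range("2020-01-01", periods=3, freq="D"),
)

# Upsample to a 6-hour grid and fill the new rows by linear interpolation.
six_hourly = daily.resample(pd.Timedelta(hours=6)).interpolate(method="linear")
print(six_hourly)
```

Unlike the integer-indexed arrays built above, the result keeps a timestamp index, which can be convenient when the interpolated values are later fed to a backtest.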

・ 6 hours to 10 minutes data creation

With m = 4 you get 6-hour data, with m = 6 4-hour data, with m = 24 1-hour data, and with m = 24 * 6 10-minute data. The results are as follows. Since m = 1 is the ordinary daily graph, the end points of the line segments remain visible. For the other values of m the overall shape barely changes, including in the enlarged view in the lower panel, but the plotted points become denser and, if you look closely, you can see that the values on the horizontal axis have grown.

(Figure: hokan_USDJPY.gif)
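
The relation between m and the resulting time step can be sketched as follows, assuming that consecutive rows of the daily data are exactly one day apart. Real exchange data skips weekends, so this assumption does not hold across them; the interpolation in this article treats all rows as equally spaced in any case. The helper name step_for is hypothetical.

```python
import pandas as pd


def step_for(m):
    """Time step obtained when one day is split into m equal parts."""
    return pd.Timedelta(days=1) / m


for m in (4, 6, 24, 24 * 6):
    print(m, step_for(m))

# Timestamps covering one day split with m = 4 (the start date is arbitrary).
stamps = pd.date_range("2020-01-01", periods=4 + 1, freq=step_for(4))
print(stamps)
```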

・ See accuracy

The simplest check would be to compare the result with actual 10-minute bars, but I could not find a site that provides them.
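
One possible substitute, which was not carried out for this article, is to test the interpolation on a series whose fine-grained values are known: thin it out, interpolate it back, and measure the difference. The sketch below does this with a synthetic sine wave standing in for intraday data. The wave, its period, and the helper name interpolate_series are assumptions made for the illustration. Real exchange rates are far less smooth, so this checks the implementation rather than the accuracy on market data.

```python
import numpy as np


def interpolation(x0, y0, x1, y1, x):
    return y0 * (x - x1) / (x0 - x1) + y1 * (x - x0) / (x1 - x0)


def interpolation2(x0, y0, x1, y1, x2, y2, x):
    dn1 = (x0 - x1) * (x0 - x2)
    dn2 = (x1 - x2) * (x1 - x0)
    dn3 = (x2 - x0) * (x2 - x1)
    return (y0 * (x - x1) * (x - x2) / dn1
            + y1 * (x - x2) * (x - x0) / dn2
            + y2 * (x - x0) * (x - x1) / dn3)


def interpolate_series(y, m):
    """Same scheme as in the article, wrapped in a hypothetical helper."""
    pitch = len(y)
    out = np.zeros(m * pitch - (m - 1))
    for i in range(out.size):
        k, r = divmod(i, m)
        x = k + r / m
        if r == 0:
            out[i] = y[k]
        elif k + 2 <= pitch - 1:
            out[i] = interpolation2(k, y[k], k + 1, y[k + 1], k + 2, y[k + 2], x)
        else:
            out[i] = interpolation(k, y[k], k + 1, y[k + 1], x)
    return out


m = 4
t = np.arange(0, 20 * m + 1) / m              # 20 "days", sampled m times per day
fine = 100 + 2 * np.sin(2 * np.pi * t / 10)   # synthetic stand-in for intraday data
coarse = fine[::m]                            # keep only the "daily" samples
rebuilt = interpolate_series(coarse, m)

max_err = np.max(np.abs(rebuilt - fine))      # worst-case deviation from the known values
print(max_err)
```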

Summary

・ 6-hour, 4-hour, 1-hour, and 10-minute data were created by acquiring daily exchange data with pandas and interpolating it.
・ A smooth data series was obtained.
・ Unfortunately, it was not possible to compare and verify it against actual data.

Whole code

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pandas_datareader.data as DataReader
import datetime as dt

def plot_fig(date_df,dfsig,m,end):
    fig, (ax1,ax2) = plt.subplots(2,1,figsize=(1.6180 * 8, 4*2),dpi=100)
    # upper panel: the whole period
    ax1.plot(date_df[0:],dfsig["sig"][0:],"o-")
    # lower panel: enlarged view of the first 32 days
    ax2.plot(date_df[0:32*m],dfsig["sig"][0:32*m],"o-")
    ax1.grid()
    ax2.grid()
    plt.pause(1)
    # the ./fx directory must already exist
    plt.savefig("./fx/{}_{}_{}_{}_.png".format(m,"interpolate","JPY=X",end))
    plt.close()

def interpolation(x0,y0,x1,y1,x):
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

def interpolation2(x0,y0,x1,y1,x2,y2,x):
    dn1 = (x0-x1)*(x0-x2)
    dn2 = (x1-x2)*(x1-x0)
    dn3 = (x2-x0)*(x2-x1)
    return y0*(x-x1)*(x-x2)/dn1+y1*(x-x2)*(x-x0)/dn2+y2*(x-x0)*(x-x1)/dn3

def calc_interpolate(df,pitch,m):
    # NumPy array of closing prices, so the integer indices below are positional
    close=df["Close"].to_numpy()
    sigxm=np.zeros(m*pitch-(m-1))
    sigxm[0]=close[0]
    sigxm[m*pitch-m]=close[pitch-1]
    for i in range(1,m*pitch-m,1):
        k=i//m       # index of the daily value at the left end of the interval
        x=k+(i%m)/m  # position of the point being interpolated
        if i%m==0:
            sigxm[i]=close[k]
        elif i > m*pitch-(2*m+1):
            # final interval: linear interpolation
            sigxm[i] = interpolation(k,close[k],k+1,close[k+1],x)
        else:
            sigxm[i] = interpolation2(k,close[k],k+1,close[k+1],k+2,close[k+2],x)
    return sigxm

start = dt.date(2020,1,1)
end = dt.date(2020,6,15)
df=DataReader.get_data_yahoo("{}".format("JPY=X"),start,end)

m=1
dfsigxm = pd.DataFrame()
dfsigxm["sig"] = df["Close"]
print(dfsigxm)
date_df=dfsigxm['sig'].index.tolist()
plot_fig(date_df,dfsigxm,m,end)

m=4
pitch = len(df)
sigx2=calc_interpolate(df,pitch,m)
dfsigx2 = pd.DataFrame()
dfsigx2["sig"] = sigx2
print(dfsigx2)
date_df=dfsigx2['sig'].index.tolist()
plot_fig(date_df,dfsigx2,m,end)

m=6
sigxm=calc_interpolate(df,pitch,m)
dfsigxm = pd.DataFrame()
dfsigxm["sig"] = sigxm
print(dfsigxm)
date_df=dfsigxm['sig'].index.tolist()
plot_fig(date_df,dfsigxm,m,end)

m=24
sigxm=calc_interpolate(df,pitch,m)
dfsigxm = pd.DataFrame()
dfsigxm["sig"] = sigxm
print(dfsigxm)
date_df=dfsigxm['sig'].index.tolist()
plot_fig(date_df,dfsigxm,m,end)

m=24*6
sigxm=calc_interpolate(df,pitch,m)
dfsigxm = pd.DataFrame()
dfsigxm["sig"] = sigxm
print(dfsigxm)
date_df=dfsigxm['sig'].index.tolist()
plot_fig(date_df,dfsigxm,m,end)
