[PYTHON] Data is missing when getting stock price data with Pandas-datareader

Introduction

Pandas-datareader is handy for fetching stock price data, but the data it returns can have gaps. For example, if you fetch "1357NF Nikkei Double Inverse" (ticker 1357.JP) from Stooq with the following code,

import pandas_datareader.stooq as web
from datetime import datetime

start_date = datetime(2016,6,10)
end_date = datetime(2016,6,17)

dr = web.StooqDailyReader('1357.JP', start=start_date, end=end_date)
df = dr.read()
df.to_csv('1357.csv')

you get a CSV file like the one below.

Date,Open,High,Low,Close,Volume
2016-06-17,3330,3380,3290,3370,8019724
2016-06-16,3270,3465,3250,3450,10403857
2016-06-14,3220,3315,3185,3270,9910736
2016-06-13,3105,3205,3100,3200,8193928
2016-06-10,2981,3040,2977,3000,4247241

The row for 2016-06-15 is missing. I wondered whether trading had been halted that day because of a system failure or something similar, so I checked the historical data on Yahoo! Finance.

Y1357.png

On Yahoo! Finance, the data for that day was present.

Next, I fetched the Nikkei 225 index for the same period:

import pandas_datareader.stooq as web
from datetime import datetime

start_date = datetime(2016,6,10)
end_date = datetime(2016,6,17)

dr = web.StooqDailyReader('^NKX', start=start_date, end=end_date)
df = dr.read()
df.to_csv('NIKKEI225.csv')

This time no rows were missing.

Date,Open,High,Low,Close,Volume
2016-06-17,15631.79,15774.87,15582.94,15599.66,1671723008
2016-06-16,15871.22,15913.08,15395.98,15434.14,1542472064
2016-06-15,15799.07,15997.3,15752.01,15919.58,1367727744
2016-06-14,16001.19,16082.5,15762.09,15859.0,1316932864
2016-06-13,16319.11,16335.38,16019.18,16019.18,1261788416
2016-06-10,16637.51,16643.36,16496.11,16601.36,1549976064

So whether rows are missing seems to depend on the ticker.
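One way to find the missing dates without eyeballing the CSV files is to compare the date indexes of the two tables. The following is a minimal sketch, not part of the original workflow: it assumes the Nikkei 225 index data can serve as a reference calendar of trading days, and it embeds the rows shown above (Date and Close only) as strings so that it runs without network access.

```python
import io
import pandas as pd

# Rows copied from the two CSV outputs above (Date and Close only, for brevity).
nikkei_csv = """Date,Close
2016-06-17,15599.66
2016-06-16,15434.14
2016-06-15,15919.58
2016-06-14,15859.0
2016-06-13,16019.18
2016-06-10,16601.36
"""
n1357_csv = """Date,Close
2016-06-17,3370
2016-06-16,3450
2016-06-14,3270
2016-06-13,3200
2016-06-10,3000
"""

nikkei225 = pd.read_csv(io.StringIO(nikkei_csv), index_col='Date', parse_dates=True).sort_index()
n1357 = pd.read_csv(io.StringIO(n1357_csv), index_col='Date', parse_dates=True).sort_index()

# Assumption: the index data is complete, so its dates act as the reference calendar.
# Dates present in the reference but absent from the 1357 data are the gaps.
missing_dates = nikkei225.index.difference(n1357.index)
print(missing_dates)
```

With the sample rows above, the only date reported is 2016-06-15. If the reference data itself had gaps, this check would not catch them.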

Handling missing data

Rows may or may not be missing depending on the ticker. If you do nothing about it, you risk making serious mistakes when comparing prices across tickers, for example by lining up rows that belong to different dates. You therefore need to either drop the dates that are missing on one side or interpolate them. With pandas, you can do either by merging the two tables.

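import pandas as pd

# Load both CSV files, indexed by date and sorted in ascending order
nikkei225 = pd.read_csv('NIKKEI225.csv').set_index('Date').sort_index()
n1357 = pd.read_csv('1357.csv').set_index('Date').sort_index()

# Inner join: keep only the dates present in both tables
merged = nikkei225.merge(n1357, on='Date', how='inner')
# Outer join: keep every date; values missing on one side become NaN
merged2 = nikkei225.merge(n1357, on='Date', how='outer')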

merged contains only the dates present in both tables, so the row for the missing date is dropped.

merged.png

In merged2, every date is kept and the values missing on one side are filled with NaN.

merged2.png

To interpolate the missing rows, first fill them with NaN as in merged2, then replace the NaN values with interpolated ones.
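As a rough illustration of that last step, the sketch below builds the outer-joined table and fills the NaN in two common ways: linear interpolation and forward fill. It is only one possible approach, not necessarily the one used in the article linked below. The sample rows are copied from the CSV outputs above (Date and Close only), and the suffixes `_nikkei225` and `_1357` are names chosen here for readability.

```python
import io
import pandas as pd

# Rows copied from the two CSV outputs above (Date and Close only, for brevity).
nikkei_csv = """Date,Close
2016-06-17,15599.66
2016-06-16,15434.14
2016-06-15,15919.58
2016-06-14,15859.0
2016-06-13,16019.18
2016-06-10,16601.36
"""
n1357_csv = """Date,Close
2016-06-17,3370
2016-06-16,3450
2016-06-14,3270
2016-06-13,3200
2016-06-10,3000
"""

nikkei225 = pd.read_csv(io.StringIO(nikkei_csv), index_col='Date', parse_dates=True).sort_index()
n1357 = pd.read_csv(io.StringIO(n1357_csv), index_col='Date', parse_dates=True).sort_index()

# Outer join on the Date index: 2016-06-15 gets NaN in the 1357 column.
merged2 = pd.merge(nikkei225, n1357, left_index=True, right_index=True,
                   how='outer', suffixes=('_nikkei225', '_1357'))

# Option 1: linear interpolation between the neighboring rows.
linear = merged2['Close_1357'].interpolate(method='linear')

# Option 2: forward fill, i.e. carry the previous day's close forward.
ffilled = merged2['Close_1357'].ffill()

print(pd.DataFrame({'linear': linear, 'ffill': ffilled}))
```

For 2016-06-15, linear interpolation gives 3360.0 (the midpoint of 3270 and 3450) and forward fill gives 3270.0. Either value is an estimate rather than the actual close, so which method is appropriate depends on the analysis.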

How to fill the gaps is explained in the following article: "I tried to predict the stock price by data analysis" https://qiita.com/kazama0119/items/c838114f8687518ba58e
