I suddenly wanted to analyze stock price data, so I looked into how to get it. I have no background knowledge about stocks, so I may not really need to do this, but I managed to get the data, so I am publishing what I did for the time being.
I got the data in the following three steps.
I didn't want to use a strange site, so I decided to read the data from Yahoo Finance.
pandas.read_html extracts everything that looks like a table from HTML, so its return value is a list of DataFrames.
The Yahoo Finance page looks like the picture below and obviously contains only one table, but because read_html picks up anything table-like, it also extracts parts such as "Panasonic Corporation 978.5 compared to the previous day ...".
As a workaround, I used an if statement to keep only the table I wanted, identified by the header of its first column.
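The filter can be tried without touching the network. The following is only a sketch under assumptions: the two hand-built DataFrames stand in for the list that pandas.read_html might return, their values are made up, and pick_table is a hypothetical helper name that does not appear in the script.

```python
import pandas as pd


def pick_table(tables, first_column):
    """Return the first DataFrame whose first column header matches."""
    for table in tables:
        if table.columns[0] == first_column:
            return table
    raise ValueError("no table starts with column {!r}".format(first_column))


# Stand-ins for the list returned by pandas.read_html (made-up values).
summary = pd.DataFrame({"name": ["Panasonic Corporation"], "price": [978.5]})
history = pd.DataFrame({"date": ["2019-05-22", "2019-05-23"],
                        "close": [980.0, 978.5]})

prices = pick_table([summary, history], "date")
print(prices)
```

Raising an error when nothing matches makes a changed page layout show up immediately instead of as an undefined variable later on.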
To get the data for a desired period, you have to specify the period in the URL. Yahoo Finance shows at most 20 rows at a time. If you want 20 rows or fewer, embedding the start and end dates in the URL is enough, but a longer period cannot be fetched in a single request.
So I wrote a function that builds a list of dates for the specified period, so that a longer period can be split up and requested piece by piece.
daterange.py
from datetime import datetime
from datetime import timedelta
def daterange(start, end):
    # -----make list of "date"---------------------------
    for n in range((end - start).days):
        yield start + timedelta(n)
#----------------------------------------------------
start = datetime.strptime(str(day_start), '%Y-%m-%d').date()
end = datetime.strptime(str(day_end), '%Y-%m-%d').date()
list_days = [i for i in daterange(start, end)]
Here, start and end are datetime.date objects, which hold only a date. A plain number cannot be added to or subtracted from them directly.
timedelta(n)
timedelta is what you use to add to or subtract from dates. Its first argument is days, so here the offsets 0, 1, 2, ... up to the number of days between start and end are added to start one at a time and yielded. Note that range((end - start).days) stops one short, so end itself is not included in the list.
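The date arithmetic can be checked with a few lines. This is a small sketch using only the standard library; the two dates are arbitrary examples chosen for illustration.

```python
from datetime import date, timedelta

start = date(2019, 5, 20)
end = date(2019, 5, 23)

gap = end - start                      # date - date gives a timedelta
print(gap.days)                        # 3

next_day = start + timedelta(days=1)   # date + timedelta gives a date
print(next_day)                        # 2019-05-21

try:
    start + 1                          # a bare integer is rejected
except TypeError as exc:
    print("TypeError:", exc)

# Same construction as daterange: offsets 0 .. gap.days - 1
days = [start + timedelta(n) for n in range(gap.days)]
print(days[-1])                        # 2019-05-22 (end is not included)
```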
Some people may not be familiar with yield, so to explain: yield is a keyword (not a function) that lets a function hand back its results one at a time. There is not much benefit here, because the result is turned into a list right away, but a function that builds and returns a large list all at once uses a lot of memory, and yield helps reduce memory consumption in such cases.
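To see the "one at a time" behaviour, the generator can be stepped by hand with next(). This is a minimal sketch; the dates are arbitrary examples.

```python
from datetime import date, timedelta


def daterange(start, end):
    # Each loop iteration produces one date and then pauses at yield.
    for n in range((end - start).days):
        yield start + timedelta(n)


gen = daterange(date(2019, 5, 20), date(2019, 5, 23))
first = next(gen)    # runs the body only up to the first yield
second = next(gen)   # resumes and runs up to the second yield
rest = list(gen)     # consumes whatever is left

print(first, second, rest)
```

Nothing is computed until a value is requested, which is why a generator never has to hold the whole sequence in memory.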
Plotting the data as it is would have been enough, but to kill time I also tried plotting its FFT (fast Fourier transform) next to it.
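As a sanity check of the amplitude spectrum, here is a small sketch on a synthetic signal rather than stock prices. The sine wave, its frequency, and the sample count are assumptions chosen so that the expected peak is known in advance.

```python
import numpy as np

n = 64                                    # number of samples
dt = 1.0                                  # sampling interval
t = np.arange(n) * dt
signal = np.sin(2 * np.pi * 0.125 * t)    # 0.125 cycles per sample

spectrum = np.fft.fft(signal)
amplitude = np.abs(spectrum)              # amplitude spectrum
freq = np.fft.fftfreq(n, d=dt)            # frequency of each FFT bin

half = n // 2                             # look at non-negative frequencies only
peak = int(np.argmax(amplitude[:half]))
print(freq[peak])                         # 0.125
```

For real-valued input the second half of the spectrum mirrors the first, so only the first half carries independent information.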
By the way, the stock codes for Panasonic, Sony, and the Nikkei average are included in the script, so feel free to use them.
get_stock.py
import pandas
import numpy as np
import matplotlib.pyplot as plt
from datetime import datetime
from datetime import timedelta
def get_stock_df(code, day_start, day_end):# (Stock Code, start y-m-d, end y-m-d)
    # -----column name-----------------------------------
    #Date, open price, high price, low price, close price, volume, adjusted close price
    #Date, Open, High, Low, Close, Volume, Adjusted Close
    #----------------------------------------------------
    print("<<< Start stock data acquisition >>>")
    print("code:{0}, from {1} to {2}".format(code, day_start, day_end))
    def get_stock_df_under20(code, day_start, day_end):# (Stock Code, start y-m-d, end y-m-d)
        # -----source: Yahoo Finance----------------------------
        # Up to 20 rows can be displayed on the site at one time
        #-------------------------------------------------------
        sy,sm,sd = str(day_start).split('-')
        ey,em,ed = str(day_end).split('-')
        url="https://info.finance.yahoo.co.jp/history/?code={0}&sy={1}&sm={2}&sd={3}&ey={4}&em={5}&ed={6}&tm=d".format(code,sy,sm,sd,ey,em,ed)
        list_df = pandas.read_html(url,header=0)
        df = None
        for i in range(len(list_df)):
            # keep only the table whose first column is the date column
            if list_df[i].columns[0] == "date":
                df = list_df[i]
        if df is None:
            raise ValueError("price table not found: {0}".format(url))
        return df.iloc[::-1]  # reverse the row order
    #-------------------------------------------------------
    def daterange(start, end):
        # -----make list of "date"---------------------------
        for n in range((end - start).days):
            yield start + timedelta(n)
    #----------------------------------------------------
    start = datetime.strptime(str(day_start), '%Y-%m-%d').date()
    end = datetime.strptime(str(day_end), '%Y-%m-%d').date()
    list_days = [i for i in daterange(start, end)]  # end itself is not included
    # Split the days into requests of at most 20; the first request takes the remainder
    pages = len(list_days) // 20
    mod = len(list_days) % 20
    if mod == 0:
        pages = pages - 1
        mod = 20
    df_main = get_stock_df_under20(code, list_days[0], list_days[mod-1])
    for p in range(pages):
        df = get_stock_df_under20(code, list_days[20*p + mod], list_days[20*(p+1) + mod-1])
        df_main = pandas.concat([df_main, df])
    print("<<< Completed >>> ({0}days)".format(len(df_main)))
    return df_main
def graphing(f, dt):
    #Data parameters
    N = len(f) #The number of samples
    t = np.arange(N) * dt #Time axis (dt is the sampling interval)
    freq = np.arange(N) / (N*dt) #Frequency axis: FFT bin k corresponds to k/(N*dt)
    #Fast Fourier transform
    F = np.fft.fft(f)
    #Calculate amplitude spectrum
    Amp = np.abs(F)
    #graph display
    plt.figure(figsize=(14,6))
    plt.rcParams['font.family'] = 'Times New Roman'
    plt.rcParams['font.size'] = 17
    #Left: the signal itself
    plt.subplot(121)
    plt.plot(t, f, label='f(n)')
    plt.xlabel("Time", fontsize=20)
    plt.ylabel("Signal", fontsize=20)
    plt.grid()
    leg = plt.legend(loc=1, fontsize=25)
    leg.get_frame().set_alpha(1)
    #Right: the amplitude spectrum
    plt.subplot(122)
    plt.plot(freq, Amp, label='|F(k)|')
    plt.xlabel('Frequency', fontsize=20)
    plt.ylabel('Amplitude', fontsize=20)
    plt.ylim(0,1000)
    plt.grid()
    leg = plt.legend(loc=1, fontsize=25)
    leg.get_frame().set_alpha(1)
    plt.show()
nikkei = "998407.O"
panasonic = "6752.T"
sony = "6758.T"
# code info "https://info.finance.yahoo.co.jp/history/?code="
df = get_stock_df(panasonic,"2016-5-23","2019-5-23")
#graphing(df["closing price"],1)
# to_csv returns None when a path is given, so the result is not assigned
df.to_csv('./panasonic.csv', encoding='utf_8_sig')
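The pages/mod arithmetic in get_stock_df is easy to get wrong, so it can help to check it in isolation. The sketch below reproduces only the index calculation, without fetching anything. page_ranges is a hypothetical helper name that is not part of the script, and it assumes at least one day in the list.

```python
def page_ranges(n_days, page_size=20):
    """Return (first_index, last_index) pairs, mirroring the pages/mod logic.

    Assumes n_days >= 1.
    """
    pages = n_days // page_size
    mod = n_days % page_size
    if mod == 0:
        # An exact multiple: make the first request a full page instead of empty
        pages -= 1
        mod = page_size
    ranges = [(0, mod - 1)]   # first request covers the remainder
    for p in range(pages):
        ranges.append((page_size * p + mod, page_size * (p + 1) + mod - 1))
    return ranges


print(page_ranges(45))   # [(0, 4), (5, 24), (25, 44)]
print(page_ranges(40))   # [(0, 19), (20, 39)]
```

Every index from 0 to n_days - 1 appears in exactly one range, and no range spans more than 20 days.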