I want to get daily stock price data with Python

On a whim, I wanted to analyze stock price data, so I looked into how to get it. I have no background knowledge about stocks, so I may not really need this. But I managed to get the data, so I am publishing the method for the time being.

Method

I tried to get the data through the following three steps.

1. Get daily data with pandas.read_html

I didn't want to use an unfamiliar site, so I decided to read the data from Yahoo Finance.

pandas.read_html extracts everything in the HTML that looks like a table. So the return value is a list of DataFrames, one per table-like element.

The Yahoo Finance page looks like the picture below and obviously contains only one real table, but because read_html extracts anything table-like, it also picks up sections such as "Panasonic Corporation 978.5 compared to the previous day ...".

As a solution, I used an if statement to keep only the table I wanted.
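As a rough illustration of that filtering step, here is a minimal sketch. The two DataFrames stand in for the list that pandas.read_html would return, their values are made up, and the helper name `pick_table` is hypothetical rather than part of the script in this article.

```python
import pandas as pd

# Stand-in for the list returned by pandas.read_html(url, header=0).
# Both frames and all values are invented for illustration only.
summary = pd.DataFrame({"name": ["Example Corp."], "price": [100.0]})
prices = pd.DataFrame({
    "date": ["2019-05-22", "2019-05-21"],
    "close": [101.0, 100.0],
})
list_df = [summary, prices]


def pick_table(frames, first_column):
    """Return the first frame whose first column header matches, else None."""
    for frame in frames:
        if len(frame.columns) > 0 and frame.columns[0] == first_column:
            return frame
    return None


price_table = pick_table(list_df, "date")
print(price_table)
```

Returning None when nothing matches makes it easy to notice that the page layout changed, instead of failing later with an undefined variable.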

2. Get consecutive dates with the datetime module

To get the data for a desired period, you have to specify the period in the URL. Yahoo Finance displays at most 20 rows of data at a time. If you want 20 rows or fewer, you can get everything just by embedding the start and end dates in the URL, but if you want more, a single request will not return all of it.
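To make the idea of embedding the dates in the URL concrete, here is a minimal sketch. The base URL and the parameter names (code, sy, sm, sd, ey, em, ed, tm) are copied from the script at the end of this article; the helper name `build_history_url` is hypothetical, and whether the site still serves this URL has not been checked.

```python
from urllib.parse import urlencode

# Base URL taken from the script in this article (not verified to still work)
BASE_URL = "https://info.finance.yahoo.co.jp/history/?"


def build_history_url(code, day_start, day_end):
    """Build the history URL for one stock code and one 'Y-M-D' period."""
    sy, sm, sd = str(day_start).split("-")
    ey, em, ed = str(day_end).split("-")
    params = {
        "code": code,
        "sy": sy, "sm": sm, "sd": sd,   # start year, month, day
        "ey": ey, "em": em, "ed": ed,   # end year, month, day
        "tm": "d",                      # daily data
    }
    return BASE_URL + urlencode(params)


url = build_history_url("6752.T", "2019-5-1", "2019-5-20")
print(url)
```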

Because of the 20-row limit, I wrote a function that builds a list of dates for the specified period, so that any date inside the period can be used as the start or end of a request.
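One way to picture the splitting is the sketch below, which cuts a period into windows of at most 20 calendar days, one request per window. This is not the paging logic of the script in this article: the helper name `date_windows` is hypothetical, the end date is treated as inclusive, and the short window comes last.

```python
from datetime import date, timedelta


def date_windows(start, end, size=20):
    """Yield (first_day, last_day) pairs covering start..end inclusive,
    each spanning at most `size` calendar days."""
    first = start
    while first <= end:
        last = min(first + timedelta(days=size - 1), end)
        yield first, last
        first = last + timedelta(days=1)


windows = list(date_windows(date(2019, 4, 1), date(2019, 5, 23)))
for first, last in windows:
    print(first, "->", last)
```

Since a window of 20 calendar days contains fewer than 20 trading days, each window should fit within the display limit described above.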

`daterange.py`


from datetime import datetime
from datetime import timedelta

def daterange(start, end):
	# Yield each date from start up to, but not including, end
	for n in range((end - start).days):
		yield start + timedelta(n)

# Example period; any 'YYYY-MM-DD' strings work here
day_start = "2019-5-1"
day_end = "2019-5-23"

start = datetime.strptime(str(day_start), '%Y-%m-%d').date()
end   = datetime.strptime(str(day_end), '%Y-%m-%d').date()

list_days = list(daterange(start, end))

Here, start and end are datetime.date objects, which hold only a date. A plain number cannot be added to or subtracted from them directly.

timedelta(n)

datetime.timedelta is used to add to and subtract from date objects. Since its first argument is days, the loop adds one more day to start on each pass and yields the result, covering the days from start up to (but not including) end.
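The date arithmetic described above can be tried in isolation. This is a small sketch with arbitrarily chosen dates:

```python
from datetime import date, timedelta

start = date(2019, 5, 20)

# Adding a timedelta to a date gives another date
next_day = start + timedelta(1)          # the first positional argument is days
week_before = start - timedelta(days=7)

# Subtracting two dates gives a timedelta, whose .days attribute is an int
gap = date(2019, 5, 23) - start

# Adding a bare integer to a date is not supported
try:
    start + 1
    int_addition_failed = False
except TypeError:
    int_addition_failed = True

print(next_day, week_before, gap.days, int_addition_failed)
```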

Some readers may not be familiar with yield, so to explain: yield is a keyword that turns a function into a generator, which hands back its values one at a time instead of all at once. There is not much benefit here, but a function that returns a large list at once consumes a lot of memory, so yield helps reduce memory consumption in such cases.
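To see the one-at-a-time behavior, here is a short sketch that reuses the same daterange function with arbitrarily chosen dates:

```python
from datetime import date, timedelta


def daterange(start, end):
    # Generator: computes one date per request; end itself is excluded
    for n in range((end - start).days):
        yield start + timedelta(n)


gen = daterange(date(2019, 5, 20), date(2019, 5, 23))

first = next(gen)       # runs the body only up to the first yield
rest = list(gen)        # consumes the remaining values
leftover = list(gen)    # a generator can be iterated only once

print(first, rest, leftover)
```

Because a generator is exhausted after one pass, the script converts it to a list before indexing into it.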

Graph visualization

Plotting the prices normally would have been enough, but to kill time I also plotted the amplitude spectrum obtained with an FFT (Fast Fourier Transform).
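For readers new to the FFT, the sketch below computes an amplitude spectrum for a synthetic signal rather than real prices. The signal, the sample count, and the scaling by N are assumptions chosen for illustration, and it uses np.fft.rfft (the real-input variant) instead of the np.fft.fft call in the script.

```python
import numpy as np

N = 64          # number of samples (assumed value)
dt = 1.0        # sampling interval, e.g. one day
t = np.arange(N) * dt

# Synthetic signal: a constant offset plus a sine with a period of 8 samples
f = 10.0 + 2.0 * np.sin(2 * np.pi * t / 8.0)

F = np.fft.rfft(f)                 # FFT for real input, non-negative frequencies only
freq = np.fft.rfftfreq(N, d=dt)    # matching frequency axis
amp = np.abs(F) / N                # amplitude spectrum, scaled by N

# Skip bin 0 (the mean of the signal) when looking for the dominant oscillation
peak = 1 + np.argmax(amp[1:])
print(freq[peak], 1.0 / freq[peak])
```

The peak frequency is the reciprocal of the period, so a cycle in daily data would show up at 1 / (number of days per cycle).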

Finally the code

By the way, the ticker codes for Panasonic, Sony, and the Nikkei average are also included, so please use them.

`get_stock.py`



import pandas
import numpy as np
import matplotlib.pyplot as plt

from datetime import datetime
from datetime import timedelta





def get_stock_df(code, day_start, day_end):# (Stock Code, start y-m-d, end y-m-d)
	# -----column names----------------------------------
	#Date, Open, High, Low, Close, Volume, Adjusted Close
	#----------------------------------------------------
	print("<<< Start stock data acquisition >>>")
	print("code:{0}, from {1} to {2}".format(code, day_start, day_end))

	def get_stock_df_under20(code, day_start, day_end):# (Stock Code, start y-m-d, end y-m-d)
		# -----source: Yahoo Finance----------------------------
		# Up to 20 rows are displayed on the site at one time
		#-------------------------------------------------------

		sy,sm,sd = str(day_start).split('-')
		ey,em,ed = str(day_end).split('-')

		url="https://info.finance.yahoo.co.jp/history/?code={0}&sy={1}&sm={2}&sd={3}&ey={4}&em={5}&ed={6}&tm=d".format(code,sy,sm,sd,ey,em,ed)

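		list_df = pandas.read_html(url,header=0)

		# read_html returns every table-like element, so keep only the
		# price table, identified by its first column header
		df = None
		for candidate in list_df:
			if candidate.columns[0] == "date":
				df = candidate
		if df is None:
			raise ValueError("price table not found at {0}".format(url))

		# Reverse the row order returned by the page
		return df.iloc[::-1]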
		#-------------------------------------------------------


	def daterange(start, end):
		# -----make list of "date"---------------------------
		for n in range((end - start).days):
			yield start + timedelta(n)
		#----------------------------------------------------

	
	start = datetime.strptime(str(day_start), '%Y-%m-%d').date()
	end   = datetime.strptime(str(day_end), '%Y-%m-%d').date()
	
	list_days = [i for i in daterange(start, end)]
	
	pages = len(list_days) // 20
	mod = len(list_days) % 20

	if mod == 0:
		pages = pages -1
		mod = 20
	

	df_main = get_stock_df_under20(code, list_days[0], list_days[mod-1])

	for p in range(pages):
		df = get_stock_df_under20(code, list_days[20*p + mod], list_days[20*(p+1) + mod-1])
		df_main = pandas.concat([df_main, df])

	print("<<< Completed >>> ({0}days)".format(len(df_main)))
	return df_main





def graphing(f, dt):

	#Data parameters
	N = len(f)           #The number of samples
	#dt is the sampling interval passed in by the caller
	t = np.arange(0, N*dt, dt) #Time axis
	freq = np.linspace(0, 1.0/dt, N, endpoint=False) #Frequency axis: k/(N*dt) for FFT bin k

	#Fast Fourier transform
	F = np.fft.fft(f)

	#Calculate amplitude spectrum
	Amp = np.abs(F)

	#graph display
	plt.figure(figsize=(14,6))
	plt.rcParams['font.family'] = 'Times New Roman'
	plt.rcParams['font.size'] = 17
	plt.subplot(121)
	plt.plot(t, f, label='f(n)')
	plt.xlabel("Time", fontsize=20)
	plt.ylabel("Signal", fontsize=20)
	plt.grid()
	leg = plt.legend(loc=1, fontsize=25)
	leg.get_frame().set_alpha(1)
	plt.subplot(122)
	plt.plot(freq, Amp, label='|F(k)|')
	plt.xlabel('Frequency', fontsize=20)
	plt.ylabel('Amplitude', fontsize=20)
	plt.ylim(0,1000)
	plt.grid()
	leg = plt.legend(loc=1, fontsize=25)
	leg.get_frame().set_alpha(1)
	plt.show()





nikkei = "998407.O"
panasonic = "6752.T"
sony = "6758.T"

# code info "https://info.finance.yahoo.co.jp/history/?code="





df = get_stock_df(panasonic,"2016-5-23","2019-5-23")
#graphing(df["closing price"],1)
df.to_csv('./panasonic.csv', encoding='utf_8_sig')
