[PYTHON] Acquisition of time series data (daily) of stock prices

I want to get daily stock price data with Python

I suddenly wanted to analyze stock price data, so I looked into how to get it. I have no background knowledge about stocks, so there may be an easier way and none of this may really be necessary. But I did manage to get the data, so I am publishing the method for the time being.

Method

I tried to get the data through the following three steps.

1. Get daily data with pandas.read_html

I didn't want to use an obscure site, so I decided to read the data from Yahoo Finance.

pandas.read_html extracts everything that looks like a table from an HTML page, so its return value is a list of DataFrames.

The Yahoo Finance page looks like the picture below. At a glance it contains only one table, but because read_html extracts everything table-like, it also picks up parts such as "Panasonic Corporation 978.5 compared to the previous day ...".

image.png

As a workaround, I used an if statement to keep only the table I wanted.
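
The filtering idea can be sketched as follows. This is only an illustration: pick_table is a hypothetical helper, and the two hand-built DataFrames stand in for what pandas.read_html might return for the page (the price rows are made up).

```python
import pandas as pd

def pick_table(tables, first_column):
    """Return the first DataFrame whose first column header matches first_column."""
    for table in tables:
        if len(table.columns) > 0 and table.columns[0] == first_column:
            return table
    raise ValueError(f"no table starts with column {first_column!r}")

# Stand-ins for the list returned by pandas.read_html:
# a summary-like fragment and the price table (made-up values).
summary = pd.DataFrame({"name": ["Panasonic Corporation"], "price": [978.5]})
prices = pd.DataFrame({"date": ["2019-05-22", "2019-05-23"], "close": [100.0, 101.0]})

table = pick_table([summary, prices], "date")
print(table)
```

Raising an error when nothing matches makes a change in the page layout show up immediately instead of as an undefined variable later on.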

2. Get consecutive dates with the datetime function

To get the data for a given period, you specify the period in the URL. However, Yahoo Finance displays at most 20 rows of data at a time. If you want 20 rows or fewer, embedding the start and end dates in the URL is enough, but for a longer period a single request does not return everything.

Therefore, I wrote a function that creates a list of the dates in the specified period, so that a long period can be split into ranges the site will accept.
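
As a rough sketch of how a period ends up in the URL: the parameter names (code, sy, sm, sd, ey, em, ed, tm) are taken from the URL used in the full listing at the end of this post, while build_history_url itself is a hypothetical helper written for this illustration.

```python
from datetime import date
from urllib.parse import urlencode

BASE_URL = "https://info.finance.yahoo.co.jp/history/"

def build_history_url(code, start, end):
    """Build the history URL for one stock code and one date range."""
    # isoformat() gives 'YYYY-MM-DD', so month and day stay zero-padded,
    # the same as splitting str(date) on '-' in the full listing.
    sy, sm, sd = start.isoformat().split("-")
    ey, em, ed = end.isoformat().split("-")
    params = {
        "code": code,
        "sy": sy, "sm": sm, "sd": sd,
        "ey": ey, "em": em, "ed": ed,
        "tm": "d",
    }
    return BASE_URL + "?" + urlencode(params)

print(build_history_url("6752.T", date(2019, 5, 1), date(2019, 5, 20)))
```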

daterange.py


from datetime import datetime
from datetime import timedelta

def daterange(start, end):
	# Yield each date from start up to, but not including, end
	for n in range((end - start).days):
		yield start + timedelta(days=n)

# Example period (replace with the period you want)
day_start = "2019-5-1"
day_end = "2019-5-23"

start = datetime.strptime(str(day_start), '%Y-%m-%d').date()
end   = datetime.strptime(str(day_end), '%Y-%m-%d').date()

list_days = list(daterange(start, end))


Here, start and end are date objects from the datetime module, which hold only a date. You cannot add a plain number to them to move forward or backward by days.

datetime.timedelta(n)

datetime.timedelta is used to add to and subtract from dates. Since its first argument is days, the function above adds 0, 1, 2, ... days to start, up to the number of days between start and end, and yields each resulting date.
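
A small sketch of this date arithmetic, using arbitrary example dates:

```python
from datetime import date, timedelta

start = date(2019, 5, 30)

next_day = start + timedelta(days=1)   # one day later
rollover = start + timedelta(days=2)   # crosses into the next month
gap = date(2019, 6, 1) - start         # date - date gives a timedelta

print(next_day, rollover, gap.days)

# Adding a plain integer to a date is not allowed.
try:
    start + 1
except TypeError:
    print("a date cannot be added to an int; use timedelta")
```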

Some people may not be familiar with yield, so to explain: yield is a keyword that lets a function hand back its results one at a time instead of all at once. There is not much benefit here, but a function that builds and returns a large list uses a lot of memory at once, and yield helps reduce memory consumption in such cases.
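
To illustrate the lazy behavior, here is a minimal sketch that reuses the daterange function over a 100-year period (an arbitrary example) and takes only the first three dates. Nothing beyond the requested values is computed.

```python
from datetime import date, timedelta
from itertools import islice

def daterange(start, end):
    # Yield each date from start up to, but not including, end
    for n in range((end - start).days):
        yield start + timedelta(days=n)

# Calling the function only creates a generator; no dates exist yet.
days = daterange(date(2000, 1, 1), date(2100, 1, 1))

# Values are produced only as they are requested.
first_three = list(islice(days, 3))
print(first_three)
```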

Graph visualization

Plotting the prices as they are would have been enough, but to pass the time I also plotted the result of an FFT (Fast Fourier Transform).

Figure_1.png
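
For readers who have not used the FFT before, the following sketch shows what the amplitude spectrum tells you. It uses a synthetic signal with a known 8-day cycle rather than real stock prices, and it uses np.fft.rfft with np.fft.rfftfreq (the real-input variants) instead of the np.fft.fft call in the listing below.

```python
import numpy as np

n_samples = 64
dt = 1.0  # sampling interval: one sample per day
t = np.arange(n_samples) * dt

# Synthetic data: a constant offset plus a cycle that repeats every 8 days
signal = 10.0 + np.sin(2 * np.pi * t / 8.0)

# Remove the mean so the zero-frequency component does not dominate
detrended = signal - signal.mean()

amplitude = np.abs(np.fft.rfft(detrended))
freq = np.fft.rfftfreq(n_samples, d=dt)  # frequency of each bin, in cycles per day

peak = freq[np.argmax(amplitude)]
print(peak, 1.0 / peak)
```

The peak lands at 0.125 cycles per day, that is, a period of 8 days, which is the cycle that was put into the signal.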

Finally, the code

The stock codes for Panasonic, Sony, and the Nikkei Stock Average are also included, so feel free to use them.

get_stock.py



import pandas
import numpy as np
import matplotlib.pyplot as plt

from datetime import datetime
from datetime import timedelta





def get_stock_df(code, day_start, day_end):# (Stock Code, start y-m-d, end y-m-d; day_end itself is not included)
	# -----column names----------------------------------
	#Date, Open, High, Low, Close, Volume, Adjusted Close
	#----------------------------------------------------
	print("<<< Start stock data acquisition >>>")
	print("code:{0}, from {1} to {2}".format(code, day_start, day_end))

	def get_stock_df_under20(code, day_start, day_end):# (Stock Code, start y-m-d, end y-m-d)
		# -----source: Yahoo Finance----------------------------
		# Up to 20 rows of data are displayed on the site at one time
		#-------------------------------------------------------

		sy,sm,sd = str(day_start).split('-')
		ey,em,ed = str(day_end).split('-')

		url="https://info.finance.yahoo.co.jp/history/?code={0}&sy={1}&sm={2}&sd={3}&ey={4}&em={5}&ed={6}&tm=d".format(code,sy,sm,sd,ey,em,ed)

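		list_df = pandas.read_html(url,header=0)

		# read_html returns every table-like element;
		# keep the one whose first column is the date
		df = None
		for candidate in list_df:
			if candidate.columns[0] == "date":
				df = candidate

		if df is None:
			raise ValueError("price table not found: {0}".format(url))

		# Reverse the row order so that the concatenated blocks run in ascending date order
		return df.iloc[::-1]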
		#-------------------------------------------------------


	def daterange(start, end):
		# Yield each date from start up to, but not including, end
		for n in range((end - start).days):
			yield start + timedelta(days=n)


	start = datetime.strptime(str(day_start), '%Y-%m-%d').date()
	end   = datetime.strptime(str(day_end), '%Y-%m-%d').date()

	list_days = list(daterange(start, end))
	if not list_days:
		raise ValueError("day_end must be later than day_start")

	# Split the period into blocks of at most 20 days:
	# the first block holds the remainder (mod days), every later block holds 20
	pages = len(list_days) // 20
	mod = len(list_days) % 20

	if mod == 0:
		pages = pages -1
		mod = 20

	df_main = get_stock_df_under20(code, list_days[0], list_days[mod-1])

	for p in range(pages):
		df = get_stock_df_under20(code, list_days[20*p + mod], list_days[20*(p+1) + mod-1])
		df_main = pandas.concat([df_main, df])

	print("<<< Completed >>> ({0}days)".format(len(df_main)))
	return df_main





def graphing(f, dt=1):

	#Data parameters
	N = len(f)           #The number of samples
	#dt is the sampling interval (1 = one sample per day)
	t = np.arange(N) * dt #Time axis
	freq = np.arange(N) / (N * dt) #Frequency axis (bin k is k/(N*dt))

	#Fast Fourier transform
	F = np.fft.fft(f)

	#Calculate amplitude spectrum
	Amp = np.abs(F)

	#graph display
	plt.figure(figsize=(14,6))
	plt.rcParams['font.family'] = 'Times New Roman'
	plt.rcParams['font.size'] = 17
	plt.subplot(121)
	plt.plot(t, f, label='f(n)')
	plt.xlabel("Time", fontsize=20)
	plt.ylabel("Signal", fontsize=20)
	plt.grid()
	leg = plt.legend(loc=1, fontsize=25)
	leg.get_frame().set_alpha(1)
	plt.subplot(122)
	plt.plot(freq, Amp, label='|F(k)|')
	plt.xlabel('Frequency', fontsize=20)
	plt.ylabel('Amplitude', fontsize=20)
	plt.ylim(0,1000)
	plt.grid()
	leg = plt.legend(loc=1, fontsize=25)
	leg.get_frame().set_alpha(1)
	plt.show()





nikkei = "998407.O"
panasonic = "6752.T"
sony = "6758.T"

# code info "https://info.finance.yahoo.co.jp/history/?code="





df = get_stock_df(panasonic,"2016-5-23","2019-5-23")
#graphing(df["closing price"],1)
df.to_csv('./panasonic.csv', encoding='utf_8_sig')
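
The splitting into blocks of at most 20 days can also be written with list slicing. The sketch below is an alternative, not the code used above: split_into_blocks is a hypothetical helper, and it puts the short remainder block last, whereas get_stock_df puts it first.

```python
from datetime import date, timedelta

def split_into_blocks(days, size=20):
    """Return (first_day, last_day) pairs covering days in blocks of at most size."""
    blocks = []
    for i in range(0, len(days), size):
        block = days[i:i + size]
        blocks.append((block[0], block[-1]))
    return blocks

# 45 consecutive days starting on an arbitrary example date
days = [date(2019, 1, 1) + timedelta(days=n) for n in range(45)]

for first, last in split_into_blocks(days):
    print(first, last)
```

Each pair could then be passed as the start and end of one request.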



