[PYTHON] Get stock price data for S&P 500 constituents from Yahoo Finance and find highly correlated pairs

Article summary

Phenomena such as "when stock A goes up, stock B goes up too" and "when stock A goes down, stock B goes down too" are called a "sympathetic rise" and a "sympathetic fall". BNF, a genius trader, used this to buy and sell stocks and made a lot of money. I thought I would imitate him and make a lot of money too, but I couldn't find a way to identify pairs of stocks that move together. So I implemented one in Python, and here I publish the code and an explanation of it.

Code

The code is on GitHub. Feel free to use it. https://github.com/toshiikuoo/puclic/blob/master/%E6%A0%AA%E4%BE%A1%E7%9B%B8%E9%96%A2.ipynb

How it works

The flow of operation is as follows.

Get the list of constituent stocks from the Wikipedia S&P 500 page
↓
Get the stock prices for that list from Yahoo Finance
↓
Calculate the correlation of stock prices for every pair of stocks
* Correlation: a numerical value expressing how similar two data series are
↓
Sort the pairs by correlation
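To make "correlation" concrete, here is a minimal sketch of the Pearson correlation coefficient, which is the measure the code in this article uses. The `pearson` helper and the three price series are made up for illustration and are not part of the original notebook.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations from the mean
    dy = y - y.mean()
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))

# Made-up closing prices, for illustration only
stock_a = [100, 102, 101, 105, 107]
stock_b = [50, 51.5, 50.5, 52, 54]   # tends to move with stock_a
stock_c = [80, 78, 79.5, 75, 73]     # tends to move against stock_a

corr_ab = pearson(stock_a, stock_b)
corr_ac = pearson(stock_a, stock_c)
print(corr_ab, corr_ac)
```

A value near +1 means the two series rise and fall together, a value near -1 means they move in opposite directions, and a value near 0 means there is no linear relationship.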

Code description

I will explain the code above using excerpts from it.


# Install and import the required libraries
!pip install lxml html5lib beautifulsoup4

import pandas as pd
from pandas import Series,DataFrame

import numpy as np

from datetime import datetime

# pearsonr is imported from scipy.stats (the scipy.stats.stats path is deprecated)
from scipy.stats import pearsonr
import itertools

# Install the yfinance package
!pip install yfinance

# Import yfinance
import yfinance as yf

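# Create a list of all S&P 500 stocks
url="https://en.wikipedia.org/wiki/List_of_S%26P_500_companies"
sp500_list=pd.read_html(url)[0].Symbol.values.tolist()
len(sp500_list)
# sp500_list_yahoo is not defined in this excerpt. It is assumed here to be the same
# list in Yahoo Finance notation, where "." becomes "-" (e.g. BRK.B -> BRK-B)
sp500_list_yahoo=[s.replace(".","-") for s in sp500_list]
# Store the closing prices of the S&P 500 stocks in one DataFrame
# Recent yfinance versions adjust prices by default; auto_adjust=False keeps the "Adj Close" column
close_sp500_list=yf.download(sp500_list_yahoo,start='2019-10-04',end='2019-11-01',auto_adjust=False)["Adj Close"]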
# Calculate the correlation for every pair of columns

# Create a dictionary to hold the calculated correlations
correlations={}

# Calculate the correlation, using only the days on which both stocks have a price
for cola,colb in itertools.combinations(close_sp500_list.columns,2):
  nas=np.logical_or(np.isnan(close_sp500_list.loc[:,cola]),np.isnan(close_sp500_list.loc[:,colb]))
  try:
    # pearsonr returns the correlation coefficient and its p-value
    correlations[cola + '__'+ colb]=tuple(pearsonr(close_sp500_list.loc[:,cola][~nas],close_sp500_list.loc[:,colb][~nas]))
  except ValueError:
    # Too few overlapping data points to compute a correlation, so skip the pair
    pass

# "correlations" is a dictionary, so convert it to a DataFrame for output
result=DataFrame.from_dict(correlations,orient='index')
result.columns=['PCC','p-value']
# Print the pairs sorted by correlation coefficient (PCC) in ascending order
print(result.sort_values('PCC'))

result

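The final output is below. The pairs are sorted by correlation coefficient in ascending order, so the most strongly negatively correlated pairs appear at the top and the most highly correlated pairs at the bottom.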

           PCC       p-value
BKR__SPGI   -0.968878  1.437804e-03
BIIB__HAS   -0.962712  8.038530e-13
BKR__PGR    -0.959178  2.465597e-03
PGR__WCG    -0.941347  6.818268e-11
CI__PGR     -0.935051  1.840799e-10
...               ...           ...
CNC__WCG     0.996087  1.493074e-22
BKR__PRGO    0.997290  1.101006e-05
CBS__VIAB    0.998546  7.579099e-27
BBT__STI     0.998835  8.266321e-28
GOOGL__GOOG  0.999502  1.701271e-31

[127260 rows x 2 columns]
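As an aside, a similar ranking can be sketched without an explicit loop by using pandas' built-in `DataFrame.corr`. This is an alternative, not the method used in the notebook: it does not produce p-values, and the `ranked_pairs` helper and the toy price table standing in for `close_sp500_list` are made up for illustration.

```python
import numpy as np
import pandas as pd

def ranked_pairs(prices: pd.DataFrame) -> pd.Series:
    """Return the correlation of every column pair, sorted in ascending order."""
    corr = prices.corr()  # pairwise Pearson correlation; missing values are ignored per pair
    # Keep only the upper triangle (k=1 skips the diagonal) so each pair appears once
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).stack().dropna()
    # Use the same "A__B" key style as the article
    pairs.index = [f"{a}__{b}" for a, b in pairs.index]
    return pairs.sort_values()

# Made-up prices, for illustration only
toy = pd.DataFrame({
    "AAA": [10.0, 11.0, 12.0, 13.0, 14.0],
    "BBB": [20.0, 22.0, 24.0, 26.0, 28.0],
    "CCC": [5.0, 4.0, 3.5, 3.0, 1.0],
})
pairs = ranked_pairs(toy)
print(pairs)
```

With the real data, `ranked_pairs(close_sp500_list)` would be the corresponding call. Pairs whose correlation cannot be computed come out as NaN and are dropped.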

Future work

I want to use these results to group highly correlated stocks. (Please feel free to contact me with any questions or suggestions for improvement. This is my first post, so parts of it may be rough. GitHub is difficult ...)
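One possible way to do that grouping, sketched here only as an idea and not something the notebook implements, is hierarchical clustering with 1 - correlation as the distance between stocks. The `group_by_correlation` helper, the 0.9 threshold, the choice of average linkage, and the toy prices are all assumptions for illustration; the sketch also assumes every pairwise correlation can be computed (no NaN values).

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def group_by_correlation(prices: pd.DataFrame, threshold: float = 0.9) -> dict:
    """Group columns so that members of a group are, on average, correlated above `threshold`."""
    corr = prices.corr()
    dist = 1.0 - corr.to_numpy()        # high correlation -> small distance
    dist = (dist + dist.T) / 2.0        # enforce exact symmetry
    np.fill_diagonal(dist, 0.0)         # a stock is at distance 0 from itself
    dist = np.clip(dist, 0.0, None)     # remove tiny negative rounding errors
    condensed = squareform(dist, checks=False)
    tree = linkage(condensed, method="average")
    labels = fcluster(tree, t=1.0 - threshold, criterion="distance")
    groups = {}
    for ticker, label in zip(corr.columns, labels):
        groups.setdefault(int(label), []).append(ticker)
    return groups

# Made-up prices, for illustration only
toy = pd.DataFrame({
    "AAA": [10.0, 11.0, 12.0, 13.0, 14.0],
    "BBB": [20.0, 22.0, 24.0, 26.0, 28.0],   # moves with AAA
    "CCC": [5.0, 4.0, 3.5, 3.0, 1.0],
    "DDD": [16.0, 13.0, 11.5, 10.0, 4.0],    # moves with CCC
})
groups = group_by_correlation(toy)
print(groups)
```

Lowering the threshold merges more stocks into each group, and raising it splits them into smaller, more tightly correlated groups.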
