Note: calculating average pairwise correlation is very easy with pandas

What is average pairwise correlation

I don't know how many people in the world need to do this calculation.

The average pairwise stock correlation is the average of the correlation coefficients of price movements across all pairs of stocks. During crises such as the Lehman shock or the European debt crisis, the market as a whole tends to fall sharply and then rebound sharply, and the correlations between stocks rise. When correlation jumps, all stocks move together, which makes it hard for active equity investors to earn excess returns through stock selection. You can therefore use this indicator to adjust the risk level of a portfolio (for example, reducing risk when you are unlikely to win). As a more cynical use, active stock managers can cite it as an excuse: "I haven't been winning lately, but given this market environment, please forgive me."
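In symbols (the notation is introduced here for illustration and is not from the original), let ρ_ij,t be the correlation between the daily returns of stocks i and j over the window ending on day t. The average pairwise correlation on day t can then be written as

```latex
\bar{\rho}_t = \frac{1}{|P_t|} \sum_{(i,j) \in P_t} \rho_{ij,t},
\qquad
P_t = \{ (i,j) : 1 \le i < j \le n,\ \rho_{ij,t} \text{ can be computed} \}
```

With no missing data, |P_t| = n(n-1)/2. Because a correlation matrix is symmetric, averaging all off-diagonal entries of the n x n matrix gives the same value, which is what the pandas code in this article computes.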

So easy with pandas

First, prepare the data: a DataFrame (m x n) with dates as the index (m days), stocks as the columns (n stocks), and each stock's daily return in each cell. This time I prepared the data separately and read it from a CSV file. Alternatively, you can download stock prices from Yahoo with pandas, as described in another article, and convert them to daily returns with the pct_change() method.
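As a rough illustration of the pct_change() step, here is a sketch with a small made-up price table (the tickers AAA and BBB and all prices are hypothetical); in practice the prices would come from your CSV file or data source.

```python
import pandas as pd

# Hypothetical closing prices: dates as the index, one column per stock.
prices = pd.DataFrame(
    {"AAA": [100.0, 110.0, 99.0, 99.0, 108.9],
     "BBB": [50.0, 51.0, 51.0, 49.98, 50.0]},
    index=pd.bdate_range("2020-01-01", periods=5),
)

# Daily return = today's price / yesterday's price - 1.
# The first row has no previous price (NaN), so it is dropped.
returns = prices.pct_change().iloc[1:]
print(returns.shape)  # (4, 2)
```

The result has exactly the m x n layout the rest of the article expects.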

Now for the main part, calculating the correlation coefficients:

# Rolling pairwise correlation (in pandas 0.18 and 0.19 the result is a Panel)
result = df.rolling(window=60, min_periods=30).corr()

This one line computes the rolling pairwise correlation! For each date, it calculates the correlation matrix (n x n) of daily returns between all stocks over the 60 days up to that date, returning NaN for any pair with fewer than 30 days of data, and it repeats this for every date. In pandas 0.18 and 0.19, the result is a pandas.Panel object: a 2D DataFrame with an extra time axis, making it 3D (m x n x n).
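Panel only exists in older pandas, so it may help to see what the same line produces today. Since pandas 0.20, rolling().corr() on a DataFrame returns a DataFrame whose rows are a (date, ticker) MultiIndex and whose columns are tickers, i.e. the m x n x n cube stacked into m * n rows. A minimal sketch with random returns; the tickers and data are made up for illustration.

```python
import numpy as np
import pandas as pd

# Random daily returns for three hypothetical stocks over 100 business days.
rng = np.random.default_rng(0)
dates = pd.bdate_range("2020-01-01", periods=100)
tickers = ["AAA", "BBB", "CCC"]
returns = pd.DataFrame(rng.normal(size=(100, 3)), index=dates, columns=tickers)

result = returns.rolling(window=60, min_periods=30).corr()

# Rows: (date, ticker) pairs; columns: tickers -> (m * n) x n.
print(result.shape)  # (300, 3)

# Selecting one date gives that day's n x n correlation matrix.
last_matrix = result.loc[dates[-1]]
print(last_matrix.round(2))
```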

The correlations are now computed, but taking their average needs a little more work, for two reasons:
1) Each slice is a correlation matrix, so its diagonal is all 1s (each stock's correlation with itself), and these must be excluded from the average.
2) Where a correlation coefficient could not be computed because of missing data, the matrix contains NaN, and the average must skip these values.
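To see why both points matter, here is a toy correlation matrix for a single date (the tickers and values are hypothetical), averaged with the same unstack().mean(skipna=True) idea used later in this article. DataFrame.mask is used here only to blank out the diagonal for the comparison.

```python
import numpy as np
import pandas as pd

tickers = ["AAA", "BBB", "CCC"]  # hypothetical tickers
nan = np.nan
# One date's correlation matrix; the AAA-CCC pair could not be computed.
corr = pd.DataFrame(
    [[1.0, 0.5, nan],
     [0.5, 1.0, 0.1],
     [nan, 0.1, 1.0]],
    index=tickers,
    columns=tickers,
)

# Naive: NaN is skipped, but the diagonal 1s are still averaged in.
naive = corr.unstack().mean(skipna=True)

# Diagonal set to NaN first, so only the valid off-diagonal pairs remain.
off_diag = corr.mask(np.eye(len(tickers), dtype=bool)).unstack().mean(skipna=True)

print(round(naive, 4), round(off_diag, 4))  # 0.6 0.3
```

Leaving the diagonal in pulls the average toward 1 (0.6 here instead of 0.3), and the distortion is largest when there are few stocks.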

First, replace the diagonal 1s with NaN using NumPy's np.fill_diagonal(). For a single DataFrame you would write something like np.fill_diagonal(df.values, None), but here we use the apply() method to apply it to every date of the Panel, as follows:

# Replace the diagonal with NaN (np.fill_diagonal works in place and returns None)
result.apply(lambda x: np.fill_diagonal(x.values, None), axis=(1,2))
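np.fill_diagonal modifies the array it is given in place and returns None, so only the side effect of the line above matters, not its return value. A small sketch on one matrix (the tickers and values are hypothetical); it works on an explicit copy because on recent pandas versions with Copy-on-Write the array behind .values can be read-only.

```python
import numpy as np
import pandas as pd

tickers = ["AAA", "BBB"]  # hypothetical tickers
corr = pd.DataFrame([[1.0, 0.4], [0.4, 1.0]], index=tickers, columns=tickers)

values = corr.to_numpy(copy=True)            # writable copy; corr stays untouched
returned = np.fill_diagonal(values, np.nan)  # mutates `values` in place
no_diag = pd.DataFrame(values, index=corr.index, columns=corr.columns)

print(returned)  # None: the useful result is the mutated array
print(no_diag)
```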

Then use the mean(skipna=True) method to average the values while ignoring NaN. The result is the average pairwise correlation. This step is also applied along the time axis with a single apply() call.

# Ignore NaN and calculate the average
apc = result.apply(lambda x: x.unstack().mean(skipna=True), axis=(1,2))

With Panel.apply(..., axis=(1, 2)), the correlation matrix at each date is passed to the function as a DataFrame x, stepping along the time axis.
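Panel was deprecated in pandas 0.20 and removed in 0.25, so Panel.apply is not available on current pandas. Since pandas 0.20, rolling().corr() returns a DataFrame with a (date, ticker) MultiIndex, and grouping it by the date level yields one n x n block per date, which plays the same role as Panel.apply(..., axis=(1, 2)). A minimal sketch under those assumptions; off_diagonal_mean is a hypothetical helper, and the synthetic returns and tickers are made up for illustration.

```python
import numpy as np
import pandas as pd


def off_diagonal_mean(block):
    """Mean of the off-diagonal, non-NaN entries of one date's correlation block.

    block: the rows of the rolling-corr result for a single date
    (index = (date, ticker), columns = tickers).
    """
    row_tickers = block.index.get_level_values(-1).to_numpy()
    col_tickers = block.columns.to_numpy()
    # True where a row's ticker equals the column's ticker, i.e. the diagonal.
    is_diag = row_tickers[:, None] == col_tickers[None, :]
    vals = block.to_numpy()[~is_diag]
    vals = vals[~np.isnan(vals)]  # drop pairs without enough data
    return float(vals.mean()) if vals.size else np.nan


# Synthetic returns: four hypothetical stocks sharing one common factor.
rng = np.random.default_rng(0)
dates = pd.bdate_range("2020-01-01", periods=150)
common = rng.normal(size=(150, 1))
returns = pd.DataFrame(
    common + 0.5 * rng.normal(size=(150, 4)),
    index=dates,
    columns=["AAA", "BBB", "CCC", "DDD"],
)

result = returns.rolling(window=60, min_periods=30).corr()
# One group per date, each an n x n block, like Panel.apply(..., axis=(1, 2)).
apc = result.groupby(level=0).apply(off_diagonal_mean)
print(apc.dropna().head())
```

Dropping NaN before averaging does the job of mean(skipna=True), and comparing labels locates the diagonal without relying on the row order inside each group.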

The calculation is complete! Let's plot it.

apc.plot()

[Figure: average pairwise correlation over time (correlation.png)]

You can see that the correlation jumps whenever the market is driven by macro events.

In conclusion

Putting the code together, that's all there is to it. It's easy. As you might expect from a library that Wes McKinney started developing while working at the hedge fund AQR Capital, pandas makes financial data very easy to handle.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Read data (dates as index, one column of daily returns per stock)
df = pd.read_csv('data.csv', na_values=' ', index_col=0, parse_dates=True)
# Rolling pairwise correlation (in pandas 0.18 and 0.19 the result is a Panel)
result = df.rolling(window=60, min_periods=30).corr()
# Replace the diagonal with NaN (np.fill_diagonal works in place and returns None)
result.apply(lambda x: np.fill_diagonal(x.values, None), axis=(1,2))
# Ignore NaN and calculate the average for each date
apc = result.apply(lambda x: x.unstack().mean(skipna=True), axis=(1,2))
# Plot
apc.plot()
plt.show()
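The script above relies on Panel and therefore on pandas 0.18 or 0.19. For current pandas, one option is to reshape the MultiIndexed result of rolling().corr() back into the m x n x n array described earlier and average the off-diagonal entries with NumPy, avoiding a Python-level loop over dates. This is a sketch under stated assumptions: average_pairwise_corr is a hypothetical name, the synthetic returns stand in for data.csv, and the reindex call pins the row and column order so the reshape does not depend on how pandas orders the rows internally.

```python
import warnings

import numpy as np
import pandas as pd


def average_pairwise_corr(returns, window=60, min_periods=30):
    """Rolling average pairwise correlation for current pandas (a sketch).

    returns: DataFrame of daily returns, dates as index, one column per stock.
    """
    dates, tickers = returns.index, returns.columns
    m, n = len(dates), len(tickers)
    # pandas >= 0.20: rows are (date, ticker) pairs, columns are tickers.
    stacked = returns.rolling(window=window, min_periods=min_periods).corr()
    # Pin the row/column order explicitly, then reshape to m x n x n.
    full_index = pd.MultiIndex.from_product([dates, tickers])
    cube = stacked.reindex(index=full_index, columns=tickers).to_numpy().reshape(m, n, n)
    # Keep only the off-diagonal entries of each date's matrix: shape (m, n*(n-1)).
    off_diag = cube[:, ~np.eye(n, dtype=bool)]
    with warnings.catch_warnings():
        # Dates with no valid pair are all-NaN rows; nanmean returns NaN there
        # and would otherwise emit a RuntimeWarning.
        warnings.simplefilter("ignore", category=RuntimeWarning)
        values = np.nanmean(off_diag, axis=1)
    return pd.Series(values, index=dates, name="avg_pairwise_corr")


# Synthetic returns: four hypothetical stocks driven by one common factor,
# so the true pairwise correlation is 0.8.
rng = np.random.default_rng(0)
dates = pd.bdate_range("2020-01-01", periods=150)
common = rng.normal(size=(150, 1))
returns = pd.DataFrame(
    common + 0.5 * rng.normal(size=(150, 4)),
    index=dates,
    columns=["AAA", "BBB", "CCC", "DDD"],
)

apc = average_pairwise_corr(returns)
print(apc.dropna().head())
```

np.nanmean skips both the masked-out diagonal and NaN pairs in one step, so it covers both reasons for extra care discussed above. apc.plot() then works as in the original script.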

References

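- pandas documentation, window functions: http://pandas.pydata.org/pandas-docs/stable/computation.html#window-functions. The rolling API changed in pandas 0.18.0, so this code needs pandas 0.18.0 or later. From pandas 0.20.0, rolling().corr() returns a MultiIndexed DataFrame instead of a Panel, so the Panel-based code above runs as written only on pandas 0.18 and 0.19.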
