Applications in finance and economics

pandas is a data analysis library whose development began in the financial industry around 2008. Its author, Wes McKinney, was at the well-known hedge fund AQR Capital Management at the time. As a result, pandas offers many powerful features for the practical analysis of financial and economic data.

Below we use pandas to analyze a dataset obtained from Yahoo! Finance: daily closing prices for several stocks (AAPL, MSFT, XOM) and for the S&P 500 index (ticker SPX).

Reading and writing datasets

pandas provides functions for reading and writing data in formats such as CSV and JSON.

Function	Description
read_csv	Read comma (',')-delimited data
read_table	Read tab ('\t')-delimited data
read_json	Read JSON data
read_msgpack	Read msgpack data (deprecated; removed in pandas 1.0)
read_pickle	Read pickled Python objects (binary)

Each read_XXX function has a matching to_XXX method on DataFrame (to_csv, to_json, to_pickle, and so on), so data can be written back out in any of these formats. You never have to call a CSV or JSON parser yourself.
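As a minimal sketch of this read/write pairing, the snippet below writes a small DataFrame to CSV text with to_csv and reads it back with read_csv. The frame and its values are made up for illustration (they mimic the first rows of stock_px.csv), and io.StringIO stands in for a file on disk.

```python
import io

import pandas as pd

# Hypothetical sample mimicking the first rows of stock_px.csv
df = pd.DataFrame(
    {"AAPL": [7.40, 7.45, 7.45], "SPX": [909.03, 908.59, 929.01]},
    index=pd.DatetimeIndex(["2003-01-02", "2003-01-03", "2003-01-06"], name="Date"),
)

# to_csv with no path returns the CSV text as a string
csv_text = df.to_csv()

# read_csv parses it back; index_col=0 and parse_dates=True restore the DatetimeIndex
restored = pd.read_csv(io.StringIO(csv_text), index_col=0, parse_dates=True)
print(restored)
```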

import pandas as pd
stock = pd.read_csv('stock_px.csv', parse_dates=True, index_col=0)

Because of index_col=0 and parse_dates=True, the first column of the CSV (the dates) is parsed and used as a DatetimeIndex. You can also create a new object that conforms to a different, more suitable index, for example with reindex.
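A minimal sketch, assuming a tiny inline CSV in place of stock_px.csv: index_col=0 and parse_dates=True turn the date column into a DatetimeIndex, and reindex builds a new object on a calendar-day index of our choosing. The sample data and the choice of a daily calendar are illustrative assumptions.

```python
import io

import pandas as pd

csv_text = """Date,AAPL,SPX
2003-01-02,7.40,909.03
2003-01-03,7.45,908.59
2003-01-06,7.45,929.01
"""

# The first column becomes the index and is parsed into dates
sample = pd.read_csv(io.StringIO(csv_text), index_col=0, parse_dates=True)

# New index: every calendar day in the range; the weekend dates get NaN rows
calendar = pd.date_range("2003-01-02", "2003-01-06", freq="D")
daily = sample.reindex(calendar)

# Same new index, but carry the last known price forward into the gaps
daily_ffill = sample.reindex(calendar, method="ffill")
print(daily_ffill)
```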

Another strength of pandas is its handling of missing values. Real-world data is rarely clean and complete, so the statistics methods on pandas objects exclude missing values by default. You can also set a threshold for how many missing values are acceptable (dropna with thresh) and fill the gaps with a specified value (fillna).
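The sketch below illustrates these three behaviors on a made-up price table: statistics skip NaN by default (skipna=True), dropna(thresh=...) keeps only rows with at least the given number of non-missing values, and fillna replaces the remaining gaps. The values, and filling with 0, are purely for demonstration.

```python
import numpy as np
import pandas as pd

# Hypothetical prices with gaps
prices = pd.DataFrame(
    {"AAPL": [7.40, np.nan, 7.45, 7.43],
     "MSFT": [21.11, np.nan, np.nan, 21.93]}
)

# NaN is skipped by default: this is the mean of 7.40, 7.45 and 7.43
aapl_mean = prices["AAPL"].mean()

# Keep rows that have at least 2 non-missing values (rows 0 and 3)
complete_rows = prices.dropna(thresh=2)

# Replace every remaining NaN with a fixed value
filled = prices.fillna(0)
```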

Working with the dataset

Computing and aggregating summary statistics, and grouping by index level, is straightforward.

stock.head(10) #Show only the first 10
# =>
#             AAPL   MSFT    XOM     SPX
# 2003-01-02  7.40  21.11  29.22  909.03
# 2003-01-03  7.45  21.14  29.24  908.59
# 2003-01-06  7.45  21.52  29.96  929.01
# 2003-01-07  7.43  21.93  28.95  922.93
# 2003-01-08  7.28  21.31  28.83  909.93
# 2003-01-09  7.34  21.93  29.44  927.57
# 2003-01-10  7.36  21.97  29.03  927.57
# 2003-01-13  7.32  22.16  28.91  926.26
# 2003-01-14  7.30  22.39  29.17  931.66
# 2003-01-15  7.22  22.11  28.77  918.22

stock['AAPL'].sum() #total
# => 277892.75

stock['AAPL'].mean() #Arithmetic mean
# => 125.51614724480578

stock['AAPL'].median() #Median
# => 91.45500000000001
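For an overview of several statistics at once and for grouping by the date index, something like the following sketch works. It uses a made-up four-day price series spanning two years. Grouping by prices.index.year is one way to group by a date index; for a MultiIndex, groupby(level=...) groups by an index level directly.

```python
import pandas as pd

# Hypothetical prices straddling a year boundary
idx = pd.to_datetime(["2003-12-30", "2003-12-31", "2004-01-02", "2004-01-05"])
prices = pd.DataFrame({"AAPL": [10.0, 11.0, 12.0, 14.0]}, index=idx)

# count, mean, std, min, quartiles and max in one call
summary = prices["AAPL"].describe()

# Group rows by the year of their date and average each group
yearly_mean = prices.groupby(prices.index.year)["AAPL"].mean()
print(yearly_mean)
```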

Correlation analysis

Let's measure, year by year, how strongly each stock's daily returns are correlated with SPX.

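rets = stock.pct_change().dropna()              # daily returns; the first row is NaN and is dropped
spx_corr = lambda x: x.corrwith(x['SPX'])       # correlation of each column with SPX
stock_by_year = rets.groupby(lambda x: x.year)  # group rows by the year of their date index

result_1 = stock_by_year.apply(spx_corr) # correlation between daily returns and SPX, per year
print(result_1)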
# =>          AAPL      MSFT       XOM  SPX
#   2003  0.541124  0.745174  0.661265    1
#   2004  0.374283  0.588531  0.557742    1
#   2005  0.467540  0.562374  0.631010    1
#   2006  0.428267  0.406126  0.518514    1
#   2007  0.508118  0.658770  0.786264    1
#   2008  0.681434  0.804626  0.828303    1
#   2009  0.707103  0.654902  0.797921    1
#   2010  0.710105  0.730118  0.839057    1
#   2011  0.691931  0.800996  0.859975    1

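import matplotlib.pyplot as plt

fig, ax = plt.subplots()    # new figure and axes
result_1.plot(ax=ax)        # pandas draws the chart with matplotlib
fig.savefig("image.png")    # save before plt.show(), which may close the figure
plt.show()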
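To make the first step concrete: pct_change divides each price by the previous one and subtracts 1, and corrwith correlates every column with a given Series. The sketch below checks both by hand on the first four AAPL and SPX closes from the sample output above. With so few points the correlation value itself is not meaningful; only the mechanics are.

```python
import numpy as np
import pandas as pd

# First four closes from the stock.head(10) output (2003-01-02 to 2003-01-07)
prices = pd.DataFrame(
    {"AAPL": [7.40, 7.45, 7.45, 7.43],
     "SPX": [909.03, 908.59, 929.01, 922.93]}
)

# Daily returns p_t / p_{t-1} - 1; the first row has no predecessor
rets = prices.pct_change().dropna()
manual = prices / prices.shift(1) - 1

# Correlation of each column with SPX (SPX with itself is 1)
corr = rets.corrwith(rets["SPX"])
print(corr)
```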

Next, compute the yearly correlation between two columns, here Apple (AAPL) and Microsoft (MSFT).

result_2 = stock_by_year.apply(lambda g: g['AAPL'].corr(g['MSFT'])) #Correlation between Apple and Microsoft
print( result_2 )
# =>
# 2003    0.480868
# 2004    0.259024
# 2005    0.300093
# 2006    0.161735
# 2007    0.417738
# 2008    0.611901
# 2009    0.432738
# 2010    0.571946
# 2011    0.581987

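import matplotlib.pyplot as plt

fig, ax = plt.subplots()    # fresh axes, so the Series is not drawn on an earlier plot
result_2.plot(ax=ax)
fig.savefig("image2.png")   # save before plt.show()
plt.show()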
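Series.corr computes Pearson's correlation coefficient by default: the covariance of the two series divided by the product of their standard deviations. The following sketch verifies that on a pair of made-up daily return series.

```python
import numpy as np
import pandas as pd

# Hypothetical daily returns for two stocks
a = pd.Series([0.010, -0.004, 0.007, 0.002, -0.011])
b = pd.Series([0.006, -0.001, 0.009, -0.003, -0.008])

# pandas: Pearson correlation by default
r_pandas = a.corr(b)

# By hand: sum of products of deviations over the root of the summed squared deviations
da, db = a - a.mean(), b - b.mean()
r_manual = (da * db).sum() / np.sqrt((da ** 2).sum() * (db ** 2).sum())
print(r_pandas, r_manual)
```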

Linear regression

Next, fit a linear regression by ordinary least squares (OLS): for each year, regress Apple's daily returns on those of SPX.

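import statsmodels.api as sm

def regression(data, yvar, xvars):
    Y = data[yvar]
    X = data[xvars].copy()   # copy to avoid SettingWithCopyWarning when adding a column
    X['intercept'] = 1.      # constant term; sm.OLS does not add one automatically
    result = sm.OLS(Y, X).fit()
    return result.params     # one coefficient per column of X

result_3 = stock_by_year.apply(regression, 'AAPL', ['SPX'])  # regress AAPL returns on SPX, per year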
print(result_3)
# =>         SPX  intercept
# 2003  1.195406   0.000710
# 2004  1.363463   0.004201
# 2005  1.766415   0.003246
# 2006  1.645496   0.000080
# 2007  1.198761   0.003438
# 2008  0.968016  -0.001110
# 2009  0.879103   0.002954
# 2010  1.052608   0.001261
# 2011  0.806605   0.001514

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
result_3.plot(ax=ax)
fig.savefig("image3.png")   # save before plt.show()
plt.show()
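With a single regressor plus a constant, OLS has a closed form: the slope (often called the stock's beta against the index) is cov(x, y) / var(x), and the intercept is mean(y) - slope * mean(x). The sketch below checks this against a least-squares solve with NumPy on simulated returns. The simulated data and the true coefficients 1.2 and 0.001 are assumptions for illustration, not values from stock_px.csv, and numpy.linalg.lstsq is used in place of statsmodels only to keep the example dependency-free.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0.0, 0.01, size=250)                     # simulated "SPX" daily returns
y = 1.2 * x + 0.001 + rng.normal(0.0, 0.005, size=250)  # simulated "AAPL" daily returns

# Design matrix [x, 1], the same layout as ['SPX', 'intercept'] in regression()
X = np.column_stack([x, np.ones_like(x)])
(beta, alpha), *_ = np.linalg.lstsq(X, y, rcond=None)

# Closed form for simple linear regression
beta_closed = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
alpha_closed = y.mean() - beta_closed * x.mean()
print(beta, alpha)
```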

Reference

Introduction to Data Analysis with Python: Data Processing Using NumPy and pandas (O'Reilly Japan), http://www.oreilly.co.jp/books/9784873116556/