[PYTHON] Analysis of financial data by pandas and its visualization (1)

Application to financial and economic fields

pandas is a data analysis library whose development began in the financial industry around 2008. Its author, Wes McKinney, was working at the well-known hedge fund AQR Capital Management at the time. For that reason, it has many powerful features for practical analysis of financial and economic data.

We will use pandas to analyze a dataset obtained from Yahoo! Finance. This time we'll use the daily closing prices of a few stocks (AAPL, MSFT and XOM) and of the S&P 500 index (identifier SPX).

Reading and writing datasets

pandas provides functions for reading data in formats such as CSV and JSON.

Function      Description
read_csv      Read comma (',') delimited data
read_table    Read tab ('\t') delimited data
read_json     Read JSON format data
read_msgpack  Read data in msgpack format (deprecated in pandas 0.25.0 and removed in 1.0.0)
read_pickle   Read a pickled Python object (binary)

Each reader is paired with a to_XXX method on DataFrame (to_csv, to_json, to_pickle and so on), so data can be written back out in the same formats. Because pandas handles the parsing, you don't need to call a CSV or JSON parser yourself.
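As a quick illustration of a paired reader and writer, here is a minimal round trip through an in-memory buffer. The two-row DataFrame and its values are made up for the example, and io.StringIO stands in for a real file such as stock_px.csv.

```python
import io

import pandas as pd

# Hypothetical two-day price table (values made up for illustration)
df = pd.DataFrame(
    {"AAPL": [7.40, 7.45], "SPX": [909.03, 908.59]},
    index=pd.to_datetime(["2003-01-02", "2003-01-03"]),
)

# Write with to_csv, the writer paired with read_csv
buf = io.StringIO()
df.to_csv(buf)

# Read it back: first column as the index, parsed as dates
buf.seek(0)
restored = pd.read_csv(buf, index_col=0, parse_dates=True)
print(restored)
```

The same pattern applies to to_json/read_json and to_pickle/read_pickle.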

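import pandas as pd
import matplotlib.pyplot as plt  # used for the plots below

# Use the first column (the dates) as the index and parse it as datetimes
stock = pd.read_csv('stock_px.csv', parse_dates=True, index_col=0)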

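Here index_col=0 makes the first column (the dates) the index, and parse_dates=True parses it into datetimes. Without index_col, pandas automatically creates a default integer index for the data read from the CSV. You can also create a new object conformed to a more suitable index with reindex.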
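A minimal sketch of both cases, using a tiny made-up CSV string in place of stock_px.csv: without index_col, pandas attaches a default RangeIndex; with index_col=0 and parse_dates=True, the date column becomes a DatetimeIndex. reindex then conforms the data to a new index (here every calendar day, an arbitrary choice for illustration), inserting NaN for labels that were not in the original data.

```python
import io

import pandas as pd

# Made-up CSV content standing in for stock_px.csv
csv_text = "date,AAPL\n2003-01-02,7.40\n2003-01-03,7.45\n2003-01-06,7.45\n"

# No index_col: pandas creates a default integer RangeIndex
plain = pd.read_csv(io.StringIO(csv_text))
print(plain.index)

# index_col=0 + parse_dates=True: the date column becomes a DatetimeIndex
dated = pd.read_csv(io.StringIO(csv_text), index_col=0, parse_dates=True)
print(dated.index)

# Conform to every calendar day in the range; the weekend rows become NaN
daily = dated.reindex(pd.date_range("2003-01-02", "2003-01-06", freq="D"))
print(daily)
```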

Another strength of pandas is its handling of missing values. In practice, data is rarely clean and complete, so the descriptive statistics on pandas objects exclude missing values (NaN) by default. You can also set a threshold for how many missing values a row may have before it is dropped, and fill the gaps with a specified value.
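The small frame below, with made-up values and deliberate gaps, sketches these behaviours: statistics skip NaN unless skipna=False, dropna(thresh=n) keeps only rows with at least n non-missing values, and fillna/ffill fill the gaps.

```python
import numpy as np
import pandas as pd

# Small made-up frame with gaps (NaN) in it
df = pd.DataFrame({
    "AAPL": [7.40, np.nan, 7.45, 7.43],
    "MSFT": [21.11, np.nan, np.nan, 21.93],
})

# Descriptive statistics skip NaN by default (skipna=True)
print(df["AAPL"].mean())              # mean of 7.40, 7.45 and 7.43 only
print(df["AAPL"].mean(skipna=False))  # NaN once missing values are counted

# Keep only rows with at least 2 non-missing values
print(df.dropna(thresh=2))

# Fill gaps with a fixed value, or carry the last observation forward
print(df.fillna(0))
print(df.ffill())
```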

Data set handling

Computing summary statistics and aggregating by index level (for example, by year) is straightforward.

stock.head(10)  # Show only the first 10 rows
# =>
#             AAPL   MSFT    XOM     SPX
# 2003-01-02  7.40  21.11  29.22  909.03
# 2003-01-03  7.45  21.14  29.24  908.59
# 2003-01-06  7.45  21.52  29.96  929.01
# 2003-01-07  7.43  21.93  28.95  922.93
# 2003-01-08  7.28  21.31  28.83  909.93
# 2003-01-09  7.34  21.93  29.44  927.57
# 2003-01-10  7.36  21.97  29.03  927.57
# 2003-01-13  7.32  22.16  28.91  926.26
# 2003-01-14  7.30  22.39  29.17  931.66
# 2003-01-15  7.22  22.11  28.77  918.22

stock['AAPL'].sum()  # Total
# => 277892.75

stock['AAPL'].mean()  # Arithmetic mean
# => 125.51614724480578

stock['AAPL'].median()  # Median
# => 91.45500000000001
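Beyond single statistics, describe() reports several at once, and grouping by the year of a DatetimeIndex aggregates per year. A minimal sketch on made-up prices that span a year boundary:

```python
import pandas as pd

# Made-up prices spanning two calendar years
idx = pd.to_datetime(["2003-12-30", "2003-12-31", "2004-01-02", "2004-01-05"])
px = pd.DataFrame({"AAPL": [10.0, 11.0, 12.0, 14.0]}, index=idx)

# count, mean, std, min, quartiles and max in one call
print(px.describe())

# Group rows by the year of the DatetimeIndex and average each group
yearly = px.groupby(px.index.year).mean()
print(yearly)
```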

Calculation of correlation

Let's compute, for each year, how strongly each stock's daily returns are correlated with those of the SPX.

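# Daily returns: percent change from the previous close; drop rows with NaN
# (the first row has no previous price)
rets = stock.pct_change().dropna()
# Correlation of every column with the SPX column
spx_corr = lambda x: x.corrwith(x['SPX'])
# Group the returns by the calendar year of the DatetimeIndex
stock_by_year = rets.groupby(lambda x: x.year)

result_1 = stock_by_year.apply(spx_corr)  # Correlation between daily returns and SPX
print(result_1)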
# =>          AAPL      MSFT       XOM  SPX
#   2003  0.541124  0.745174  0.661265    1
#   2004  0.374283  0.588531  0.557742    1
#   2005  0.467540  0.562374  0.631010    1
#   2006  0.428267  0.406126  0.518514    1
#   2007  0.508118  0.658770  0.786264    1
#   2008  0.681434  0.804626  0.828303    1
#   2009  0.707103  0.654902  0.797921    1
#   2010  0.710105  0.730118  0.839057    1
#   2011  0.691931  0.800996  0.859975    1
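To see what this pipeline does, here is a sketch on invented data. pct_change computes the simple return p_t / p_{t-1} - 1, dropna removes the first row (which has no previous price), and corrwith correlates each column with SPX inside each yearly group. The columns STOCK and INV are hypothetical: their prices are built so that STOCK's returns are exactly twice the index's and INV's are the negative, so the expected correlations are 1 and -1.

```python
import numpy as np
import pandas as pd

# Made-up daily returns for the index across a year boundary
dates = pd.to_datetime([
    "2010-12-28", "2010-12-29", "2010-12-30", "2010-12-31",
    "2011-01-03", "2011-01-04", "2011-01-05", "2011-01-06",
])
spx_ret = np.array([0.0, 0.01, -0.02, 0.015, 0.005, -0.01, 0.02, -0.005])

# Build prices from returns: STOCK moves twice as much as SPX, INV moves opposite
prices = pd.DataFrame({
    "SPX": 100 * np.cumprod(1 + spx_ret),
    "STOCK": 50 * np.cumprod(1 + 2 * spx_ret),
    "INV": 80 * np.cumprod(1 - spx_ret),
}, index=dates)

# Simple returns p_t / p_{t-1} - 1; the first row has no predecessor, so drop it
rets = prices.pct_change().dropna()

# Correlation of every column with SPX within each calendar year
by_year = rets.groupby(lambda d: d.year)
result = by_year.apply(lambda g: g.corrwith(g["SPX"]))
print(result)
```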

result_1.plot()  # DataFrame.plot draws on a new matplotlib figure
plt.savefig("image.png")  # Save before show(), which may close the figure
plt.show()

(Figure image.png: yearly correlation of AAPL, MSFT and XOM daily returns with SPX)

You can also compute the correlation between two columns, for example Apple and Microsoft, for each year.

# Correlation between Apple and Microsoft daily returns, per year
result_2 = stock_by_year.apply(lambda g: g['AAPL'].corr(g['MSFT']))
print(result_2)
# =>
# 2003    0.480868
# 2004    0.259024
# 2005    0.300093
# 2006    0.161735
# 2007    0.417738
# 2008    0.611901
# 2009    0.432738
# 2010    0.571946
# 2011    0.581987
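Series.corr computes the Pearson correlation by default. A minimal sketch on made-up returns (the column names A and B are hypothetical) checks it against the textbook formula, covariance divided by the product of the standard deviations, and then applies it per year in the same way as above.

```python
import numpy as np
import pandas as pd

# Made-up daily returns for two hypothetical stocks over two years
dates = pd.to_datetime(["2010-03-01", "2010-03-02", "2010-03-03",
                        "2011-03-01", "2011-03-02", "2011-03-03"])
rets = pd.DataFrame({
    "A": [0.01, -0.02, 0.03, 0.02, 0.00, -0.01],
    "B": [0.02, -0.01, 0.01, -0.01, 0.01, 0.02],
}, index=dates)

# Pearson correlation by hand: covariance over the product of standard deviations
x, y = rets["A"], rets["B"]
dx, dy = x - x.mean(), y - y.mean()
manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())
print(manual, x.corr(y))  # the two values agree

# Same pairwise correlation, computed separately for each year
per_year = rets.groupby(lambda d: d.year).apply(lambda g: g["A"].corr(g["B"]))
print(per_year)
```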

plt.figure()
result_2.plot()  # Series.plot draws on the current figure's axes
plt.savefig("image2.png")  # Save before show(), which may close the figure
plt.show()

(Figure image2.png: yearly correlation between AAPL and MSFT daily returns)

Linear regression

Next, for each year, fit an ordinary least squares (OLS) regression of AAPL's daily returns on the SPX's daily returns.

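import statsmodels.api as sm

def regression(data, yvar, xvars):
    """Regress data[yvar] on data[xvars] plus a constant by OLS; return the coefficients."""
    Y = data[yvar]
    X = data[xvars].copy()  # copy so adding a column doesn't modify a slice of the group
    X['intercept'] = 1.
    result = sm.OLS(Y, X).fit()
    return result.params

# Extra positional arguments to apply are passed through to regression()
result_3 = stock_by_year.apply(regression, 'AAPL', ['SPX'])
print(result_3)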
# =>         SPX  intercept
# 2003  1.195406   0.000710
# 2004  1.363463   0.004201
# 2005  1.766415   0.003246
# 2006  1.645496   0.000080
# 2007  1.198761   0.003438
# 2008  0.968016  -0.001110
# 2009  0.879103   0.002954
# 2010  1.052608   0.001261
# 2011  0.806605   0.001514
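With a single regressor plus an intercept, the least squares slope equals cov(x, y) / var(x), which is the "beta" commonly quoted for a stock. The sketch below cross-checks this on synthetic data built as y = 1.2 x + 0.001 plus noise (all numbers invented), fitting with numpy.linalg.lstsq instead of sm.OLS; it illustrates what the regression estimates and is not the article's code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "market" returns and a stock with slope 1.2, intercept 0.001, plus noise
x = rng.normal(0.0, 0.01, size=250)
y = 1.2 * x + 0.001 + rng.normal(0.0, 0.002, size=250)

# Design matrix [x, 1]: the column of ones plays the role of 'intercept'
X = np.column_stack([x, np.ones_like(x)])
(slope, intercept), *_ = np.linalg.lstsq(X, y, rcond=None)

# Closed form for one regressor: slope = cov(x, y) / var(x)
beta = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)

print(slope, intercept, beta)
```

The estimated slope and intercept should land close to 1.2 and 0.001, and the slope matches the covariance formula up to rounding.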

result_3.plot()  # DataFrame.plot draws on a new matplotlib figure
plt.savefig("image3.png")  # Save before show(), which may close the figure
plt.show()

(Figure image3.png: yearly SPX coefficient and intercept from the AAPL regression)

Reference

Wes McKinney, "Introduction to Data Analysis with Python: Data Processing Using NumPy and pandas" (O'Reilly Japan; Japanese edition of "Python for Data Analysis"), http://www.oreilly.co.jp/books/9784873116556/
