[PYTHON] Finding the beginning of Abenomics from the NT ratio (2)

The trend in the NT ratio has changed

In the previous article, regression analysis of the two stock indexes (Topix and Nikkei225) showed that the NT scatter plot lies largely on two trend lines. Chronologically, the gently sloped Trend-1 (NT ratio = 10.06) changed into the steeper Trend-2 (NT ratio = 12.81).

**Figure. Scatter plot reprinted from the previous article (Topix vs. Nikkei225)** Scatter_NT_02.png

The steep slope of Trend-2 is presumably related to the "Abenomics" economic policy, but the regression analysis did not clarify when it started. This time, we use machine-learning methods to classify the data into Trend-1 and Trend-2 and try to pin down when Trend-2 began.
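For reference, the NT ratio here is simply the Nikkei225 close divided by the Topix close, and a trend slope is the coefficient of a through-origin regression of Nikkei225 on Topix. A minimal sketch with synthetic stand-in data (the actual `mypair` DataFrame is built in the previous article):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the article's mypair DataFrame:
# daily 'topix' and 'n225' close series on a business-day index.
idx = pd.date_range('2005-01-04', periods=500, freq='B')
topix = pd.Series(np.linspace(800.0, 1200.0, 500), index=idx)
mypair = pd.DataFrame({'topix': topix, 'n225': topix * 10.06})

# NT ratio = Nikkei225 / Topix
mypair['nt'] = mypair['n225'] / mypair['topix']

# Slope of a through-origin regression n225 = a * topix
a = (mypair['topix'] @ mypair['n225']) / (mypair['topix'] @ mypair['topix'])
print(round(a, 2))   # → 10.06 for this synthetic Trend-1-like data
```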

Trial.1 - K-Means Clustering

I decided to use scikit-learn as the Python machine-learning module. Of the various possible approaches to classification, I first tried the K-Means method, a typical example of clustering performed without labels.

K-Means_not_good.png

The code is as follows (`mypair` is the DataFrame of daily Topix and Nikkei225 closes built in the previous article).


from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
import numpy as np
import matplotlib.pyplot as plt

mypair.dropna(inplace=True)
X = np.column_stack([mypair['topix'].values, mypair['n225'].values])

# K-means clustering: initialize the two centers at a Trend-1 day and a Trend-2 day

myinit = np.array([[mypair.loc['20050104', 'topix'], mypair.loc['20050104', 'n225']],
                   [mypair.loc['20130104', 'topix'], mypair.loc['20130104', 'n225']]])

k_means = KMeans(init=myinit, n_clusters=2, n_init=1)   # n_init=1 since init is given explicitly
k_means.fit(X)           # ... compute k-means clustering

k_means_labels = k_means.labels_
k_means_cluster_centers = k_means.cluster_centers_
k_means_labels_unique = np.unique(k_means_labels)

colors = ['b', 'r']
n_clusters = 2

# Plot each cluster's members and its center
for k, col in zip(range(n_clusters), colors):
    my_members = k_means_labels == k
    cluster_center = k_means_cluster_centers[k]
    plt.plot(X[my_members, 0], X[my_members, 1], 'w', markerfacecolor=col, marker='.')
    plt.plot(cluster_center[0], cluster_center[1], 'o', markerfacecolor=col, markeredgecolor='k', markersize=6)

plt.title('K-Means')
plt.grid(True)

As expected, this did not work. K-Means measures the (Euclidean) distance between points and gathers the nearest ones into each cluster, so it splits the data by position. It is not well suited to data like this, scattered along a line with a large aspect ratio.
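This failure mode is easy to reproduce on made-up data: for points scattered along a single line, K-Means simply cuts the line in half by position rather than separating trends. A sketch with synthetic data (not the article's actual series):

```python
import numpy as np
from sklearn.cluster import KMeans

# 200 points lying on one line y = 10x: a single elongated "cluster"
x = np.linspace(0.0, 100.0, 200)
X = np.column_stack([x, 10.0 * x])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# K-Means splits the line into a "low" half and a "high" half:
# the two ends of the same trend line land in different clusters.
print(labels[0] != labels[-1])   # → True
```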

Trial.2 - Principal Component Analysis (PCA)

Looking at the K-Means plot, I wondered whether some coordinate transformation could gather the linearly scattered data into a "lump" before clustering. Checking the documentation, I found that principal component analysis (PCA) could serve this purpose, so I decided to try classification via PCA.

**Figure. Plot after PCA processing**

PCA_scatter_01.png

From here, we classify the data into two groups, with the boundary line at Y = 0.

# PCA: rotate the data so the main trend lies along the first axis

pca = PCA(n_components=2)
X_xf = pca.fit(X).transform(X)

plt.scatter(X_xf[:,0], X_xf[:,1])
plt.grid(True)
border_line = np.array([[-6000, 0], [6000, 0]])   # boundary line Y = 0
plt.plot(border_line[:,0], border_line[:,1], 'r-', lw=1.6)

# Label each point by the sign of its second principal component:
# -1 on one side of the boundary (Trend-1), +1 on the other (Trend-2)
col_v = (-np.sign(X_xf[:, 1])).astype(int)

mypair['color'] = col_v
mypair['color'].plot(figsize=(8,2), grid=True, lw=1.6)    # color historical chart
plt.ylim([-1.2, 1.2])

# Scatter plots colored by group, in PCA and raw coordinates

plt.figure(figsize=(12,5))
plt.subplot(121)
plt.scatter(X_xf[:,0], X_xf[:,1], marker='o', c=col_v)
plt.grid(True)
plt.title('Topix vs. Nikkei225 (PCA processed)')

plt.subplot(122)
plt.scatter(X[:,0], X[:,1], marker='o', c=col_v)
plt.grid(True)
plt.title('Topix vs. Nikkei225 (raw values)')

The results are shown in the figure below. (Sorry, the colors are hard to see.)

PCA_scatter_03.png

The left panel shows the color-coded data in the PCA-transformed coordinate system, and the right panel replots the same colors in the original coordinate system. The two trends are grouped as originally intended, so setting Y = 0 as the group boundary seems reasonably valid.
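As a sanity check, the Y = 0 boundary in PCA space can be mapped back to the original Topix/Nikkei225 plane with `inverse_transform`: it becomes a straight line through the data mean along the first principal axis. A sketch with synthetic stand-in data (the article's `mypair` is not reproduced here):

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in for the Topix/Nikkei225 point cloud
rng = np.random.default_rng(0)
t = rng.uniform(700.0, 1300.0, 300)
X = np.column_stack([t, 10.0 * t + rng.normal(0.0, 50.0, 300)])

pca = PCA(n_components=2)
pca.fit(X)

# Two points on the PCA-space boundary Y = 0, mapped back to raw coordinates
boundary = pca.inverse_transform(np.array([[-6000.0, 0.0], [6000.0, 0.0]]))

# The mapped boundary line passes through the mean of the data
mid = boundary.mean(axis=0)
print(np.allclose(mid, X.mean(axis=0)))   # → True
```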

Summary

Using PCA, we were able to separate the data into a blue group with a gentle slope and a red group with a steep slope. Let's turn this color series into a historical chart.

**Figure. Trend (color) transition (y = -1: Trend-1, y = +1: Trend-2)** PCA_color-TL.png

The chart above shows **Trend-1** until the latter half of 2009, then a short transition period, and from the second quarter of 2011 the high-NT-ratio **Trend-2**, which continues until 2014. If **Trend-2** = "Abenomics", we can infer that Abenomics started in the first half of 2011. (The first half of 2011 also brings to mind the Great East Japan Earthquake.)
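Given the `color` series built above (-1 for Trend-1, +1 for Trend-2), the transition date can also be read off programmatically as the first index labelled Trend-2. A minimal sketch with a made-up quarterly series (the real `mypair` index is daily dates):

```python
import pandas as pd

# Hypothetical color series: -1 = Trend-1, +1 = Trend-2
idx = pd.date_range('2009-01-01', periods=6, freq='QS')
color = pd.Series([-1, -1, -1, 1, 1, 1], index=idx)

# First date labelled as Trend-2
start = color[color == 1].index[0]
print(start.date())   # → 2009-10-01 for this made-up series
```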

In the future, I would like to try other machine-learning methods and further verify this PCA approach. When other economic data (for example, fossil fuel imports) becomes available, I would also like to investigate its relationship with these trends.

References

- Data Scientist Training Reader (Technical Review) http://gihyo.jp/book/2013/978-4-7741-5896-9
