1. Economic data "regime"

When a sudden and big change occurs in the economy such as the collapse of the IT bubble or the Lehman shock, it may be considered that some state has changed fundamentally in the background. Such a fundamental state change model is called a "regime switching model". Modeling a regime switch can be difficult if the changed state is unobservable. However, modeling is possible by giving the probability distribution of the regime with parameters and treating the transition between regimes as a Markov chain. The parameters that maximize the likelihood (mean / standard deviation of the normal distribution, regime transition probability) can be obtained by MCMC.

2. Explanation of Markov switching model

Linear regression with regime

Consider the regime switching model of the following linear regression model.

\begin{eqnarray}
y\left( t \right) &=& x\left( t \right)\beta + \epsilon\left( t \right)\\
\epsilon\left( t \right)&\sim& N\left(0,\sigma^2\right)
\end{eqnarray}

The $ 1 $ to $ n $ regimes at each time $ t $ are expressed as follows.

\begin{eqnarray}
I\left(t\right) \in { 1,2,\cdots ,n}
\end{eqnarray}

If the regime can be observed, the regression equation is expressed as follows with the regime as a subscript.

\begin{eqnarray}
y\left( t \right) &=& x\left( t \right)\beta_{I\left( t\right)} + \epsilon\left( t \right)\\
\epsilon\left( t \right)&\sim& N\left(0,\sigma^2_{I\left( t\right)}\right)
\end{eqnarray}

At this time, $ y \ left (t \ right) $ is average $ x \ left (t \ right) \ beta_ {I \ left (t \ right)} $, variance $ \ sigma ^ 2_ {I \ left ( It follows a normal distribution of t \ right)} $. Apply the maximum likelihood method that maximizes the log-likelihood. That is, the parameter that maximizes the following likelihood function is obtained.

\begin{eqnarray}
\ln{\mathcal{L}}&=&\sum_{t=1}^{T}\ln{\{f\left( y\left( t\right)|I\left( t\right)\right)\} }\\
f\left( y\left( t\right)|I\left( t\right)\right)&=&\frac{1}{\sqrt{2\pi\sigma^2_{I\left( t\right)}}}\exp\left( -\frac{\{y\left( t\right)-x\left( t\right)\beta_{I\left( t \right)}\}^2}{2\sigma^2_{I\left( t\right)}}\right)
\end{eqnarray}

If the regime cannot be observed directly, then $ I \ left (t \ right) $ is also considered to follow the conditional probability distribution of the past history. Let $ \ mathcal {F} \ left (t \ right) $ be the information obtained by the time $ t $. Note that $ f \ left (t \ right) $ is not conditioned on $ \ mathcal {F} \ left (t \ right) $.

\begin{eqnarray}
f\left( y\left( t\right) | \mathcal{F\left( t-1\right)}\right) &=& \sum_{I\left( t\right)=1}^{n}f\left( y\left( t\right),I\left( t\right) |\mathcal{F}\left( t-1\right)\right)\\
&=& \sum_{I\left( t\right)=1}^{n}f\left( y\left( t\right)|I\left( t\right),\mathcal{F}\left( t-1\right)\right)f\left( I\left( t\right)|\mathcal{F}\left( t-1\right)\right)\\
&=& \sum_{I\left( t\right)=1}^{n}\frac{1}{\sqrt{2\pi\sigma^2_{I\left( t\right)}}}\exp\left( -\frac{\{y\left( t\right)-x\left( t\right)\beta_{I\left( t \right)}\}^2}{2\sigma^2_{I\left( t\right)}}\right)
f\left( I\left( t\right)|\mathcal{F}\left( t-1\right)\right)
\end{eqnarray}

Therefore, the log-likelihood to be maximized is as follows.

\begin{eqnarray}
\ln{\mathcal{L}}&=&\sum_{t=1}^{T}\ln{\left\{\sum_{I\left( t\right)=1}^{n}f\left( y\left( t\right)|I\left( t\right),\mathcal{F}\left( t-1\right)\right)f\left( I\left( t\right)|\mathcal{F}\left( t-1\right)\right)\right\} }\\

f\left( y\left( t\right)|I\left( t\right),\mathcal{F}\left( t-1\right)\right)&=&\frac{1}{\sqrt{2\pi\sigma^2_{I\left( t\right)}}}\exp\left( -\frac{\{y\left( t\right)-x\left( t\right)\beta_{I\left( t \right)}\}^2}{2\sigma^2_{I\left( t\right)}}\right)
\end{eqnarray}

Markov chain

Probability distribution of $ y \ left (t \ right) $ by giving $ f \ left (I \ left (t \ right) | \ mathcal {F} \ left (t-1 \ right) \ right) $ Is decided. Suppose $ I \ left (t \ right) $ satisfies first-order Markov property.

\begin{eqnarray}
P\left( I\left( t\right)=i|\mathcal{F}\left( t-1\right)\right)
&=&\sum_{j=1}^{n}P\left( I\left( t\right)=i,I\left( t-1\right)=j|\mathcal{F}\left( t-1\right)\right)\\
&=&\sum_{j=1}^{n}P\left( I\left( t\right)=i|I\left( t-1\right)=j\right)P\left( I\left( t-1\right)=j|\mathcal{F}\left( t-1\right)\right)
\end{eqnarray}

Transition probability from state j to state i $ P \ left (I \ left (t \ right) = i | I \ left (t-1 \ right) = j \ right) $ is $ p_ {i, j} $ And consider the following transition probability matrix.

\begin{eqnarray}
P &=&
\begin{pmatrix}
p_{1,1}&p_{1,2}&\cdots&p_{1,n}\\
p_{2,1}&\ddots&&\vdots\\
\vdots&&\ddots&\vdots\\
p_{n,1}&\cdots&\cdots&p_{n,n}
\end{pmatrix}
\end{eqnarray}

Conditional probability of regime $ P \ left (I \ left (t \ right) = j | \ mathcal {F} \ left (t \ right) \ right) given information up to time $ t $ $ Is calculated by Bayesian update.

\begin{eqnarray}
P\left( I\left( t\right)=i|\mathcal{F}\left( t\right)\right)
&=&P\left( I\left( t\right)=i|\mathcal{F}\left( t-1\right), y\left( t\right)\right)\\
&=&\frac{f\left( I\left( t\right) =i,y\left( t\right)|\mathcal{F}\left( t-1\right)\right)}{f\left(y\left( t\right)|\mathcal{F}\left( t-1\right)\right)}\\
&=&\frac{f\left(y\left( t\right)|I\left( t\right)=i,\mathcal{F}\left( t-1\right)\right)f\left(I\left( t\right)=i|\mathcal{F}\left( t-1\right)\right)}{\sum_{i=1}^{n}f\left(y\left( t\right)|I\left( t\right)=i,\mathcal{F}\left( t-1\right)\right)f\left(I\left( t\right)=i|\mathcal{F}\left( t-1\right)\right)}\\
&=&\frac{\sum_{j=1}^{n}f\left(y\left( t\right)|I\left( t\right)=j,\mathcal{F}\left( t-1\right)\right)p_{i,j}f\left(I\left( t-1\right)=j|\mathcal{F}\left( t-1\right)\right)}{\sum_{i=1}^{n}\sum_{j=1}^{n}f\left(y\left( t\right)|I\left( t\right)=j,\mathcal{F}\left( t-1\right)\right)p_{i,j}f\left(I\left( t-1\right)=i|\mathcal{F}\left( t-1\right)\right)}
\end{eqnarray}

The initial probability is the stationary probability of the Markov chain. It is known that the transition probability matrix $ P $, the identity matrix $ I $, and the vector $ 1 $ with all elements of $ 1 $ are used, and the steady-state probability $ \ pi ^ {\ ast} $ is obtained as follows.

\begin{eqnarray}
A &=& 
\begin{pmatrix}
I-P\\
1^{\mathrm{T}}
\end{pmatrix}\\

\pi^{\ast}&=&\left(A^{\mathrm{T}}A\right)^{-1}A^{\mathrm{T}}M+1st row
\end{eqnarray}

Markov chain Monte Carlo method

The Markov chain Monte Carlo method (MCMC) is used as a method for finding the parameters that maximize the log-likelihood. The explanation is omitted here.

The proposed parameters must meet the regime-distinguishing conditions. For example, when distinguishing between low-risk and high-risk regimes, the proposed $ \ sigma_1 $ and $ \ sigma_2 $ must satisfy $ \ sigma_1 <\ sigma_2 $. Note that this condition must be specified for each individual model.

3. Implementation of Markov switching model by Python

Implement Markov switching for any number of regimes.

import numpy as np
import pandas as pd
from scipy import stats

import matplotlib.pyplot as plt
import japanize_matplotlib
%matplotlib inline

from tqdm.notebook import tqdm

regime = 3

#Initial parameters of the normal distribution that each regime follows
mu = [-1.4,-0.6,0]
sigma = [0.07,0.03,0.04]

Instead of estimating the transition probability of the regime itself by MCMC, the transition probability is used as a logistic function value and its arguments are estimated by MCMC. At that time, a logistics function is applied to the off-diagonal component, and the diagonal component generates a transition probability matrix as a complementary event. Treat the argument of the logistics function as gen_prob and the logistic function value as prob separately.

gen_prob = np.ones((regime,regime))*(-3)
gen_prob = gen_prob - np.diag(np.diag(gen_prob))
#Diagonal component is 0, off-diagonal component is-Square matrix of 3

prob = np.exp(gen_prob)/(1+np.exp(gen_prob))
#Apply logistic function

prob = prob + np.diag(1-np.dot(prob.T,np.ones((regime,1))).flatten())
#As a complementary event of the off-diagonal component, the probability of the diagonal component is obtained and used as a transition probability matrix.

#Function to find the steady-state probability of Markov chain
def cal_stationary_prob(prob, regime):
    # prob: 2-d array, shape = (regime, regime),Transition probability matrix
    # regime: int,Number of regimes
    
    A = np.ones((regime+1, regime))
    A[:regime, :regime] = np.eye(regime)-prob
    
    return np.dot(np.linalg.inv(np.dot(A.T,A)),A.T)[:,-1]

#Function to calculate log-likelihood
def cal_logL(x, mu, sigma, prob, regime):
    # x: 1-d array or pandas Series,Data in chronological order
    # mu: 1=d array, len = regime,Initial value of the mean of the normal distribution that each regime follows
    # sigma: 1=d array, len = regime,Initial value of the variance of the normal distribution that each regime follows
    # prob: 2-d array, shape = (regime, regime),Transition probability matrix
    # regime: int,Number of regimes
    
    likelihood = stats.norm.pdf(x=x, loc=mu[0], scale=np.sqrt(sigma[0])).reshape(-1,1)
    for i in range(1,regime):
        likelihood = np.hstack([likelihood, stats.norm.pdf(x=x, loc=mu[i], scale=np.sqrt(sigma[i])).reshape(-1,1)])
    
    prior = cal_stationary_prob(prob, regime)
    
    logL = 0
    for i in range(length):
        temp = likelihood[i]*prior
        sum_temp = sum(temp)

        posterior = temp/sum_temp

        logL += np.log(sum_temp)
        
        prior = np.dot(prob, posterior)
        
    return logL

#A function that calculates the probability that each time point belongs to each regime
def prob_regime(x, mu, sigma, prob, regime):
    # x: 1-d array or pandas Series,Data in chronological order
    # mu: 1=d array, len = regime,Initial value of the mean of the normal distribution that each regime follows
    # sigma: 1=d array, len = regime,Initial value of the variance of the normal distribution that each regime follows
    # prob: 2-d array, shape = (regime, regime),Transition probability matrix
    # regime: int,Number of regimes
    
    likelihood = stats.norm.pdf(x=x, loc=mu[0], scale=np.sqrt(sigma[0])).reshape(-1,1)
    for i in range(1,regime):
        likelihood = np.hstack([likelihood, stats.norm.pdf(x=x, loc=mu[i], scale=np.sqrt(sigma[i])).reshape(-1,1)])
    
    prior = cal_stationary_prob(prob, regime)
    
    prob_list = []
    for i in range(length):
        temp = likelihood[i]*prior

        posterior = temp/sum(temp)
        
        prob_list.append(posterior)
        
        prior = np.dot(prob, posterior)
    
    return np.array(prob_list)

#Function to generate parameters to update in MCMC
def create_next_theta(mu, sigma, gen_prob, epsilon, regime):
    # mu: 1=d array, len = regime,Initial value of the mean of the normal distribution that each regime follows
    # sigma: 1=d array, len = regime,Initial value of the variance of the normal distribution that each regime follows
    # gen_prob: 2-d array, shape = (regime, regime),Logistic function arguments
    # epsilon: float,Update width of the proposed parameter
    # regime: int,Number of regimes
    
    new_mu = mu.copy()
    new_sigma = sigma.copy()
    new_gen_prob = gen_prob.copy()
    
    new_mu += (2*np.random.rand(regime)-1)*epsilon
    new_mu = np.sort(new_mu)
    new_sigma = np.exp(np.log(new_sigma) + (2*np.random.rand(regime)-1)*epsilon)
    new_sigma = np.sort(new_sigma)[[2,0,1]]
    new_gen_prob += (2*np.random.rand(regime,regime)-1)*epsilon*0.1

    new_gen_prob = new_gen_prob - np.diag(np.diag(new_gen_prob))
    new_prob = np.exp(new_gen_prob)/(1+np.exp(new_gen_prob))
    new_prob = new_prob + np.diag(1-np.dot(new_prob.T,np.ones((regime,1))).flatten())
    
    return new_mu, new_sigma, new_gen_prob, new_prob

#Function that executes MCMC
def mcmc(x, mu, sigma, gen_prob, prob, epsilon, trial, regime):
    # x: 1-d array or pandas Series,Data in chronological order
    # mu: 1=d array, len = regime,Initial value of the mean of the normal distribution that each regime follows
    # sigma: 1=d array, len = regime,Initial value of the variance of the normal distribution that each regime follows
    # gen_prob: 2-d array, shape = (regime, regime),Logistic function arguments
    # epsilon: float,Update width of the proposed parameter
    # trial: int,MCMC execution count
    # regime: int,Number of regimes
    
    mu_list = []
    sigma_list = []
    prob_list = []
    logL_list = []
    mu_list.append(mu)
    sigma_list.append(sigma)
    prob_list.append(prob)
    
    for i in tqdm(range(trial)):
        new_mu, new_sigma, new_gen_prob, new_prob = create_next_theta(mu, sigma, gen_prob, epsilon, regime)
        
        logL = cal_logL(x, mu, sigma, prob, regime)
        next_logL = cal_logL(x, new_mu, new_sigma, new_prob, regime)
        
        ratio = np.exp(next_logL-logL)
        logL_list.append(logL)

        if ratio > 1:
            mu, sigma, gen_prob, prob = new_mu, new_sigma, new_gen_prob, new_prob

        elif ratio > np.random.rand():
            mu, sigma, gen_prob, prob = new_mu, new_sigma, new_gen_prob, new_prob

        mu_list.append(mu)
        sigma_list.append(sigma)
        prob_list.append(prob)
            
        if i%100==0:
            print(logL)
    
    return np.array(mu_list), np.array(sigma_list), np.array(prob_list), np.array(logL_list)

4. Data example

TEPCO's excess rate of return

The period is from 2003 to 2019. The market price was topix, and the risk-free assets were Japanese government bonds (10 years).

The time series data of the excess rate of return is Pandas DataFramex.

regime = 3

mu = [-1.4,-0.6,0]
sigma = [0.07,0.03,0.04]

gen_prob = np.ones((regime,regime))*(-3)
gen_prob = gen_prob - np.diag(np.diag(gen_prob))
prob = np.exp(gen_prob)/(1+np.exp(gen_prob))
prob = prob + np.diag(1-np.dot(prob.T,np.ones((regime,1))).flatten())

trial = 25000
epsilon = 0.1

mu_list, sigma_list, prob_list, logL_list = mcmc(x, mu, sigma, gen_prob, prob, epsilon, trial, regime)

prob_series = prob_regime(x, mu_list[-1], sigma_list[-1], prob_list[-1], regime)

Visualize the probabilities of each regime as a stacked bar graph.

fig = plt.figure(figsize=(10,5),dpi=200)
ax1 = fig.add_subplot(111)
ax1.plot(range(length), x, linewidth = 1.0, label="return")

ax2 = ax1.twinx()
for i in range(regime):
    ax2.bar(range(length), prob_series[:,i], width=1.0, alpha=0.4, label=f"regime{i+1}", bottom=prob_series[:,:i].sum(axis=1))

h1, l1 = ax1.get_legend_handles_labels()
h2, l2 = ax2.get_legend_handles_labels()
ax1.legend(h1+h2, l1+l2, bbox_to_anchor=(1.05, 1), loc='upper left')

ax1.set_xlabel('t')
ax1.set_ylabel(r'Excess rate of return')
ax2.set_ylabel(r'probability')
ax1.set_title(f"Probability of TEPCO's highly dispersed regime, epsilon={epsilon}, trial={trial}")

plt.subplots_adjust(left = 0.1, right = 0.8)
#plt.savefig("probability_regime2.png ")
plt.show()

In the graph below, the high-dispersion low-return regime is blue, the low-dispersion low-return regime is yellow, and the medium-dispersion high-return regime is green.

The graph below plots the actual stock price, not the rate of return. It can be seen that the stock market crash caused by the nuclear accident after the Great East Japan Earthquake is presumed to be a jump rather than a regime switch.

References

--Takahiro Komatsu (2018), * Optimal Investment Strategy: Theory and Practice of Portfolio Technology *, Tokyo: Asakura Shoten --Tatsuyoshi Okimoto (2014), Application of Markov Switching Model to Macroeconomics and Finance, * Journal of Japan Statistical Society *, 44 (1), 137-157