Understanding LiNGAM (ICA version) with mathematical formulas and Python

I am a beginner in causal reasoning.

Recently I had the opportunity to learn about recent methods in statistical causal discovery, and I realized that without understanding the basic method, **LiNGAM**, I could not follow the discussion at all. So I will try to understand LiNGAM by reading about it and working through the process in Python step by step. There are two approaches to estimating **LiNGAM**; this time we use the independent component analysis (ICA) approach.

For the implementation, I referred to a book on statistical causal discovery (Machine Learning Professional Series) [1], the LiNGAM paper [2], and the code in the original GitHub repository.

What you can do with LiNGAM

For example, suppose you have $N$ observations of four observed variables $x_1$, $x_2$, $x_3$, $x_4$. From that data, LiNGAM derives a causal graph that is linear and acyclic and whose error terms follow non-Gaussian distributions (see below).

(Figure: the true causal graph over $x_1, \dots, x_4$ used in this article.)

LiNGAM model

The model is defined as follows. For $p$ observed variables $x_1, x_2, x_3, \dots, x_p$, the LiNGAM model is

\begin{align}
x_i = \sum_{j \neq i} b_{ij} x_j + e_i \qquad i=1,2,\dots,p \tag{1}
\end{align}

where the $e_i$ are error variables assumed to be mutually independent and to follow non-Gaussian continuous distributions. In matrix form the model is

\begin{align}
{\bf x} = {\bf Bx} + {\bf e} \tag{2}
\end{align}

where ${\bf B}$ is the $p \times p$ coefficient matrix. Since the causal graph is assumed to be acyclic, if the observed variables are arranged in the correct causal order, ${\bf B}$ becomes a lower triangular matrix whose diagonal entries are all 0 (a **strictly lower triangular matrix**).
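
As a concrete example, the coefficient matrix used for the artificial data in STEP0 below, written in the correct causal order $(x_1, x_2, x_3, x_4)$, is strictly lower triangular:

\begin{align}
{\bf B} = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 2 & 0 & 0 & 0 \\ 1 & 3 & 0 & 0 \\ 4 & 5 & 3 & 0 \end{pmatrix}
\qquad
\begin{cases} x_1 = e_1 \\ x_2 = 2x_1 + e_2 \\ x_3 = x_1 + 3x_2 + e_3 \\ x_4 = 4x_1 + 5x_2 + 3x_3 + e_4 \end{cases}
\end{align}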

STEP0 Generation of artificial data

This time we generate the following artificial data and try to recover the causal graph. The data are created according to the causal graph shown in the figure above.

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.decomposition import FastICA
from sklearn.linear_model import LassoLarsIC, LinearRegression
from scipy.optimize import linear_sum_assignment
from graphviz import Digraph
#STEP0 Artificial data generation
num_of_data = 1000
num_of_features = 4

_B_true = np.asarray([[0,0,0,0], [2,0,0,0], [1,3,0,0], [4,5,3,0]])

#Non-Gaussian (uniform) error terms
e1 = np.random.uniform(size=num_of_data)
e2 = np.random.uniform(size=num_of_data)
e3 = np.random.uniform(size=num_of_data)
e4 = np.random.uniform(size=num_of_data)

x1 = e1
x2 = _B_true[1,0] * x1 + e2
x3 = _B_true[2,0] * x1 + _B_true[2,1] * x2 + e3
x4 = _B_true[3,0] * x1 + _B_true[3,1] * x2 + _B_true[3,2] * x3 + e4

_X = np.asarray([x1, x2, x3, x4]).T
_columns = ["x1", "x2", "x3", "x4"]

#Shuffle the observed variables into a random order
order_shuffled = [3,0,2,1]
X = _X[:,order_shuffled]
columns = [_columns[i] for i in order_shuffled]
P_os = np.zeros_like(_B_true)
for i,j in enumerate(order_shuffled):
    P_os[i,j] = 1
B_true = P_os@_B_true@P_os.T

DF_X = pd.DataFrame(X,columns=columns)

After creating the strictly lower triangular matrix B that serves as the ground truth, we shuffle the order of the observed variables at random; here the order becomes $(x_4, x_1, x_3, x_2)$. This is because in the real world we do not know the correct order of the observed variables.
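
As a quick sanity check (a minimal sketch; the matrix E is assembled here only for this check), we can confirm that the shuffled data still satisfy equation $(2)$ with the permuted coefficient matrix B_true:

#Sanity check: the shuffled data should still satisfy x = Bx + e
E = np.asarray([e1, e2, e3, e4]).T[:, order_shuffled]
assert np.allclose(X, X @ B_true.T + E)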

STEP1 Estimate the unmixing matrix W

Rearranging equation $(2)$, we get:

\begin{align}
{\bf x} &= {\bf (I-B)}^{-1} {\bf e}  \\
&= {\bf Ae} \tag{3}
\end{align}

Here, since the components of the error vector ${\bf e}$ are mutually independent and non-Gaussian, equation $(3)$ can be regarded as an ICA model with mixing matrix ${\bf A}$; the corresponding unmixing matrix is ${\bf W} = {\bf A}^{-1} = {\bf I - B}$. However, what ICA finds may differ from the true ${\bf W}$ in row order and scale. Using a permutation matrix ${\bf P}$ for the row order and a diagonal matrix ${\bf D}$ for the scale, the unmixing matrix ${\bf W_{ICA}}$ estimated by ICA is

\begin{align}
{\bf W_{ICA}} &={\bf PDW}  \\
&= {\bf PD(I-B)}  \tag{4}
\end{align}

Now let us actually compute ${\bf W_{ICA}}$ in Python. Since ICA itself is not the topic here, we obtain it quickly with FastICA from scikit-learn.

#STEP1 Estimate W_ICA (= PDW)
ICA = FastICA(random_state=0)
ICA.fit(X)
W_ICA = ICA.components_
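
As a small check (a sketch relying on scikit-learn's convention that components_ is the linear operator mapping the centered data to the recovered sources):

#Sanity check: W_ICA maps the centered data to the independent components
S = ICA.transform(X)
assert np.allclose(S, (X - ICA.mean_) @ W_ICA.T)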

STEP2 Estimate the permutation matrix P

Rearranging equation $(4)$, we get:

\begin{align}
{\bf P^{-1}W_{ICA}} &={\bf D(I-B)} \tag{5}
\end{align}

Here, because the diagonal entries of ${\bf B}$ are $0$ due to acyclicity, the diagonal entries of ${\bf I-B}$ are all $1$; and since the diagonal entries of the scale matrix ${\bf D}$ are nonzero, the diagonal entries of ${\bf D(I-B)}$ are nonzero as well. Therefore, on the left-hand side of equation $(5)$, the rows of ${\bf W_{ICA}}$ must be permuted so that no $0$ lands on the diagonal. We estimate such a permutation matrix ${\bf \tilde{P}}$ by

\begin{align}
{\bf \hat{\tilde{P}} = \mathop{\rm arg~min}\limits_{\bf \tilde{P}} \sum_{i=1}^p \frac{1}{|(\bf {\tilde{P} \hat{W}_{ICA}})_{ii}| }}  \tag{6}
\end{align}

Finding such a permutation matrix is a linear assignment problem, so it can be solved with the Hungarian algorithm or similar. As for the implementation: the Hungarian method is, again, not the topic here, so we solve it quickly with scipy.

#STEP2 P_tilde_hat estimation
epsilon = 1e-16

cost = 1 / (np.abs(W_ICA) + epsilon)
row_ind, col_ind = linear_sum_assignment(cost)

P_tilde_hat = np.zeros_like(W_ICA)
for i,j in  zip(row_ind,col_ind):
    P_tilde_hat[i,j] = 1
P_tilde_hat = np.linalg.inv(P_tilde_hat)

We add a tiny value ε so that the reciprocal does not blow up when an entry of $\hat{\bf W}_{ICA}$ is $0$.
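
To confirm that the permutation did its job, we can check that no (near-)zero entries remain on the diagonal:

#The diagonal of P_tilde_hat @ W_ICA should contain no zeros
print(np.diag(P_tilde_hat @ W_ICA))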

STEP3 Estimate the scale matrix D

As noted above, the diagonal entries of ${\bf I-B}$ are all $1$, so the diagonal of ${\bf D(I-B)}$ coincides with that of the diagonal matrix ${\bf D}$. Therefore, from equation $(5)$, ${\bf D}$ equals the diagonal part of ${\bf P^{-1} W_{ICA}}$, and the estimator of ${\bf D}$ is

\begin{align}
{\bf \hat{D}} = {\rm diag}({\bf \hat{\tilde{P}} \hat{W}_{ICA}}) \tag{7}
\end{align}

Written in Python, it looks like this.

#STEP3 D_hat estimation
D_hat = np.diag(np.diag(P_tilde_hat@W_ICA))

STEP4 Estimate the unmixing matrix W and the coefficient matrix B

From equation $(5)$, the unmixing matrix ${\bf W}$ is estimated as

\begin{align}
{\bf \hat W = \hat D^{-1} \hat{\tilde{P}} \hat{W}_{ICA}}  \tag{8}
\end{align}

and, since ${\bf W} = {\bf I - B}$, the coefficient matrix ${\bf B}$ is estimated as

\begin{align}
{\bf \hat B = I - \hat W}  \tag{9}
\end{align}

#STEP4 W_hat,B_hat estimation
W_hat = np.linalg.inv(D_hat)@P_tilde_hat@W_ICA
I = np.eye(num_of_features)
B_hat = I - W_hat
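
At this point it is worth comparing the estimate with the ground truth from STEP0. With 1000 samples the large entries should already be close, although small nonzero values will remain where B_true has zeros (these are dealt with in STEP6):

#Compare the raw estimate with the ground truth
print(np.round(B_hat, 2))
print(B_true)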

STEP5 Estimate the causal order

At this stage we have essentially recovered the LiNGAM model of equation $(2)$, but the causal order needed to draw the graph is still unknown. We must therefore find a reordering of the observed variables in equation $(2)$ under which ${\bf B}$ becomes a strictly lower triangular matrix. That is, we multiply equation $(2)$ by a permutation matrix ${\bf \ddot{P}}$:

{\bf \ddot{P} x} = {\bf \ddot{P} Bx} + {\bf \ddot{P} e} \tag{10}
{\bf \ddot{P} x} = {\bf (\ddot{P} B\ddot{P}^{\mathrm{T}})\ddot{P}x} + {\bf \ddot{P} e} \tag{11}

and look for the ${\bf \ddot{P}}$ that makes ${\bf \ddot{P} B \ddot{P}^{\mathrm{T}}}$ as close as possible to a strictly lower triangular matrix.

Such a permutation matrix could be found by brute-force search, but there are $p!$ candidate permutation matrices, so this quickly becomes infeasible. Reference [3] therefore proposes a method to find it efficiently.

In that method, the $p(p+1)/2$ entries of ${\bf \hat{B}}$ with the smallest absolute values are first set to $0$. We then test whether the observed variables can be reordered so that the matrix becomes strictly lower triangular; if so, we are done, and if not, the entry of ${\bf \hat{B}}$ with the next smallest absolute value is also set to $0$ and the test is repeated.

The lower-triangularity test (which simultaneously yields the causal order) works as follows: ① Find a row of ${\bf \hat{B}}$ whose entries are all $0$, and call it row $i$ (if no such row exists, the test fails). ② Append $i$ to the causal order list. ③ Delete row $i$ and column $i$ from ${\bf \hat{B}}$ and return to ① with the resulting matrix.

The completed list gives the causal order from ancestors to descendants. Now for the Python implementation. Honestly, this part was complicated to work out, so I relied heavily on the original code...

#STEP5 Estimate the causal order

pos_list = np.vstack(np.unravel_index(np.argsort(np.abs(B_hat), axis=None), B_hat.shape)).T
initial_zero_num = num_of_features * (num_of_features + 1) // 2

for i, j in pos_list[:initial_zero_num]:    #set the p(p+1)/2 smallest-magnitude entries to 0
    B_hat[i, j] = 0

Mtx_lowtri = B_hat
causal_order = []
original_index = np.arange(num_of_features)

for i in range(num_of_features):    #Strict lower triangular matrix judgment
    idx_row_all_zero_list = np.where(np.sum(np.abs(Mtx_lowtri), axis=1) == 0)[0]
    if len(idx_row_all_zero_list) == 0:
        print("Not a strict lower triangular matrix")
        break
    idx_row_all_zero = idx_row_all_zero_list[0]
    causal_order.append(original_index[idx_row_all_zero])
    original_index = np.delete(original_index, idx_row_all_zero, axis=0)

    mask = np.delete(np.arange(len(Mtx_lowtri)), idx_row_all_zero, axis=0)
    Mtx_lowtri = Mtx_lowtri[mask][:, mask]

#Permutation matrix corresponding to the estimated causal order
P_doubledot = np.zeros_like(B_hat)
for i,j in enumerate(causal_order):
    P_doubledot[j,i] = 1

For this data, the first round of zeroing already yields a valid causal order, so the code above does not include the outer loop that zeroes additional entries and retries. See the original implementation for a complete version.
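
For reference, since the columns of X were shuffled into the order $(x_4, x_1, x_3, x_2)$ in STEP0, the estimated order should come out as the column indices of $x_1, x_2, x_3, x_4$:

print(causal_order)    #expected: [1, 3, 2, 0], i.e. x1 -> x2 -> x3 -> x4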

STEP6 Finishing, pruning

If there is enough data, the coefficient matrix obtained in STEP5 is adequate as is; if not, the model can reportedly be improved by pruning as a final finishing step.

The book proposes adaptive Lasso as one pruning method. Adaptive Lasso weights the Lasso regularization term with weights ${\bf w}$ to make the variable selection consistent, and it has been proposed to take ${\bf w}$ as the reciprocals of the coefficients estimated by ordinary linear regression. Adaptive Lasso is run for each observed variable, following the causal order obtained in STEP5, to obtain the final solution; a concrete form is shown below.
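
Concretely, for a target variable $y$ regressed on explanatory variables $x_j$, adaptive Lasso in its standard form solves

\begin{align}
{\bf \hat{b}} = \mathop{\rm arg~min}\limits_{\bf b} \left\| y - \sum_j b_j x_j \right\|^2 + \lambda \sum_j w_j |b_j|, \qquad w_j = \frac{1}{|\hat{b}_j^{\rm OLS}|}
\end{align}

where $\hat{b}_j^{\rm OLS}$ are the coefficients from ordinary linear regression.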

In the implementation, we first estimate the coefficients by linear regression, compute the weights from those estimates, multiply the weights into the Lasso input data, and finally multiply the weights back into the coefficients returned by Lasso.

#STEP6 Pruning
B_pred = np.zeros_like(B_hat, dtype='float64')
lr = LinearRegression()

for i in range(1,num_of_features):
    #adaptive lasso
    predictors = causal_order[:i]    #explanatory variables
    target = causal_order[i]    #target variable
    lr.fit(X[:, predictors], X[:, target])

    weight = 1/(np.abs(lr.coef_) + epsilon)
    reg = LassoLarsIC(criterion='bic')
    reg.fit(X[:, predictors] * weight, X[:, target])

    B_pred[causal_order[i], causal_order[:i]] = reg.coef_ * weight
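
Placing the pruned estimate next to the ground truth, the two should now match closely (the spurious small entries having been pruned away):

#Compare the pruned estimate with the ground truth
print(np.round(B_pred, 2))
print(B_true)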

Visualization

Finally, the resulting models are visualized as in the figure below: from left to right, the true graph, the graph obtained in STEP5, and the graph after the STEP6 pruning. Since the data here are simple and plentiful, the result is already good as of STEP5.

(Figure: from left, the true causal graph, the STEP5 estimate, and the STEP6 estimate.)

Graphviz was used for the visualization, with the following graph-drawing function.

def Make_Graph(columns, B):
    Graph = Digraph(format="png")
    for i, target in enumerate(columns):
        for k, source in enumerate(columns):
            if B[i, k] != 0:
                #B[i, k] is the coefficient on the edge source -> target
                Graph.edge(source, target, color="black", label="    " + str(B[i, k].round(3)))
    return Graph
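
For example, the STEP6 graph can be rendered like this (the output filename is arbitrary):

Graph = Make_Graph(columns, B_pred)
Graph.render("causal_graph_step6")    #writes causal_graph_step6.png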

In conclusion

As described above, this causal inference beginner worked through LiNGAM by comparing the book [1] and the paper [2], and by borrowing liberally from the original implementation's code.

References

**[1]** Shohei Shimizu. Statistical Causal Search (Machine Learning Professional Series). Kodansha.
**[2]** Shimizu et al. "A linear non-Gaussian acyclic model for causal discovery." Journal of Machine Learning Research 7 (2006): 2003-2030.
**[3]** Hoyer et al. "New permutation algorithms for causal discovery using ICA." Proceedings of the International Conference on Independent Component Analysis and Blind Signal Separation (2006).
