[PYTHON] Cholesky decomposition was behind the random number generation according to the multivariate normal distribution

The Choresky decomposition of the covariance matrix can convert random numbers that follow an independent standard normal distribution to random numbers that follow a multivariate normal distribution. What is it?

1. Mathematical background

--Change of variables of random variables --Derivation of multidimensional normal distribution --Characteristics of covariance matrix --Cholesky decomposition algorithm

To explain.

1.1. Change of variables of multidimensional probability distribution

Random variable

    \begin{eqnarray}
        X_1,\ X_2,\  \cdots ,\  X_n 
\end{eqnarray}

, Random variables

\begin{eqnarray}
        Y_1,\ Y_2,\  \cdots ,\  Y_n 
    \end{eqnarray}

Convert to. However, $ X $ and $ Y $ are respectively

\begin{eqnarray}
        &f_X&\left(x_1,\ x_2,\ \cdots ,\ x_n\right)\\
        &f_Y&\left(y_1,\ y_2,\ \cdots ,\ y_n\right)
\end{eqnarray}

Suppose it has a probability density function of.

Each $ Y $ as a bijection of $ X $

    \begin{eqnarray}
        \begin{cases}
            Y_1 = h_1\left(X_1,\ X_2,\  \cdots ,\  X_n\right)&\\
            Y_2 = h_2\left(X_1,\ X_2,\  \cdots ,\  X_n\right)&\\
            \ \vdots &\\
            Y_n = h_n\left(X_1,\ X_2,\  \cdots ,\  X_n\right)
        \end{cases}
    \end{eqnarray}

It is expressed as. Since each $ h_i $ is bijective, there is an inverse function,

    \begin{eqnarray}
        \begin{cases}
            X_1 = g_1\left(Y_1,\ Y_2,\  \cdots ,\  Y_n\right)&\\
            X_2 = g_2\left(Y_1,\ Y_2,\  \cdots ,\  Y_n\right)&\\
            \ \vdots &\\
            X_n = g_n\left(Y_1,\ Y_2,\  \cdots ,\  Y_n\right)
        \end{cases}
    \end{eqnarray}

It is expressed as. Change of variables of integrals to any subset $ D $ of $ n $ dimensional real space

    \begin{eqnarray}
        &\int\cdots\int_D& f_X\left(x_1,x_2,\cdots ,x_n\right) dx_1dx_2\cdots dx_n \nonumber \\
        = &\int\cdots\int_D& f_X\left(y_1,y_2,\cdots ,y_n\right)\left|\frac{\partial\left( x_1,x_2,\cdots x_n\right) }{\partial\left(y_1,y_2,\cdots y_n\right) }\right| dy_1dy_2\cdots dy_n 
    \end{eqnarray}

Is established. However, Jacobian,

    \begin{eqnarray}
        \left|\frac{\partial\left( x_1,x_2,\cdots x_n\right) }{\partial\left(y_1,y_2,\cdots y_n\right) }\right| =
        \begin{pmatrix}
            \frac{\partial g_1}{\partial y_1}&\frac{\partial g_1}{\partial y_2}&\cdots&\frac{\partial g_1}{\partial y_n}\\
            \frac{\partial g_2}{\partial y_1}&\ddots&&\vdots\\
            \vdots &&\ddots&\vdots\\
            \frac{\partial g_n}{\partial y_1}&\cdots&\cdots&\frac{\partial g_n}{\partial y_n}
        \end{pmatrix}
    \end{eqnarray}

It is written as.

From here, the conversion formula of the probability density function

    \begin{eqnarray}
        f_Y\left(y_1,y_2,\cdots ,y_n\right)
        = f_X\left(y_1,y_2,\cdots ,y_n\right)\left|\frac{\partial\left( x_1,x_2,\cdots x_n\right) }{\partial\left(y_1,y_2,\cdots y_n\right) }\right|
        
    \end{eqnarray}

To get.

1.2. Derivation of multidimensional normal distribution

The $ 1 $ dimensional standard normal distribution is written as $ N \ left (0,1 \ right) $.

$ N $ standard normal distribution independent of each other

    \begin{eqnarray}
        Z_1,\ Z_2,\cdots ,\ Z_n \sim N\left(0,1\right)\ \textrm{i.i.d}
    \end{eqnarray}

To determine. These simultaneous density functions are expressed as products because they are independently distributed.

    \begin{eqnarray}
        f_Z\left(z\right) = \frac{1}{\left(2\pi\right)^{n/2}} \exp\left(-\frac{1}{2}z^Tz\right)
    \end{eqnarray}

Will be.

here,

    \begin{eqnarray}
        X &=& AZ + \mu
\end{eqnarray}

Change the variables of. However, $ Z and X $ are $ n $ dimensional random variable vectors, $ A $ is a $ n \ times n $ matrix, and $ \ mu $ is a $ n $ dimensional vector.

\begin{eqnarray}
        X&=&\left(X_1,\ X_2,\ \cdots\ X_n\right)^{\mathrm{T}} \nonumber \\
        Z&=&\left(Z_1,\ Z_2,\ \cdots\ Z_n\right)^{\mathrm{T}} \nonumber \\

        A&=&
        \begin{pmatrix}
            a_{1,1}&a_{1,2}&\cdots&a_{1,n}\\
            a_{2,2}&\ddots&&\vdots\\
            \vdots&&\ddots&\vdots\\
            a_{n,1}&\cdots&\cdots&a_{n,n}
        \end{pmatrix} \nonumber \\

        \mu&=&\left(\mu_1,\ \mu_2,\ \cdots\ \mu_n\right)^{\mathrm{T}} \nonumber
    \end{eqnarray}

Jacobian of change of variables

    \begin{eqnarray}
        \left|\frac{\partial\left( x_1,x_2,\cdots x_n\right) }{\partial\left(z_1,z_2,\cdots z_n\right) }\right| &=& \det\left( A\right)
    \end{eqnarray}

Than,

\begin{eqnarray}
        \left|\frac{\partial\left( z_1,z_2,\cdots z_n\right) }{\partial\left(x_1,x_2,\cdots x_n\right) }\right| &=& \frac{1}{\det\left( A\right)}

    \end{eqnarray}

To get.

Therefore, the simultaneous density function of $ X $ is

    \begin{eqnarray}
        f_X\left(x\right)&=&f_Z\left(A^{\mathrm{T}}\left(x-\mu\right)\right)\frac{1}{\det\left(A\right)} \nonumber \\
        &=&\frac{1}{\left(2\pi\right)^{n/2}\det\left(A\right)}\exp\left\{-\frac{1}{2}\left(x-\mu\right)^{\mathrm{T}}AA^{\mathrm{T}}\left(x-\mu\right)\right\}
    \end{eqnarray}

Will be.

The average vector of the transformed random variables and the variance-covariance matrix are

    \begin{eqnarray}
        E\left[X\right]&=&AE\left[Z\right]+\mu \nonumber \\
        &=&\mu 
         \\
        Cov\left(X\right)&=&E\left[\left(X-\mu\right)\left(X-\mu\right)^{\mathrm{T}}\right] \nonumber \\
        &=&E\left[AZZ^{\mathrm{T}}A^{\mathrm{T}}\right] \nonumber \\
        &=&AE\left[Z\right] E\left[Z^{\mathrm{T}}\right] A^{\mathrm{T}} \nonumber \\
        &=&AA^{\mathrm{T}}
    \end{eqnarray}

The covariance matrix,

    \begin{eqnarray}
        \Sigma = AA^{\mathrm{T}}
    \end{eqnarray}

Notated as.

Applying this, the simultaneous density function of the multidimensional normal distribution

    \begin{eqnarray}
        f_X\left(x\right)
        &=&\frac{1}{\left(2\pi\right)^{n/2}\det\left(A\right)}\exp\left\{-\frac{1}{2}\left(x-\mu\right)^{\mathrm{T}}\left(AA^{\mathrm{T}}\right)^{-1}\left(x-\mu\right)\right\} \nonumber \\
        &=&\frac{1}{\left(2\pi\right)^{n/2}\det\left(\Sigma\right)^{1/2}}\exp\left\{-\frac{1}{2}\left(x-\mu\right)^{\mathrm{T}}\Sigma^{-1}\left(x-\mu\right)\right\}
    \end{eqnarray}

To get.

1.3. Properties of the covariance matrix

In generating a random number sequence from the variance-covariance matrix by Cholesky decomposition, a constraint condition that the variance-covariance matrix is a definite value is required.

Here, we generally confirm the properties of the variance-covariance matrix and confirm the Cholesky decomposition possibility.

Here, we confirm our knowledge of linear algebra.

Semi-deterministicity of matrix

The following are equivalent. -$ n \ times n $ Symmetric matrix $ A $ is a semi-fixed value --For any vector $ x $, $ x ^ \ mathrm {T} Ax \ geq 0 $ -Any major submatrix expression of $ A $ is non-negative --All eigenvalues are non-negative

Definiteness of matrix

The following are equivalent. -$ n \ times n $ Symmetric matrix $ A $ is a definite value --For any vector $ x $, $ x ^ \ mathrm {T} Ax> 0 $ -Any major submatrix expression of $ A $ is positive --All eigenvalues are positive

Prove the following allegations. ** The variance-covariance matrix $ \ Sigma $ is a semi-regular definite matrix. ** **

Let $ n \ times n $ variance-covariance matrix with variance $ \ sigma_i $ and covariance $ \ rho_ {i, j} $ as components be $ \ Sigma $. From $ \ rho_ {i, j} = \ rho_ {j, i} $, $ \ Sigma $ is a symmetric matrix. Let $ a $ be an arbitrary $ n $ dimensional constant vector.

\begin{eqnarray} a^{\mathrm{T}}\Sigma a &=& a^{\mathrm{T}}E\left[\left(X-E\left[X\right]\right)\left(X-E\left[X\right]\right)^{\mathrm{T}}\right]a \nonumber \ &=&E\left[a^{\mathrm{T}}\left(X-E\left[X\right]\right)\left(X-E\left[X\right]\right)^{\mathrm{T}}a\right] \nonumber \ &=&E\left[\langle a,X-E\left[X\right]\rangle^2\right] \nonumber \ &\geq&0 \nonumber \end{eqnarray}


 > Therefore, $ \ Sigma $ is a definite matrix.

 In practice, the variance-covariance matrix is expected to be definite-matrix symmetry in many cases,
 Note that the following random numbers cannot be generated if the definite value is not symmetric.

## 1.4. Cholesky decomposition
 Cholesky decomposition is the decomposition of the definite matrix $ A $ using the lower triangular matrix $ L $.

```math
    \begin{eqnarray}
        A&=&LL^{\mathrm{T}}
        \\
        \nonumber \\
        A&=&
        \begin{pmatrix}
            a_{1,1}&a_{1,2}&\cdots&a_{1,n}\\
            a_{2,2}&\ddots&&\vdots\\
            \vdots&&\ddots&\vdots\\
            a_{n,1}&\cdots&\cdots&a_{n,n}
        \end{pmatrix} \nonumber \\
        L&=&
        \begin{pmatrix}
            l_{1,1}&0&\cdots&\cdots&0\\
            l_{2,1}&l_{2,2}&0&&\vdots\\
            \vdots&&\ddots&\ddots&\vdots\\
            \vdots&&&l_{n-1,n-1}&0\\
            l_{n,1}&\cdots&\cdots&l_{n,n-1}&l_{n,n}
        \end{pmatrix} \nonumber
    \end{eqnarray}

The construction method of Cholesky decomposition is as follows.

    \begin{eqnarray}
            l_{i,i}&=&\sqrt{a_{i,i}-\sum_{k=1}^{i-1}l_{i,k}^2}\ \ \ \ \ \ \left(\textrm{for}\ i=1,\cdots ,n\right) \\
            l_{j,i}&=&\left(a_{j,i}-\sum_{k=1}^{i-1}l_{j,k}\ l_{i,k}\right)/l_{i,i}\ \ \ \left(\textrm{for}\ j=i+1,\cdots ,n\right) 
    \end{eqnarray}

1.5. Generation of random number sequence according to multivariate normal distribution by Cholesky decomposition

A definite matrix is set for the variance-covariance matrix $ \ Sigma $ whose components are the variance $ \ sigma_i $ and the covariance $ \ rho_ {i, j} $.

Apply the Cholesky decomposition to $ \ Sigma $ to get $ A = L $ (lower triangular matrix).

    \begin{eqnarray}
        x&=&Lz+\mu \\
        \nonumber \\
        L&=&
        \begin{pmatrix}
            l_{1,1}&0&\cdots&\cdots&0\\
            l_{2,1}&l_{2,2}&0&&\vdots\\
            \vdots&&\ddots&\ddots&\vdots\\
            \vdots&&&l_{n-1,n-1}&0\\
            l_{n,1}&\cdots&\cdots&l_{n,n-1}&l_{n,n}
        \end{pmatrix} \nonumber \\
        l_{i,i}&=&\sqrt{\sigma_{i}-\sum_{k=1}^{i-1}l_{i,k}^2}\ \ \ \ \ \ \left(\textrm{for}\ i=1,\cdots ,n\right) \nonumber \\
        l_{j,i}&=&\left(\rho_{j,i}-\sum_{k=1}^{i-1}l_{j,k}\ l_{i,k}\right)/l_{i,i}\ \ \ \left(\textrm{for}\ j=i+1,\cdots ,n\right) \nonumber
    \end{eqnarray}

2. Implementation

multivariate_normal_distribution.py


import numpy as np

#average
mu = np.array([[2],
               [3]])

#Covariance matrix
cov = np.array([[1,0.2],
                 0.2,1]])

L = np.linalg.cholesky(cov)
dim = len(cov)

random_list=[]

for i in range(n):
    z = np.random.randn(dim,1)
    random_list.append((np.dot(L,z)+mu).reshape(1,-1).tolist()[0])

I played around with it. norm_dist1.png

norm_dist2.png

norm_dist3.png

norm_dist4.png

(The numbers represent the elements of the covariance matrix.) plot_num.png

Recommended Posts

Cholesky decomposition was behind the random number generation according to the multivariate normal distribution
Random number generation according to multivariate normal distribution using Cholesky decomposition of covariance matrix
Derivation of multivariate t distribution and implementation of random number generation by python
Random number generator with normal distribution N (0,1)
Plot and understand the multivariate normal distribution in Python
Steps to calculate the likelihood of a normal distribution