[PYTHON] Memo on how to generate two (approximately) correlated random variables, each following an arbitrary distribution

The point

- I was looking for a way to generate a random variable X that follows the standard normal distribution and a random variable Y that follows a χ2 distribution, with an arbitrary correlation coefficient ρ between them.
- If both X and Y follow the standard normal distribution, they can be generated as follows.
  - Use the numpy.random.multivariate_normal function.
  - For a multivariate standard normal distribution, the variance-covariance matrix equals the correlation matrix, so covmtx = [[1, ρ], [ρ, 1]].
  - Even if the distribution is not standard normal, as long as the means are zero and the variances are equal, the variables can be generated by the method explained by horiem.
- X ~ standard normal and Y ~ χ2 do not satisfy the conditions of that method, so they have to be generated another way.
- As a stopgap, this memo notes how to draw two correlated random variables from a multivariate standard normal distribution and then convert one of them so that it follows another probability distribution.
  - Y -> cumulative distribution function of the standard normal distribution, cdf(Y) -> percent point function (inverse CDF) of the arbitrary distribution, ppf(cdf(Y)) -> Z
  - For what this means, see here.
  - Note that the value of the correlation coefficient after conversion cannot be specified exactly.
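The cdf -> ppf chain works because of the probability integral transform. A short derivation follows, where Φ denotes the standard normal CDF and F the CDF of the target distribution; both symbols are introduced here for illustration, and F is assumed continuous and strictly increasing.

```latex
\begin{align*}
Y \sim N(0,1) &\implies U = \Phi(Y) \sim \mathrm{Uniform}(0,1), \\
Z = F^{-1}(U) &\implies P(Z \le z) = P\bigl(U \le F(z)\bigr) = F(z).
\end{align*}
```

Because the map from Y to Z is increasing, Z keeps the ordering of Y, which is why the dependence on X carries over. The map is nonlinear, though, so the Pearson correlation generally changes.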

code

sample.py


import numpy as np
import scipy.stats as spstats

# generate X,Y ~ multinormal(mu = [0,0], cov = [[1,ρ],[ρ,1]])
rho_norm = 0.8 # correlation coeff for multinorm
mu = [0, 0] # mean of X, Y
cov = [
    [1, rho_norm],
    [rho_norm, 1],
]  # covariance matrix (equal to the correlation matrix here)

vals_norm = np.random.multivariate_normal(mu, cov, 100000)
x_norm = vals_norm[:,0]
y_norm = vals_norm[:,1]
# np.corrcoef(x_norm, y_norm) gives a rho value around rho_norm

# convert Y to Z ~ chi2(k)
k = 3 # parameter of chi2 dist
z_chi2 = spstats.chi2.ppf(spstats.norm.cdf(y_norm, loc = 0, scale = 1), df = k)

# x_norm ~ norm(mu = 0, var = 1) and z_chi2 ~ chi2(k = 3)
# np.corrcoef(x_norm, z_chi2) gives a rho value a bit smaller than rho_norm

result

The scatter plot of X and Y generated with a correlation coefficient of 0.8, together with their marginal distributions, looks like this.

(image.png)

For Z, obtained by applying the above transformation to Y, the scatter plot against X and the marginal distributions look like this.

(image.png)

For Z (vertical axis), the marginal distribution has been transformed into a χ2 distribution with k = 3. The correlation coefficient at this point (np.corrcoef returns the correlation matrix, so take its [0,1] component) is

(image.png)

The positive correlation is preserved, but it is smaller than the correlation coefficient (0.8) specified between X and Y in the multivariate standard normal distribution.
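The observations above can be checked numerically. The following is a minimal sketch, not part of the original sample.py: the fixed seed and the use of a Kolmogorov-Smirnov statistic and Spearman rank correlation are choices made here for illustration. It confirms that Z is close to χ2(k) and compares the Pearson and rank correlations before and after the transformation.

```python
import numpy as np
import scipy.stats as spstats

rng = np.random.default_rng(0)  # fixed seed (an assumption, for reproducibility)
rho_norm, k, n = 0.8, 3, 100_000

# Same construction as sample.py: correlated standard normals, then Y -> Z
vals = rng.multivariate_normal([0, 0], [[1, rho_norm], [rho_norm, 1]], size=n)
x_norm, y_norm = vals[:, 0], vals[:, 1]
z_chi2 = spstats.chi2.ppf(spstats.norm.cdf(y_norm), df=k)

# Marginal check: a small KS statistic means Z is close to chi2(k)
ks_stat = spstats.kstest(z_chi2, "chi2", args=(k,)).statistic

# Pearson correlation changes under the nonlinear transform ...
pearson_xy = np.corrcoef(x_norm, y_norm)[0, 1]
pearson_xz = np.corrcoef(x_norm, z_chi2)[0, 1]

# ... but Spearman rank correlation depends only on ordering, which is kept
spearman_xy = spstats.spearmanr(x_norm, y_norm)[0]
spearman_xz = spstats.spearmanr(x_norm, z_chi2)[0]

print(f"KS statistic vs chi2({k}): {ks_stat:.4f}")
print(f"Pearson  X-Y: {pearson_xy:.3f}  X-Z: {pearson_xz:.3f}")
print(f"Spearman X-Y: {spearman_xy:.3f}  X-Z: {spearman_xz:.3f}")
```

If a rank-based measure of dependence is acceptable, Spearman's coefficient is therefore the quantity that survives the conversion essentially unchanged.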

This time Y was converted to a χ2 distribution, but it can be converted to any other distribution in the same way. X can be converted likewise.
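As a sketch of converting both variables, the snippet below wraps the cdf -> ppf step in a helper. The name `to_marginal` is hypothetical, and the exponential and gamma targets (and their parameters) are picked here only for illustration.

```python
import numpy as np
import scipy.stats as spstats

def to_marginal(std_normal_samples, dist):
    """Map N(0, 1) samples onto `dist` via ppf(cdf(.)). Hypothetical helper."""
    u = spstats.norm.cdf(std_normal_samples)  # uniform on (0, 1)
    return dist.ppf(u)                        # inverse CDF of the target

rng = np.random.default_rng(1)  # fixed seed (an assumption, for reproducibility)
rho_norm = 0.8
vals = rng.multivariate_normal([0, 0], [[1, rho_norm], [rho_norm, 1]], size=100_000)

# Target marginals chosen only for illustration
x_expon = to_marginal(vals[:, 0], spstats.expon(scale=2.0))         # mean 2
y_gamma = to_marginal(vals[:, 1], spstats.gamma(a=2.0, scale=1.5))  # mean 3

corr_after = np.corrcoef(x_expon, y_gamma)[0, 1]
print(f"means: {x_expon.mean():.3f}, {y_gamma.mean():.3f}")
print(f"Pearson correlation after converting both: {corr_after:.3f}")
```

Any frozen scipy.stats distribution that provides a `ppf` method can be passed as `dist`.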

important point

With this method, the correlation coefficient after conversion cannot be specified exactly. I suspect there is a more direct method, but I could not find one after searching for a while, so I am writing this down here. If anyone knows an easier way, please let me know.
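One possible workaround, offered as a sketch and not as the method of this memo, is to search numerically for the rho_norm that yields a desired Pearson correlation after conversion. The snippet assumes a target of 0.7 between X and Z ~ χ2(3), reuses the same random draws for every candidate rho_norm so that the objective is a smooth function, and solves with scipy.optimize.brentq. The names `pearson_after_transform`, `target_rho`, and `rho_needed` are introduced here for illustration.

```python
import numpy as np
import scipy.stats as spstats
from scipy.optimize import brentq

k = 3             # chi2 degrees of freedom, as in sample.py
target_rho = 0.7  # desired Pearson correlation between X and Z (illustrative)

rng = np.random.default_rng(2)     # fixed seed (an assumption)
e1 = rng.standard_normal(100_000)  # reused for every rho (common random numbers)
e2 = rng.standard_normal(100_000)

def pearson_after_transform(rho_norm):
    """Sample Pearson correlation of X and Z = chi2.ppf(norm.cdf(Y))."""
    x = e1
    y = rho_norm * e1 + np.sqrt(1.0 - rho_norm**2) * e2  # corr(x, y) = rho_norm
    z = spstats.chi2.ppf(spstats.norm.cdf(y), df=k)
    return np.corrcoef(x, z)[0, 1]

# Solve pearson_after_transform(rho) = target_rho on [0, 0.999]
rho_needed = brentq(lambda r: pearson_after_transform(r) - target_rho, 0.0, 0.999)
achieved = pearson_after_transform(rho_needed)
print(f"rho_norm needed: {rho_needed:.3f}, achieved corr(X, Z): {achieved:.3f}")
```

The match is exact only on the samples used for the search; fresh samples drawn with `rho_needed` will scatter around the target. Targets close to 1 may be unreachable, because the correlation after conversion stays below 1 even when rho_norm approaches 1.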

Postscript (11/26): I found this in the official MATLAB documentation. It takes almost the same approach but describes it in more detail. FMI (For My Info)
