The point

- I was looking for a way to generate a random variable X that follows the standard normal distribution and a random variable Y that follows a χ² distribution, with an arbitrary correlation coefficient ρ between them.
- If both X and Y follow the standard normal distribution, they can be generated as follows:
  - Use the numpy.random.multivariate_normal function.
  - For a multivariate standard normal distribution, the variance-covariance matrix equals the correlation matrix, so covmtx = [[1, ρ], [ρ, 1]].
  - Even if the distribution is not standard normal, as long as the means are zero and the variances are equal, the variables can be generated with the method explained by horiem.
- X ~ standard normal and Y ~ χ² do not satisfy the conditions of this method, so they have to be generated another way.
- As a stopgap, this note describes how to generate two correlated random variables from a multivariate standard normal distribution and then convert one of them so that it follows another probability distribution:
  - Y → cumulative distribution function of the standard normal, cdf(Y) → percent point function (inverse CDF) of the target distribution, ppf(cdf(Y)) → Z
  - For this kind of transformation, see here.
- Note that the correlation coefficient after the conversion cannot be specified exactly.
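
The cdf → ppf conversion in the list above can be checked on its own, one variable at a time: cdf(Y) is uniform on (0, 1), and applying the target distribution's ppf to a uniform variable gives that distribution. A minimal sketch, assuming NumPy's newer `default_rng` Generator API (swapped in for the legacy `np.random` functions the note uses) and an arbitrary seed and sample size chosen only for illustration:

```python
import numpy as np
import scipy.stats as spstats

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
k = 3                           # degrees of freedom of the target chi2 distribution

y = rng.standard_normal(200_000)   # Y ~ N(0, 1)
u = spstats.norm.cdf(y)            # U = cdf(Y) is uniform on (0, 1)
z = spstats.chi2.ppf(u, df=k)      # Z = ppf(U) follows chi2(k)

# Compare sample moments with the theoretical ones
print(f"U: mean={u.mean():.3f} (uniform: 0.5), var={u.var():.4f} (uniform: {1 / 12:.4f})")
print(f"Z: mean={z.mean():.3f} (chi2: {k}), var={z.var():.3f} (chi2: {2 * k})")
```

Because both cdf and ppf are increasing, Z is an increasing function of Y, which is why a positive correlation with X survives the conversion.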

code

`sample.py`


import numpy as np
import scipy.stats as spstats

# generate X, Y ~ multinormal(mu = [0, 0], cov = [[1, ρ], [ρ, 1]])
rho_norm = 0.8  # correlation coefficient for the multivariate normal
mu = [0, 0]     # means of X, Y
cov = [[1, rho_norm],
       [rho_norm, 1]]  # covariance matrix (= correlation matrix, since both variances are 1)

vals_norm = np.random.multivariate_normal(mu, cov, 100000)
x_norm = vals_norm[:, 0]
y_norm = vals_norm[:, 1]
# np.corrcoef(x_norm, y_norm) gives a value close to rho_norm

# convert Y to Z ~ chi2(k): Y -> norm.cdf(Y) (uniform on (0, 1)) -> chi2.ppf(...)
k = 3  # degrees of freedom of the chi2 distribution
z_chi2 = spstats.chi2.ppf(spstats.norm.cdf(y_norm, loc=0, scale=1), df=k)

# x_norm ~ norm(mu = 0, var = 1) and z_chi2 ~ chi2(k = 3)
# np.corrcoef(x_norm, z_chi2) gives a value somewhat smaller than rho_norm
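
The drop described in the last comment applies to the Pearson correlation. A rank-based measure behaves differently: Spearman's rank correlation is unchanged by any strictly increasing transformation, and ppf(cdf(·)) is strictly increasing for a continuous target distribution, so Z has exactly the same ranks as Y. A minimal sketch (not part of the original note; the seed, sample size, and the helper name `spearman` are illustrative), computing Spearman's ρ as the Pearson correlation of ranks via `scipy.stats.rankdata`:

```python
import numpy as np
import scipy.stats as spstats

rng = np.random.default_rng(1)  # arbitrary seed for reproducibility
rho_norm = 0.8
k = 3

# X, Y ~ bivariate standard normal with correlation rho_norm
vals = rng.multivariate_normal([0, 0], [[1, rho_norm], [rho_norm, 1]], size=20_000)
x, y = vals[:, 0], vals[:, 1]
z = spstats.chi2.ppf(spstats.norm.cdf(y), df=k)  # same conversion as sample.py


def spearman(a, b):
    """Spearman's rank correlation = Pearson correlation of the ranks."""
    return np.corrcoef(spstats.rankdata(a), spstats.rankdata(b))[0, 1]


print("Pearson  X-Y:", np.corrcoef(x, y)[0, 1], " X-Z:", np.corrcoef(x, z)[0, 1])
print("Spearman X-Y:", spearman(x, y), " X-Z:", spearman(x, z))
# theoretical Spearman value for a bivariate normal: (6 / pi) * arcsin(rho / 2)
print("theory:", 6 / np.pi * np.arcsin(rho_norm / 2))
```

So if a rank correlation is acceptable as the target, it carries over exactly; for ρ = 0.8 in the normal space it is (6/π)·arcsin(0.4) ≈ 0.786.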

result

[Figure: scatter plot of X and Y generated with a correlation coefficient of 0.8, with their marginal distributions]

Applying the transformation above to Y gives Z. [Figure: scatter plot of X and Z, with their marginal distributions]

The marginal distribution of Z (vertical axis) has been transformed into a χ² distribution with k = 3. The correlation coefficient between X and Z (the [0,1] component of the correlation matrix returned by np.corrcoef) is still positive, but smaller than the 0.8 specified between X and Y in the multivariate standard normal distribution.
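
Why the correlation shrinks can be made precise for this case, where only Y is converted. Assume (X, Y) is bivariate standard normal with correlation ρ, and Z = g(Y) with g(y) = ppf(cdf(y)) and Var(Z) finite. The following short derivation is an addition, not part of the original note; it uses E[X] = E[Y] = 0 and σ_X = σ_Y = 1:

```latex
\begin{aligned}
E[X \mid Y] &= \rho Y \\
\operatorname{Cov}(X, Z) &= E[X\,g(Y)] = E\big[E[X \mid Y]\,g(Y)\big] = \rho\,E[Y\,g(Y)] = \rho\,\operatorname{Cov}(Y, Z) \\
\operatorname{corr}(X, Z) &= \frac{\rho\,\operatorname{Cov}(Y, Z)}{\sigma_X\,\sigma_Z} = \rho\,\frac{\operatorname{Cov}(Y, Z)}{\sigma_Y\,\sigma_Z} = \rho\,\operatorname{corr}(Y, Z)
\end{aligned}
```

Since |corr(Y, Z)| ≤ 1, the correlation between X and Z cannot exceed ρ in absolute value, with equality only when g is linear, i.e. when Z is itself normal.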

This time, Y was converted to a χ² distribution, but the same conversion works for any target distribution whose ppf (inverse CDF) is available. X can be converted in the same way.
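
Converting both variables only means applying norm.cdf and then each target's ppf column by column. A minimal sketch, assuming the targets are given as frozen `scipy.stats` distributions; the function name `correlated_pair`, the exponential/χ² pair, and the seed are hypothetical choices for illustration:

```python
import numpy as np
import scipy.stats as spstats


def correlated_pair(dist_x, dist_y, rho_norm, size, seed=None):
    """Hypothetical helper: draw (X, Y) whose marginals are the frozen scipy.stats
    distributions dist_x and dist_y, coupled through a bivariate standard normal
    with correlation rho_norm. The resulting Pearson correlation is generally
    smaller in absolute value than rho_norm."""
    rng = np.random.default_rng(seed)
    cov = [[1.0, rho_norm], [rho_norm, 1.0]]
    g = rng.multivariate_normal([0.0, 0.0], cov, size=size)
    u = spstats.norm.cdf(g)  # elementwise: both columns become uniform on (0, 1)
    return dist_x.ppf(u[:, 0]), dist_y.ppf(u[:, 1])


# Example: X ~ exponential with mean 2, Z ~ chi2(3)
x, z = correlated_pair(spstats.expon(scale=2.0), spstats.chi2(df=3),
                       rho_norm=0.8, size=100_000, seed=2)
print(x.mean(), z.mean(), np.corrcoef(x, z)[0, 1])
```

For a bivariate normal, no pair of transformations can produce a correlation larger in absolute value than |ρ|, so the printed correlation stays below 0.8 here as well.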

important point

With this method, the correlation coefficient after the conversion cannot be specified exactly. I suspect there is a more straightforward method, but I could not find one after searching for a while, so I am writing this one down here. If anyone knows an easier way, please let me know.
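
When only Y is converted and X stays standard normal, the target correlation can be hit approximately by rescaling: for a bivariate standard normal, corr(X, Z) = ρ_norm · corr(Y, Z), and corr(Y, Z) depends only on the target distribution, so ρ_norm = target / corr(Y, Z) (feasible only if that is at most 1). A minimal sketch of this workaround, which is not from the original note; the function name `calibrate_rho_norm`, the Monte Carlo estimate of corr(Y, Z), and the seeds are assumptions for illustration. When both variables are converted, this simple product rule no longer applies and ρ_norm generally has to be found numerically instead.

```python
import numpy as np
import scipy.stats as spstats


def calibrate_rho_norm(target_rho, dist_z, n_mc=1_000_000, seed=None):
    """Hypothetical helper: normal-space correlation rho_norm such that
    corr(X, Z) is about target_rho, where X ~ N(0, 1) is kept as-is and
    Z = dist_z.ppf(norm.cdf(Y)). Uses corr(X, Z) = rho_norm * corr(Y, Z),
    with corr(Y, Z) estimated by Monte Carlo."""
    rng = np.random.default_rng(seed)
    y = rng.standard_normal(n_mc)
    c = np.corrcoef(y, dist_z.ppf(spstats.norm.cdf(y)))[0, 1]
    rho_norm = target_rho / c
    if abs(rho_norm) > 1:
        raise ValueError(f"|target_rho| must not exceed corr(Y, Z) = {c:.3f}")
    return rho_norm


target = 0.7
dist_z = spstats.chi2(df=3)
rho_norm = calibrate_rho_norm(target, dist_z, seed=3)

# Generate with the calibrated rho_norm and check the achieved correlation
rng = np.random.default_rng(4)
vals = rng.multivariate_normal([0, 0], [[1, rho_norm], [rho_norm, 1]], size=200_000)
x = vals[:, 0]
z = dist_z.ppf(spstats.norm.cdf(vals[:, 1]))
achieved = np.corrcoef(x, z)[0, 1]
print(f"rho_norm={rho_norm:.3f}, achieved corr(X, Z)={achieved:.3f}")
```

The result is still a sample estimate, so the achieved value fluctuates slightly around the target.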

Postscript (11/26): MATLAB's official documentation has an explanation of this. It uses almost the same approach but describes it in more detail. (Noted for my own reference.)

[PYTHON] Memo on how to make two random variables, each following an arbitrary distribution, (approximately) correlated
