- I was looking for a way to generate a random variable X that follows the standard normal distribution and a random variable Y that follows a χ2 distribution, with an arbitrary correlation coefficient ρ between them.
- If both X and Y follow the standard normal distribution, they can be generated as follows.
  - Use the numpy.random.multivariate_normal function.
  - For a multivariate standard normal distribution, the variance-covariance matrix is equal to the correlation matrix, so covmtx = [[1, ρ], [ρ, 1]].
- Even if the distribution is not standard normal, when the means are zero and the variances are equal, the variables can be generated by the method explained by horiem.
- X ~ standard normal distribution and Y ~ χ2 distribution do not meet the conditions of that method, so they have to be generated another way.
- As a stopgap, this note records how to generate two correlated random variables from a multivariate standard normal distribution and then convert one of them so that it follows another probability distribution.
  - Y -> cdf(Y), the cumulative distribution function of the standard normal distribution -> ppf(cdf(Y)), the percent point function of the target distribution -> Z
  - For background on this kind of transformation, see here.
  - Note that the value of the correlation coefficient after the conversion cannot be specified exactly.
sample.py
import numpy as np
import scipy.stats as spstats
# generate X,Y ~ multinormal(mu = [0,0], cov = [[1,ρ],[ρ,1]])
rho_norm = 0.8 # correlation coeff for multinorm
mu = [0, 0] # mean of X, Y
cov = [
[1, rho_norm],
[rho_norm, 1]
] # cov matrix
vals_norm = np.random.multivariate_normal(mu, cov, 100000)
x_norm = vals_norm[:,0]
y_norm = vals_norm[:,1]
# np.corrcoef(x_norm, y_norm) gives a rho value around rho_norm
# convert Y to Z ~ chi2(k)
k = 3 # parameter of chi2 dist
z_chi2 = spstats.chi2.ppf(spstats.norm.cdf(y_norm, loc = 0, scale = 1), df = k)
# x_norm ~ norm(mu = 0, var = 1) and z_chi2 ~ chi2(k = 3)
# np.corrcoef(x_norm, z_chi2) gives a rho value a bit smaller than rho_norm
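The two claims in the comments above (Z follows χ2 with k = 3, and the correlation with X drops a little) can be checked numerically. The following is a minimal sketch, not part of the original script. The fixed seed, the use of `np.random.default_rng`, and the Kolmogorov-Smirnov test via `scipy.stats.kstest` are my own additions for the check.

```python
import numpy as np
import scipy.stats as spstats

rng = np.random.default_rng(0)  # seed chosen arbitrarily (assumption) for reproducibility

rho_norm = 0.8  # correlation coefficient of the bivariate normal
k = 3           # degrees of freedom of the chi2 distribution
cov = [[1.0, rho_norm], [rho_norm, 1.0]]

vals = rng.multivariate_normal([0.0, 0.0], cov, size=100_000)
x_norm, y_norm = vals[:, 0], vals[:, 1]

# Same transformation as in sample.py: normal cdf, then chi2 ppf
z_chi2 = spstats.chi2.ppf(spstats.norm.cdf(y_norm), df=k)

# Kolmogorov-Smirnov test of Z against chi2(k); a small statistic means a good fit
ks_result = spstats.kstest(z_chi2, "chi2", args=(k,))

rho_xy = np.corrcoef(x_norm, y_norm)[0, 1]  # correlation before the conversion
rho_xz = np.corrcoef(x_norm, z_chi2)[0, 1]  # correlation after the conversion

print("KS statistic:", ks_result.statistic)
print("corr(X, Y):", rho_xy)
print("corr(X, Z):", rho_xz)
```

If the conversion works as described, the KS statistic should be close to zero and corr(X, Z) should be positive but below corr(X, Y).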
The scatter plot and marginal distributions of X and Y, generated with a correlation coefficient of 0.8, look like this.
[Figure: scatter plot of X and Y with their marginal distributions]
For Z, obtained by passing Y through the transformation above, the scatter plot against X and the marginal distributions look like this.
[Figure: scatter plot of X and Z with their marginal distributions]
For Z (vertical axis), the marginal distribution has been transformed into a χ2 distribution with k = 3. The correlation coefficient at this point is the [0, 1] component of the correlation matrix returned by np.corrcoef(x_norm, z_chi2). The positive correlation is maintained, but the value is smaller than the correlation coefficient (0.8) specified between X and Y in the multivariate standard normal distribution.
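Why the correlation shrinks can be sketched with a short derivation. This is my own addition, and it assumes that (X, Y) is exactly bivariate standard normal with correlation ρ and that Z has finite variance. The symbols g (the map y ↦ ppf(cdf(y))) and ε (an auxiliary standard normal variable independent of Y) are notation introduced here, not taken from the original.

```latex
\begin{aligned}
X &= \rho\,Y + \sqrt{1-\rho^{2}}\;\varepsilon,
  \qquad \varepsilon \sim N(0,1)\ \text{independent of}\ Y, \\
Z &= g(Y), \\
\operatorname{Cov}(X, Z)
  &= \rho\,\operatorname{Cov}(Y, Z)
   + \sqrt{1-\rho^{2}}\;\operatorname{Cov}(\varepsilon, Z)
   = \rho\,\operatorname{Cov}(Y, Z), \\
\operatorname{Corr}(X, Z)
  &= \frac{\rho\,\operatorname{Cov}(Y, Z)}
          {\sqrt{\operatorname{Var}(X)\,\operatorname{Var}(Z)}}
   = \rho\,\operatorname{Corr}(Y, Z)
  \qquad \text{since } \operatorname{Var}(X) = \operatorname{Var}(Y) = 1 .
\end{aligned}
```

Because |Corr(Y, Z)| ≤ 1, with equality only when g is an affine function, a nonlinear conversion such as normal to χ2 always pulls the correlation toward zero by the factor Corr(Y, Z).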
Here Y was converted to a χ2 distribution, but the same steps convert it to any distribution whose percent point function (ppf) is available. X can be converted in the same way.
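As an illustration of converting both variables, here is a minimal sketch. The helper name `normal_to_marginal`, the choice of exponential and gamma targets, and the seed are all hypothetical and chosen only for this example. It relies on the `ppf` method of frozen `scipy.stats` distributions.

```python
import numpy as np
import scipy.stats as spstats


def normal_to_marginal(values, dist):
    """Map standard-normal samples onto the frozen scipy.stats distribution `dist`.

    Hypothetical helper: applies the normal cdf, then the target ppf.
    """
    return dist.ppf(spstats.norm.cdf(values))


rng = np.random.default_rng(1)  # arbitrary seed (assumption)
rho_norm = 0.8
cov = [[1.0, rho_norm], [rho_norm, 1.0]]
vals = rng.multivariate_normal([0.0, 0.0], cov, size=100_000)

# Convert X to an exponential distribution with mean 2
u = normal_to_marginal(vals[:, 0], spstats.expon(scale=2.0))
# Convert Y to a gamma distribution with shape 2 (mean 2)
v = normal_to_marginal(vals[:, 1], spstats.gamma(a=2.0))

rho_uv = np.corrcoef(u, v)[0, 1]
print("mean of U:", u.mean(), "mean of V:", v.mean())
print("corr(U, V):", rho_uv)
```

Both marginals change while the positive dependence is kept. As with the χ2 case, the resulting correlation coefficient generally differs from rho_norm.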
With this method, the correlation coefficient after the conversion cannot be specified exactly. I suspect there is a more straightforward method, but I could not find one after searching for a while, so I am writing this approach down here. If anyone knows an easier way, please let me know.
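One possible workaround, which is not from the original article, is to calibrate rho_norm in advance. When only Y is converted and (X, Y) is bivariate standard normal, Corr(X, Z) equals rho_norm multiplied by the factor c = Corr(Y, Z), so choosing rho_norm = target / c should land near the target whenever |target| ≤ c. The sketch below estimates c by simulation. The names `target_rho`, `y_pilot`, and `c`, and the seed, are assumptions made for this example.

```python
import numpy as np
import scipy.stats as spstats

rng = np.random.default_rng(2)  # arbitrary seed (assumption)
k = 3                           # degrees of freedom of the chi2 distribution
target_rho = 0.6                # hypothetical desired value of Corr(X, Z)

# Step 1: estimate c = Corr(Y, Z) for Y ~ N(0, 1) and Z = chi2.ppf(norm.cdf(Y))
y_pilot = rng.standard_normal(200_000)
z_pilot = spstats.chi2.ppf(spstats.norm.cdf(y_pilot), df=k)
c = np.corrcoef(y_pilot, z_pilot)[0, 1]

# Step 2: inflate the normal-scale correlation to compensate for the shrinkage
rho_norm = target_rho / c
if abs(rho_norm) > 1.0:
    raise ValueError("target correlation is not reachable with this conversion")

# Step 3: generate the pair and convert Y as before
cov = [[1.0, rho_norm], [rho_norm, 1.0]]
vals = rng.multivariate_normal([0.0, 0.0], cov, size=100_000)
x_norm = vals[:, 0]
z_chi2 = spstats.chi2.ppf(spstats.norm.cdf(vals[:, 1]), df=k)

rho_xz = np.corrcoef(x_norm, z_chi2)[0, 1]
print("shrinkage factor c:", c)
print("rho_norm used:", rho_norm)
print("corr(X, Z):", rho_xz)
```

The result is still only approximate, because c is itself estimated from samples. If both variables are converted, the simple proportionality no longer holds and a numerical search over rho_norm would be needed instead.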
Postscript (11/26): I found this in the official MATLAB documentation. It takes almost the same approach but describes it in more detail. FMI (for my information).