I want to compute the mutual information $I(X; Y)$ of continuous random variables $X$ and $Y$ in Python.
import numpy as np

def mutual_information(X, Y, bins=10):
    # Joint probability density p(x, y)
    p_xy, xedges, yedges = np.histogram2d(X, Y, bins=bins, density=True)
    # Marginal densities p(x) and p(y), using the same bin edges
    p_x, _ = np.histogram(X, bins=xedges, density=True)
    p_y, _ = np.histogram(Y, bins=yedges, density=True)
    # Product of the marginals p(x)p(y) as a 2-D array
    p_x_y = p_x[:, np.newaxis] * p_y
    # Bin widths dx and dy
    dx = xedges[1] - xedges[0]
    dy = yedges[1] - yedges[0]
    # Integrand p(x, y) * log(p(x, y) / (p(x)p(y))); np.ma.log masks empty bins
    elem = p_xy * np.ma.log(p_xy / p_x_y)
    # Return the mutual information along with p(x, y) and p(x)p(y)
    return np.sum(elem * dx * dy), p_xy, p_x_y
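As a quick sanity check of this estimator (a sketch of my own; the correlated pair, seed, and sample sizes below are my choices, not from the article), the mutual information should come out large when Y is a noisy copy of X and near zero when they are independent:

```python
import numpy as np

def mutual_information(X, Y, bins=10):
    # Same estimator as above: histogram densities plus a Riemann sum
    p_xy, xedges, yedges = np.histogram2d(X, Y, bins=bins, density=True)
    p_x, _ = np.histogram(X, bins=xedges, density=True)
    p_y, _ = np.histogram(Y, bins=yedges, density=True)
    p_x_y = p_x[:, np.newaxis] * p_y
    dx = xedges[1] - xedges[0]
    dy = yedges[1] - yedges[0]
    elem = p_xy * np.ma.log(p_xy / p_x_y)
    return np.sum(elem * dx * dy), p_xy, p_x_y

rng = np.random.default_rng(0)
x = rng.normal(size=10000)
y_dep = x + 0.1 * rng.normal(size=10000)   # strongly dependent on x
y_ind = rng.normal(size=10000)             # independent of x

mi_dep, _, _ = mutual_information(x, y_dep, bins=30)
mi_ind, _, _ = mutual_information(x, y_ind, bins=30)
# mi_dep should be much larger than mi_ind
```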
The function above is enough if you just want to compute the mutual information. Below, I note a few points that matter for the implementation.
Density of np.histogram2d

I had vaguely assumed that setting density=True would make np.histogram2d return probabilities, so I was briefly alarmed when np.sum(p_xy) did not come out to 1.
The point to note is that p_xy is a **probability density**, not a probability.
Since $X$ and $Y$ are continuous variables, the histogram approximates their probability density; the values sum to 1 only once the bin widths are taken into account.
np.histogram and np.histogram2d return the density values together with the bin edges (edges in the code), and dx and dy must be computed from those edges.
import numpy as np

N = 1000
X = np.random.normal(loc=0, scale=1, size=N)
p_x, edges = np.histogram(X, bins=10, density=True)
# Summing the densities alone does not, of course, give 1.
print(np.sum(p_x))  # Output example: 1.580769264599771
# Summing with the bin width taken into account gives 1.
dx = edges[1] - edges[0]
print(np.sum(p_x * dx))  # Output example: 1.0000000000000002
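The same holds in two dimensions for np.histogram2d (a small check I added; the distributions and bin count are illustrative): the densities sum to 1 only after multiplying by the bin area dx * dy.

```python
import numpy as np

N = 1000
X = np.random.normal(size=N)
Y = np.random.normal(size=N)
p_xy, xedges, yedges = np.histogram2d(X, Y, bins=10, density=True)
dx = xedges[1] - xedges[0]
dy = yedges[1] - yedges[0]
# Weighting each density value by the bin area recovers a total of 1
total = np.sum(p_xy * dx * dy)
```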
p_x_y

p_x_y in the code is meant to compute $p(x)p(y)$. I actually tried the following first, and it did not work:

p_x_y = p_x * p_y

The correct version is

p_x_y = p_x[:, np.newaxis] * p_y

The former is an elementwise product and produces a 1-D array; the latter broadcasts p_x as a column against p_y as a row and produces the required 2-D array.
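The difference in shapes can be seen on a tiny example (the numbers here are illustrative, not from the text): broadcasting with np.newaxis yields the outer product, i.e. the full table of products $p(x_i)\,p(y_j)$.

```python
import numpy as np

p_x = np.array([0.2, 0.8])
p_y = np.array([0.6, 0.4])

# Elementwise product: a 1-D array, not the table of products we need
wrong = p_x * p_y                  # shape (2,)

# Broadcasting p_x as a column against p_y as a row: the 2-D outer product
right = p_x[:, np.newaxis] * p_y   # shape (2, 2)
# right is equivalent to np.outer(p_x, p_y)
```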
In the following example the two variables are not independent, so $p(x, y)$ differs from $p(x)p(y)$ and the mutual information is large.
import matplotlib.pyplot as plt

# A sine wave and a cosine wave over the same time axis
t = np.linspace(-5, 5, num=1000)
X = np.sin(2 * np.pi * t)
Y = np.cos(3 * np.pi * t)
# Compute the mutual information
mi, p_xy, p_x_y = mutual_information(X, Y, bins=30)
# Plot the results
plt.figure(dpi=100)
ax1 = plt.subplot(121)
ax2 = plt.subplot(122)
ax1.set_title(r'$P_{XY}(x, y)$')
ax1.imshow(p_xy)
ax2.set_title(r'$P_{X}(x) P_{Y}(y)$')
ax2.imshow(p_x_y)
plt.suptitle('MI = {}'.format(mi))
plt.show()

When the two variables are independent, $p(x, y)$ and $p(x)p(y)$ coincide, and the mutual information is small.
import matplotlib.pyplot as plt

# Two independent normal distributions
N = 10000
X = np.random.normal(size=N)
Y = np.random.normal(size=N)
# Compute the mutual information
mi, p_xy, p_x_y = mutual_information(X, Y, bins=30)
# Plot the results
plt.figure(dpi=100)
ax1 = plt.subplot(121)
ax2 = plt.subplot(122)
ax1.set_title(r'$P_{XY}(x, y)$')
ax1.imshow(p_xy)
ax2.set_title(r'$P_{X}(x) P_{Y}(y)$')
ax2.imshow(p_x_y)
plt.suptitle('MI = {}'.format(mi))
plt.show()
