[PYTHON] Create a partial correlation matrix and draw an independence graph

This post introduces the procedure for drawing an independence graph with graphviz.

Partial correlation matrix and independence graph

A correlation between two variables can be observed for two reasons:

- there is a causal relationship between them, or
- there is a common factor that has a causal effect on both of them.

The partial correlation is the correlation coefficient obtained after removing the latter effect (the influence of the other variables), and the independence graph connects the variables whose partial correlation is high. See the article below for details.

Meaning of the partial correlation coefficient and derivation of its formula: https://mathtrain.jp/partialcor
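To make the "common factor" case concrete, here is a small simulation. The variables x, y, z and their coefficients are made up purely for illustration: z drives both x and y, and there is no direct link between x and y. The ordinary correlation of x and y comes out strong, while the partial correlation given z, computed from the inverse of the correlation matrix and cross-checked by correlating regression residuals, stays close to zero.

```python
import numpy as np

# Hypothetical example: z is a common cause of x and y; x and y have no direct link
rng = np.random.default_rng(0)
n = 10_000
z = rng.normal(size=n)
x = 2.0 * z + rng.normal(size=n)
y = -1.5 * z + rng.normal(size=n)

# Ordinary correlation matrix of (x, y, z); each row of the stacked array is one variable
R = np.corrcoef(np.vstack([x, y, z]))

# Partial correlation of x and y given z, from the precision matrix P = R^-1
P = np.linalg.inv(R)
pcor_xy = -P[0, 1] / np.sqrt(P[0, 0] * P[1, 1])


def residual(a, b):
    """Residual of a after a least-squares fit a ~ c0 + c1 * b."""
    A = np.column_stack([np.ones_like(b), b])
    coef, *_ = np.linalg.lstsq(A, a, rcond=None)
    return a - A @ coef


# Same quantity: correlate what is left of x and y once z's linear effect is removed
pcor_resid = np.corrcoef(residual(x, z), residual(y, z))[0, 1]

print(f"corr(x, y)     = {R[0, 1]:.3f}")
print(f"pcor(x, y | z) = {pcor_xy:.3f} (residual method: {pcor_resid:.3f})")
```

Both routes compute the same sample quantity, so they agree up to floating-point error; the precision-matrix route is the one the cor2pcor function later in this post uses.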

1. Install graphviz

I haven't fully verified this yet, but I think the following steps should work.

Install the Python wrapper for graphviz with pip:

terminal


pip install graphviz

Install graphviz itself (the rendering engine) and make it available from Jupyter Notebook:

terminal


conda install -c conda-forge python-graphviz
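The pip package is only a wrapper: rendering calls the Graphviz dot executable, so that program has to be on PATH. A quick way to check from Python is sketched below; graphviz_available is a small helper of my own, not part of the graphviz package.

```python
import shutil
import subprocess


def graphviz_available() -> bool:
    """Return True if the Graphviz `dot` executable is found on PATH."""
    return shutil.which("dot") is not None


if graphviz_available():
    # `dot -V` prints its version string to stderr, not stdout
    result = subprocess.run(["dot", "-V"], capture_output=True, text=True)
    print(result.stderr.strip())
else:
    print("dot not found: install Graphviz itself, not just the pip wrapper")
```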

2. How to draw a graph

Define nodes with node() and connect them with edge(), as shown below. When render() is executed, it first writes out the graphviz source code (DOT), and then renders the graph from that source into an image such as PNG or PDF. With cleanup=True, the source file is deleted after the image file has been written. The examples below export PNG.
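To look at the intermediate DOT source that render() writes (and that cleanup=True later deletes), you can print the source attribute of the graph object. The pipe() method renders straight to bytes without creating any files; in this sketch it is only called when the dot executable is found on PATH.

```python
import shutil

from graphviz import Graph

g = Graph(format='png')
g.node('1')
g.node('2')
g.node('3')
g.edge('1', '2')
g.edge('2', '3')
g.edge('3', '1')

# DOT source that render() writes to disk before calling the dot executable
print(g.source)

# pipe() returns the rendered image as bytes instead of writing files
if shutil.which("dot"):
    png_bytes = g.pipe(format='png')
    # PNG files start with this fixed 8-byte signature
    print(png_bytes[:8] == b'\x89PNG\r\n\x1a\n')
```

In Jupyter, the bytes can then be shown with IPython.display.Image(data=png_bytes), which avoids the temporary ../test files.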

Undirected graph

python


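from graphviz import Graph
from IPython.display import display
from PIL import Image

g = Graph(format='png')

g.node('1')  # define nodes
g.node('2')
g.node('3')
g.edge('1', '2')  # undirected edges: 1 -- 2, 2 -- 3, 3 -- 1
g.edge('2', '3')
g.edge('3', '1')

# Writes ../test (DOT source), renders ../test.png, then deletes the source file
g.render(filename='../test', format='png', cleanup=True, directory=None)
display(Image.open('../test.png'))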

[Figure: rendered undirected graph]

Directed graph

python


from graphviz import Digraph
from IPython.display import display
from PIL import Image

dg = Digraph(format='png')

dg.node('1')
dg.node('2')
dg.node('3')
dg.edge('1', '2')  # 1 -> 2
dg.edge('2', '3')  # 2 -> 3
dg.edge('3', '1')  # 3 -> 1

# Writes ../test (DOT source), renders ../test.png, then deletes the source file
dg.render(filename='../test', format='png', cleanup=True, directory=None)
display(Image.open('../test.png'))

[Figure: rendered directed graph]

3. Data preparation

This time I will use the iris dataset as sample data.

python


import numpy as np
import pandas as pd
from sklearn import datasets
import seaborn as sns

# Put the four iris features and the species label (0, 1, 2) into one DataFrame
iris = datasets.load_iris()
df = pd.DataFrame(np.hstack([iris.data, iris.target.reshape(-1, 1)]),
                  columns=iris.feature_names + ['label'])
sns.pairplot(df, hue='label')

[Figure: pairplot of the iris data, colored by label]

4. Creating a correlation matrix

python


import matplotlib.pyplot as plt

# Pearson correlation matrix; np.corrcoef treats each row as a variable, hence df.T
cm = pd.DataFrame(np.corrcoef(df.T), columns=df.columns, index=df.columns)

sns.heatmap(cm, annot=True, square=True, vmin=-1, vmax=1, fmt=".2f", cmap="RdBu")
plt.savefig("cor.png")
plt.show()

[Figure: heatmap of the correlation matrix]
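pandas can compute the same Pearson correlation matrix directly with DataFrame.corr(), which also keeps the column labels. A quick check of the equivalence, rebuilding the same iris DataFrame so the snippet runs on its own:

```python
import numpy as np
import pandas as pd
from sklearn import datasets

iris = datasets.load_iris()
df = pd.DataFrame(np.hstack([iris.data, iris.target.reshape(-1, 1)]),
                  columns=iris.feature_names + ['label'])

# np.corrcoef expects one variable per row, so the DataFrame is transposed
cm_numpy = pd.DataFrame(np.corrcoef(df.T), columns=df.columns, index=df.columns)
cm_pandas = df.corr()  # Pearson correlation by default, labels preserved

print(np.allclose(cm_numpy.values, cm_pandas.values))
```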

5. Creating a partial correlation matrix

I borrowed this code from the Hatena Blog "Hashikure Engineer Mocking notes".

There seems to be a more careful approach that runs a significance test and does not subtract correlations that are not significant, but here the effects of all other variables are subtracted uniformly.
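For reference, the cor2pcor function below relies on the standard relation between partial correlations and the inverse of the correlation matrix R (the precision matrix P): the partial correlation of variables i and j, given all the remaining variables, is

```latex
P = R^{-1}, \qquad
\rho_{ij \cdot \mathrm{rest}} = -\frac{P_{ij}}{\sqrt{P_{ii}\, P_{jj}}} \quad (i \neq j), \qquad
\rho_{ii \cdot \mathrm{rest}} = 1
```

In the code, regu_1 holds 1/\sqrt{P_{ii}}, regu_2 spreads it along the rows, and the diagonal is then overwritten with 1.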

python


import scipy.linalg

def cor2pcor(R):
    """Convert a correlation matrix R into a partial correlation matrix."""
    inv_cor = scipy.linalg.inv(R)  # precision matrix P = R^-1
    rows = inv_cor.shape[0]
    regu_1 = 1 / np.sqrt(np.diag(inv_cor))  # 1 / sqrt(P_ii)
    regu_2 = np.repeat(regu_1, rows).reshape(rows, rows)  # row i filled with 1 / sqrt(P_ii)
    pcor = (-inv_cor) * regu_1 * regu_2  # -P_ij / sqrt(P_ii * P_jj)
    np.fill_diagonal(pcor, 1)
    return pcor

pcor = pd.DataFrame(cor2pcor(cm), columns=cm.columns, index=cm.index)

sns.heatmap(pcor, annot=True, square=True, vmin=-1, vmax=1, fmt=".2f", cmap="RdBu")
plt.savefig("pcor.png")
plt.show()

[Figure: heatmap of the partial correlation matrix]
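The row and column scaling in cor2pcor can also be written with np.outer. The sketch below (cor2pcor_outer and pcor_by_residuals are names of my own, not from the borrowed code) checks the result on the iris data against the regression definition of partial correlation: regress both variables on all the remaining ones and correlate the residuals.

```python
import numpy as np
import pandas as pd
from sklearn import datasets

iris = datasets.load_iris()
df = pd.DataFrame(np.hstack([iris.data, iris.target.reshape(-1, 1)]),
                  columns=iris.feature_names + ['label'])
R = np.corrcoef(df.T)


def cor2pcor_outer(R):
    """Partial correlation matrix: -P_ij / sqrt(P_ii * P_jj) with P = R^-1."""
    P = np.linalg.inv(R)
    d = 1 / np.sqrt(np.diag(P))
    pcor = -P * np.outer(d, d)  # element (i, j) scaled by d_i * d_j
    np.fill_diagonal(pcor, 1)
    return pcor


def pcor_by_residuals(X, i, j):
    """Correlate residuals of columns i and j after regressing each on all other columns."""
    others = [k for k in range(X.shape[1]) if k not in (i, j)]
    A = np.column_stack([np.ones(len(X)), X[:, others]])  # intercept + control variables
    res_i = X[:, i] - A @ np.linalg.lstsq(A, X[:, i], rcond=None)[0]
    res_j = X[:, j] - A @ np.linalg.lstsq(A, X[:, j], rcond=None)[0]
    return np.corrcoef(res_i, res_j)[0, 1]


X = df.to_numpy()
pcor = cor2pcor_outer(R)
# sepal length (column 0) vs petal length (column 2), controlling for the other three
print(np.isclose(pcor[0, 2], pcor_by_residuals(X, 0, 2)))
```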

6. Draw a graph

Draw an undirected graph by connecting each pair of variables whose coefficient has an absolute value larger than a suitably chosen threshold.

python


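from graphviz import Graph
from IPython.display import display
from PIL import Image

def draw_graph(cm, threshold):
    # (row, col) positions where the absolute coefficient exceeds the threshold
    edges = np.where(np.abs(cm) > threshold)
    # keep each pair once (lower triangle, i > j), which also skips the diagonal
    edges = [[cm.index[i], cm.index[j]] for i, j in zip(edges[0], edges[1]) if i > j]

    g = Graph(format='png')
    for k in range(cm.shape[0]):
        g.node(cm.index[k])  # one node per variable, even if it has no edges

    for i, j in edges:
        g.edge(j, i)

    g.render(filename='../test', format='png', cleanup=True, directory=None)
    display(Image.open('../test.png'))

threshold = 0.3
draw_graph(cm, threshold)    # graph from the correlation matrix
draw_graph(pcor, threshold)  # graph from the partial correlation matrix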

Graph made from the correlation matrix

[Figure: graph drawn from the correlation matrix]

Graph made from the partial correlation matrix

[Figure: independence graph drawn from the partial correlation matrix]
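Before rendering, it can help to list which pairs actually pass the threshold, together with their coefficients. Here is a small sketch with a hypothetical helper, edge_table (not from the original post), that applies the same abs(...) > threshold rule as draw_graph, shown on the correlation matrix of the iris data:

```python
from itertools import combinations

import numpy as np
import pandas as pd
from sklearn import datasets

iris = datasets.load_iris()
df = pd.DataFrame(np.hstack([iris.data, iris.target.reshape(-1, 1)]),
                  columns=iris.feature_names + ['label'])
cm = df.corr()


def edge_table(mat, threshold):
    """Return the variable pairs whose absolute coefficient exceeds threshold."""
    rows = [(a, b, mat.loc[a, b])
            for a, b in combinations(mat.index, 2)  # each unordered pair once
            if abs(mat.loc[a, b]) > threshold]
    return pd.DataFrame(rows, columns=['var1', 'var2', 'coef'])


print(edge_table(cm, threshold=0.3))
```

Passing the partial correlation matrix pcor instead of cm lists the edges of the second graph.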

Summary

The coefficients are low, so it seems a little difficult to draw a conclusion from this alone. But if this result is correct, sepal length and width are correlated only with petal length and width, and not directly with the iris species. Drawing a graph makes the structure easier to grasp than looking at the correlation matrix.

Give it a try.
