Implemented SimRank in Python

Preface

SimRank is a method of calculating (recursively) the similarity between nodes in a graph.

It has the feature that the similarity can be calculated without having a common adjacent node.

Papers Easy-to-understand explanation in Japanese Wikipedia

By the way, the second author of the SimRank paper is a super-famous DB researcher.

Implementation

In the paper, there was a method to calculate SimRank for bipartite graph, so I implemented it as well.

simrank.py


import numpy as np

def simrank(G,C,n,t=10):
    S = np.identity(n)
    I = np.identity(n)
    G = normalize(G)
    i = 1
    while True:
        S = C * np.dot(np.dot(G.T,S),G) + (1-C) * I
        for j in range(n):
            S[j][j] = 1
        if i >= t:
            break
        i += 1
    return S

def bipertite_simrank(G,C,n,m,diag_1=True,t=10):
    S1 = np.identity(n)
    S2 = np.identity(m)
    I1 = np.identity(n)
    I2 = np.identity(m)
    G_T = normalize(G.T)
    G = normalize(G)
    i = 1
    while True:
        S2 = C * np.dot(np.dot(G.T,S1),G) + (1-C) * I2
        for j in range(m):
            S2[j][j] = 1
        S1 = C * np.dot(np.dot(G_T.T,S2),G_T) + (1-C) * I1
        for j in range(n):
            S1[j][j] = 1
        if i >= t:
            break
        i += 1
    return (S1,S2)

def normalize(G):
    s = G.sum(0)
    return G/s

if __name__ == '__main__':
    C = 0.8

    """ univ """
    n = 5
    G = np.zeros((n,n))
    G[0][1] = 1
    G[0][2] = 1
    G[1][3] = 1
    G[2][4] = 1
    G[3][0] = 1
    G[4][2] = 1

    S = simrank(G,C,n)
    print S

    print '=================='

    """ cake """
    n = 2
    m = 4
    G = np.zeros((n,m))
    G[0][0] = 1
    G[0][1] = 1
    G[0][2] = 1
    G[1][1] = 1
    G[1][2] = 1
    G[1][3] = 1

    S1, S2 = bipertite_simrank(G,C,n,m)
    print S1
    print S2

As a matter of fact, just calculate and update the matrix product.

The value of C is said to be about 0.8 in the paper, but it seems that about 0.6 is good in subsequent research (source is Wikipedia).

Also, the number of iterations is enough in the paper, but it seems that it is better to do more in subsequent research (source is Wikipedia).

result

>> python simrank.py
[[ 1.          0.          0.1321943   0.          0.032768  ]
 [ 0.          1.          0.4131072   0.          0.10575544]
 [ 0.1321943   0.4131072   1.          0.04096     0.08687124]
 [ 0.          0.          0.04096     1.          0.33048576]
 [ 0.032768    0.10575544  0.08687124  0.33048576  1.        ]]
==================
[[ 1.          0.54658196]
 [ 0.54658196  1.        ]]
[[ 1.          0.61863088  0.61863088  0.43726175]
 [ 0.61863088  1.          0.61863088  0.61863088]
 [ 0.61863088  0.61863088  1.          0.61863088]
 [ 0.43726175  0.61863088  0.61863088  1.        ]]

It matches the results in the paper (Figure 1, Figure 2), so I'm sure it matches!

Recommended Posts

Implemented SimRank in Python
Implemented Shiritori in Python
Sudoku solver implemented in Python 3
6 Ball puzzle implemented in python
Implemented image segmentation in python (Union-Find)
Widrow-Hoff learning rules implemented in Python
Implemented label propagation method in Python
Implemented Perceptron learning rules in Python
Implemented in 1 minute! LINE Notify in Python
Quadtree in Python --2
Python in optimization
CURL in python
Metaprogramming in Python
Python 3.3 in Anaconda
Geocoding in python
SendKeys in Python
Meta-analysis in Python
Unittest in python
Epoch in Python
Discord in Python
Sudoku in Python
DCI in Python
quicksort in python
nCr in python
N-Gram in Python
Programming in python
Plink in Python
Constant in python
Lifegame in Python.
FizzBuzz in Python
Sqlite in python
StepAIC in Python
N-gram in python
LINE-Bot [0] in Python
Csv in python
Disassemble in Python
Reflection in Python
Constant in python
nCr in Python.
format in python
Scons in Python3
Puyo Puyo in python
python in virtualenv
PPAP in Python
Quad-tree in Python
Reflection in Python
Chemistry in Python
Hashable in python
DirectLiNGAM in Python
LiNGAM in Python
Flatten in python
flatten in python
A simple HTTP client implemented in Python
Implemented in Python PRML Chapter 7 Nonlinear SVM
I implemented Cousera's logistic regression in Python
Implemented in Python PRML Chapter 5 Neural Networks
Implemented Stooge sort in Python3 (Bubble sort & Quicksort)
Implemented in Python PRML Chapter 1 Bayesian Inference
Sorted list in Python
Daily AtCoder # 36 in Python
Clustering text in Python