Implemented label propagation method in Python

Preface

A technique for estimating the label of a node on a graph.

lp.py was implemented by referring to here. Or rather, almost as it is.

lp_original.py was implemented by referring to Chapter 2 of Paper.

In lp_original.py, we also implemented a method to calculate the iterative and iterative matrix product without solving the simultaneous equations (lp_iter).

Implementation

`lp.py`


# -*- coding: utf-8 -*-
import sys
from scipy.sparse import *
from scipy.sparse.linalg import spsolve

def load_adjucency(filename,n):
    W = lil_matrix((n,n))
    for line in open(filename, 'r'):
        entries = line.rstrip().split(' ')
        src = int(entries[0])
        dst = int(entries[1])
        W[src,dst] = 1
    return W

def load_labels(filename,n):
    y = lil_matrix((1,n))
    for line in open(filename, 'r'):
        entries = line.rstrip().split(' ')
        nid = int(entries[0])
        label = int(entries[1])
        y[0,nid] = label
    return y

n = int(sys.argv[1])
W = load_adjucency(sys.argv[2],n)
y = load_labels(sys.argv[3],n)

D = dia_matrix((W.sum(0),[0]), W.shape)
L = D - W

I = identity(W.shape[0])

lam = 1
f = spsolve((I + lam * L), y)
print f

`lp_original.py`


# -*- coding: utf-8 -*-
import sys
from scipy.sparse import *
from scipy.sparse.linalg import spsolve
from sklearn.preprocessing import normalize

def load_matrix(filename,n,m):
    W = lil_matrix((n+m,n+m))
    for line in open(filename, 'r'):
        entries = line.rstrip().split(' ')
        src = int(entries[0])
        dst = int(entries[1])
        W[src,dst] = 1
    W = normalize(W,norm='l1',axis=1)
    Wul = W[n:n+m,0:n]
    Wuu = W[n:n+m,n:n+m]
    return (Wuu,Wul)

def load_labels(filename,n):
    y = lil_matrix((n,1))
    i = 0
    for line in open(filename, 'r'):
        label = int(line.rstrip())
        y[i,0] = label
        i += 1
    return y

def lp_solve(Wuu,Wul,y):
    return spsolve((identity(m) + Wuu), Wul*y)

def lp_iter(Wuu,Wul,y,t=10):
    i = 0
    f = lil_matrix((Wuu.shape[0],1))
    while True:
        f = Wuu * f + Wul * y
        if i > t:
            break
        i += 1
    return f

""" # of labeled nodes """
n = int(sys.argv[1])
""" # of unlabeled nodes """
m = int(sys.argv[2])
""" propagation matrix (row normalized) """
Wuu,Wul = load_matrix(sys.argv[3],n,m)
""" labels of labeled nodes """
y = load_labels(sys.argv[4],n)


print lp_solve(Wuu,Wul,y)
print '============='
print lp_iter(Wuu,Wul,y)

How to use

lp.py is executed by passing the number of nodes, the file path of the edge list, and the file path of the node label in this order. The data used is the same as the blog I referred to.

`lp.Execution example of py`


$ cat edge_list
0 2
2 0
0 4
4 0
0 9
9 0
1 2
2 1
2 3
3 2
3 4
4 3
3 6
6 3
5 9
9 5
6 7
7 6
6 8
8 6
7 9
9 7
$ cat labels
0 1
2 1
4 1
5 -1
6 -1
7 -1
$ python lp.py 10 edge_list labels
[ 0.45966011  0.23023256  0.46046512  0.1519678   0.5372093  -0.57951699
 -0.38980322 -0.51627907 -0.19490161 -0.15903399]

lp_original.py is executed by passing the number of labeled nodes n, the number of unlabeled nodes m, the file path of the edge list, and the file path of the edge list in this order.

However, the node numbers must be from 0 to n-1 to labeled nodes and from n to n + m to unlabeled nodes.

In this example, the correspondence between the node number passed to lp.py and the node number passed to lp_original.py is as follows.

lp.py [0,1,2,3,4,5,6,7,8,9] lp_original.py [0,6,1,7,2,3,4,5,8,9]

Therefore, each line of the label file contains the labels of the labeled nodes from 0 to n.

With this implementation, you only need to use the edge from the unlabeled node to the labeled node and the edge from the unlabeled node to the unlabeled node.

`lp_original.Execution example of py`


$ cat edge_list_for_original
9 0
6 1
7 1
7 2
7 4
9 3
8 4
9 5
$ cat labels_for_original
1
1
1
-1
-1
-1
$ python lp_original.py 6 4 edge_list_for_original labels_for_original
[ 1.          0.33333333 -1.         -0.33333333]
=============
  (0, 0)	1.0
  (1, 0)	0.333333333333
  (2, 0)	-1.0
  (3, 0)	-0.333333333333

The result is different from lp.py, but the tendency is the same.