A technique for estimating the label of a node on a graph.
lp.py
was implemented by referring to here. Or rather, almost as it is.
lp_original.py
was implemented by referring to Chapter 2 of Paper.
In lp_original.py
, we also implemented a method to calculate the iterative and iterative matrix product without solving the simultaneous equations (lp_iter
).
lp.py
# -*- coding: utf-8 -*-
import sys
from scipy.sparse import *
from scipy.sparse.linalg import spsolve
def load_adjucency(filename,n):
W = lil_matrix((n,n))
for line in open(filename, 'r'):
entries = line.rstrip().split(' ')
src = int(entries[0])
dst = int(entries[1])
W[src,dst] = 1
return W
def load_labels(filename,n):
y = lil_matrix((1,n))
for line in open(filename, 'r'):
entries = line.rstrip().split(' ')
nid = int(entries[0])
label = int(entries[1])
y[0,nid] = label
return y
n = int(sys.argv[1])
W = load_adjucency(sys.argv[2],n)
y = load_labels(sys.argv[3],n)
D = dia_matrix((W.sum(0),[0]), W.shape)
L = D - W
I = identity(W.shape[0])
lam = 1
f = spsolve((I + lam * L), y)
print f
lp_original.py
# -*- coding: utf-8 -*-
import sys
from scipy.sparse import *
from scipy.sparse.linalg import spsolve
from sklearn.preprocessing import normalize
def load_matrix(filename,n,m):
W = lil_matrix((n+m,n+m))
for line in open(filename, 'r'):
entries = line.rstrip().split(' ')
src = int(entries[0])
dst = int(entries[1])
W[src,dst] = 1
W = normalize(W,norm='l1',axis=1)
Wul = W[n:n+m,0:n]
Wuu = W[n:n+m,n:n+m]
return (Wuu,Wul)
def load_labels(filename,n):
y = lil_matrix((n,1))
i = 0
for line in open(filename, 'r'):
label = int(line.rstrip())
y[i,0] = label
i += 1
return y
def lp_solve(Wuu,Wul,y):
return spsolve((identity(m) + Wuu), Wul*y)
def lp_iter(Wuu,Wul,y,t=10):
i = 0
f = lil_matrix((Wuu.shape[0],1))
while True:
f = Wuu * f + Wul * y
if i > t:
break
i += 1
return f
""" # of labeled nodes """
n = int(sys.argv[1])
""" # of unlabeled nodes """
m = int(sys.argv[2])
""" propagation matrix (row normalized) """
Wuu,Wul = load_matrix(sys.argv[3],n,m)
""" labels of labeled nodes """
y = load_labels(sys.argv[4],n)
print lp_solve(Wuu,Wul,y)
print '============='
print lp_iter(Wuu,Wul,y)
lp.py
is executed by passing the number of nodes, the file path of the edge list, and the file path of the node label in this order.
The data used is the same as the blog I referred to.
lp.Execution example of py
$ cat edge_list
0 2
2 0
0 4
4 0
0 9
9 0
1 2
2 1
2 3
3 2
3 4
4 3
3 6
6 3
5 9
9 5
6 7
7 6
6 8
8 6
7 9
9 7
$ cat labels
0 1
2 1
4 1
5 -1
6 -1
7 -1
$ python lp.py 10 edge_list labels
[ 0.45966011 0.23023256 0.46046512 0.1519678 0.5372093 -0.57951699
-0.38980322 -0.51627907 -0.19490161 -0.15903399]
lp_original.py
is executed by passing the number of labeled nodes n
, the number of unlabeled nodes m
, the file path of the edge list, and the file path of the edge list in this order.
However, the node numbers must be from 0
to n-1
to labeled nodes and from n
to n + m
to unlabeled nodes.
In this example, the correspondence between the node number passed to lp.py
and the node number passed to lp_original.py
is as follows.
lp.py
[0,1,2,3,4,5,6,7,8,9]
lp_original.py
[0,6,1,7,2,3,4,5,8,9]
Therefore, each line of the label file contains the labels of the labeled nodes from 0
to n
.
With this implementation, you only need to use the edge from the unlabeled node to the labeled node and the edge from the unlabeled node to the unlabeled node.
lp_original.Execution example of py
$ cat edge_list_for_original
9 0
6 1
7 1
7 2
7 4
9 3
8 4
9 5
$ cat labels_for_original
1
1
1
-1
-1
-1
$ python lp_original.py 6 4 edge_list_for_original labels_for_original
[ 1. 0.33333333 -1. -0.33333333]
=============
(0, 0) 1.0
(1, 0) 0.333333333333
(2, 0) -1.0
(3, 0) -0.333333333333
The result is different from lp.py
, but the tendency is the same.
Recommended Posts