I develop and operate a talent mining service called TalentBase, which analyzes a huge amount of social data.
As part of that work, I wanted to do some graph analysis of human relationships, so I decided to measure how far apart any two people are.
If Python and NetworkX are installed, you can implement this right away.
For the data, I used directed graph data from a social network (SNS) and treated the distance between two people as a shortest path problem.
Since the data was stored in MySQL, the script connects to the DB with mysql-connector, fetches the directed graph data, and runs the calculation with NetworkX.
main.py
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import networkx as nx
import mysql.connector
db_config = {
'user': 'USERNAME',
'password': 'PASSWORD',
'host': 'HOST',
'database': 'DATABASE',
'port': 'PORT'
}
connect = mysql.connector.connect(**db_config)
cur = connect.cursor(buffered=True)
g = nx.DiGraph()
cur.execute("select FROM_USER_ID,TO_USER_ID,WEIGHT from TABLE_NAME")
rows = cur.fetchall()
for row in rows:
    # Skip self-loops and rows with a missing endpoint
    if row[0] == row[1] or row[0] is None or row[1] is None:
        continue
    g.add_node(str(row[0]))
    g.add_node(str(row[1]))
    g.add_edge(str(row[0]), str(row[1]), weight=int(row[2]))
# Find the shortest path with Dijkstra's algorithm
print(nx.dijkstra_path(g, 'NODE1', 'NODE2'))  # Output the shortest path from NODE1 to NODE2
# Output example => ['NODE1', 'NODE9', 'NODE3', 'NODE7', 'NODE5', 'NODE2']
print(nx.dijkstra_path_length(g, 'NODE1', 'NODE2'))  # Output the shortest path distance from NODE1 to NODE2 (if all weights are 1, the distance is the number of edges traversed)
# Output example => 8
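If you want to try the calculation without a database, here is a minimal sketch that builds the same kind of weighted directed graph from an in-memory list. The user names, the weights, and the helper `distance_between` are all made up for illustration and are not part of the script above. It assumes a reasonably recent NetworkX (2.x or later), where `nx.NodeNotFound` exists. It also shows one thing to watch for on directed data: there may be no path at all from one person to another, in which case NetworkX raises `nx.NetworkXNoPath`.

```python
import networkx as nx

# Hypothetical relationships: (from_user, to_user, weight).
# These rows stand in for the result of the SELECT query.
rows = [
    ("alice", "bob", 1),
    ("bob", "carol", 2),
    ("alice", "carol", 5),
    ("carol", "dave", 1),
    ("erin", "alice", 1),
]

g = nx.DiGraph()
for src, dst, weight in rows:
    # Same filter as the main script: skip self-loops and missing endpoints
    if src is None or dst is None or src == dst:
        continue
    g.add_edge(src, dst, weight=weight)  # add_edge also creates the nodes


def distance_between(graph, source, target):
    """Return (path, length), or (None, None) if target cannot be reached."""
    try:
        path = nx.dijkstra_path(graph, source, target)
        length = nx.dijkstra_path_length(graph, source, target)
    except (nx.NetworkXNoPath, nx.NodeNotFound):
        return None, None
    return path, length


print(distance_between(g, "alice", "dave"))
# => (['alice', 'bob', 'carol', 'dave'], 4)
print(distance_between(g, "dave", "alice"))
# => (None, None)
```

The route via bob (1 + 2 + 1 = 4) beats the direct alice-to-carol edge (5 + 1 = 6), and because the edges are directed, dave cannot reach alice even though alice can reach dave.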
Once you start handling a large amount of graph data, this kind of calculation takes a long time, so next time I would like to try the analysis with a graph database such as Neo4j.