Solving the shortest path problem with Python + NetworkX + social data


What I did and what I wanted to do

I develop and operate a talent mining service called TalentBase, which analyzes a huge amount of social data.

As part of that work, I wanted to use graph analysis to understand human relationships, so I decided to analyze how far apart any two people are.

Environment

If Python and NetworkX are installed, you can implement this right away.

Data preparation

For the data, I used directed graph data from a social network (SNS) and tried to express the distance between people as a shortest path problem.

Since the data was stored in MySQL, I connected to the database with mysql-connector, fetched the directed graph data, and ran the calculation with NetworkX.
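
The fetch-and-clean step can be tried without a MySQL server. The sketch below is only an illustration: it uses an in-memory SQLite database from the standard library as a stand-in for MySQL, and the table name `follow_edges`, its column names, and the sample rows are all made up. It applies the same filtering as the script in this post (skip self-loops and rows with a missing user ID).

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the MySQL table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE follow_edges "
    "(from_user_id INTEGER, to_user_id INTEGER, weight INTEGER)"
)
conn.executemany(
    "INSERT INTO follow_edges VALUES (?, ?, ?)",
    [
        (1, 2, 3),     # normal edge
        (2, 3, 1),     # normal edge
        (3, 3, 5),     # self-loop: skipped
        (None, 2, 1),  # missing source: skipped
        (4, None, 2),  # missing target: skipped
    ],
)


def load_edges(connection):
    """Return (source, target, weight) tuples, skipping self-loops and NULL IDs."""
    rows = connection.execute(
        "SELECT from_user_id, to_user_id, weight FROM follow_edges"
    ).fetchall()
    edges = []
    for src, dst, weight in rows:
        if src is None or dst is None or src == dst:
            continue
        # Node IDs are stored as strings, as in the main script.
        edges.append((str(src), str(dst), int(weight)))
    return edges


edges = load_edges(conn)
conn.close()
print(edges)
```

A list in this shape can be handed to NetworkX in one call with `g.add_weighted_edges_from(edges)` on an `nx.DiGraph`, instead of adding edges one by one.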

Implementation / calculation

main.py


    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    
    import networkx as nx
    import mysql.connector
    
    db_config = {
      'user': 'USERNAME',
      'password': 'PASSWORD',
      'host': 'HOST',
      'database': 'DATABASE',
      'port': 'PORT'
    }
    
    connect = mysql.connector.connect(**db_config)
    cur = connect.cursor(buffered=True)
    
    g = nx.DiGraph()
    
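    # Fetch every directed edge together with its weight
    cur.execute("select FROM_USER_ID,TO_USER_ID,WEIGHT from TABLE_NAME")
    rows = cur.fetchall()
    
    for from_id, to_id, weight in rows:
      # Skip self-loops and rows with a missing endpoint
      if from_id == to_id or from_id is None or to_id is None:
        continue
      # add_edge creates both nodes if they are not in the graph yet
      g.add_edge(str(from_id), str(to_id), weight=int(weight))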
    
    # Find the shortest path with Dijkstra's algorithm
    # Shortest path from NODE1 to NODE2
    print(nx.dijkstra_path(g, 'NODE1', 'NODE2'))
    # Output example => ['NODE1', 'NODE9', 'NODE3', 'NODE7', 'NODE5', 'NODE2']
    # Shortest path length from NODE1 to NODE2 (if all weights are 1, this is the number of edges traversed)
    print(nx.dijkstra_path_length(g, 'NODE1', 'NODE2'))
    # Output example => 8
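
To see what the two Dijkstra calls return without a database, here is a small self-contained sketch on a toy directed graph. The node names and weights are invented for illustration, and the helper `distance_or_none` is hypothetical (it is not part of NetworkX). Because follow relationships are directed, a path may exist in one direction only, so the helper turns the "no path" and "unknown node" errors into `None`.

```python
import networkx as nx

# Toy directed graph; node names and weights are made up for illustration.
g = nx.DiGraph()
g.add_weighted_edges_from([
    ("alice", "bob", 4),
    ("alice", "carol", 1),
    ("carol", "bob", 2),
    ("bob", "dave", 5),
])

# alice -> carol -> bob -> dave costs 1 + 2 + 5 = 8,
# which beats alice -> bob -> dave at 4 + 5 = 9.
path = nx.dijkstra_path(g, "alice", "dave")
length = nx.dijkstra_path_length(g, "alice", "dave")
print(path)    # ['alice', 'carol', 'bob', 'dave']
print(length)  # 8


def distance_or_none(graph, source, target):
    """Hypothetical helper: weighted distance, or None if unreachable/unknown."""
    try:
        return nx.dijkstra_path_length(graph, source, target)
    except (nx.NetworkXNoPath, nx.NodeNotFound):
        return None


# dave has no outgoing edges, so there is no path back to alice.
print(distance_or_none(g, "dave", "alice"))  # None
```

Returning `None` is just one possible convention. When comparing many pairs of people, it may be more convenient to treat unreachable pairs as infinitely far apart.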

Consideration

Once you start handling large amounts of graph data, the calculation performed here will take a long time, so next time I would like to run the analysis on a graph database such as Neo4j.
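
Before moving to a graph database, NetworkX itself has a couple of options that may reduce the work for a single query. The sketch below is not something I measured on the real data. It only shows the calls on a made-up toy graph: `nx.bidirectional_dijkstra` searches from both ends at once, and the `cutoff` argument of `nx.single_source_dijkstra_path_length` stops the search beyond a given distance.

```python
import networkx as nx

# Toy directed graph; node names and weights are made up for illustration.
g = nx.DiGraph()
g.add_weighted_edges_from([
    ("alice", "bob", 4),
    ("alice", "carol", 1),
    ("carol", "bob", 2),
    ("bob", "dave", 5),
])

# Search from source and target simultaneously; returns (length, path).
length, path = nx.bidirectional_dijkstra(g, "alice", "dave")
print(length, path)

# Only explore people within a weighted distance of 3 from alice.
nearby = dict(nx.single_source_dijkstra_path_length(g, "alice", cutoff=3))
print(nearby)
```

Whether either option helps enough depends on the size and shape of the graph, so it is worth timing both against the plain `nx.dijkstra_path` call on real data.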
