[Python] I created a correlation diagram of Twitter follow relationships (Gremlin edition)

The other day I wrote an article about creating a correlation diagram of follow relationships on Twitter.

[Python] I tried to visualize the follow relationship of Twitter

In that article, I fetched the followed accounts with the Twitter API and registered them in MongoDB. The logic then read the data back from MongoDB and drew the graph while checking whether the accounts follow each other.

I have since learned that a graph database is convenient for network analysis, so this time I tried using one.

Environment

python: 3.7
gremlinpython: 3.4.6
gremlin: 3.4.6

Install Gremlin

I built the environment on Windows. You can download the tools for Windows from the following.

https://downloads.apache.org/tinkerpop/3.4.6/

The package to download is the "server" one. The "console" would be nice to have, but it is not used in this article.

After downloading, just unzip the ZIP file, place it in any folder, and run the .bat file in the bin folder.

gremlinpython can be installed with the pip command.

pip install gremlinpython

Implementation

Now that the environment is ready, we will implement it.

Register mongoDB data in Gremlin

Gremlin Server can hold data as a graph (vertices and edges) and query it with the Gremlin traversal language. Since the relations between records were not managed in MongoDB, we register those relations as edges while registering the data in Gremlin.

mongoDB data

The MongoDB data looks like this. Each document holds a Twitter account and the list of accounts it follows. Many documents like the following are registered.

{
        "_id" : ObjectId("5e6c52a475646eb49cfbd62b"),
        "screen_name" : "yurinaNECOPLA",
        "followers_info" : [
                {
                        "screen_name" : "Task_fuuka",
                        "id" : NumberLong("784604847710605312")
                },
 (Omitted)
                {
                        "screen_name" : "nemui_oyasumi_y",
                        "id" : NumberLong("811491671560974336")
                }
        ]
}
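To make the shape of this document concrete, here is a minimal sketch of how one document maps to directed "follow" pairs. The dict below is a hand-written copy of the sample above (with `_id` left out), and `follow_pairs` is a hypothetical helper for illustration, not part of the article's code.

```python
# Hand-written copy of the sample document above (without "_id").
doc = {
    "screen_name": "yurinaNECOPLA",
    "followers_info": [
        {"screen_name": "Task_fuuka", "id": 784604847710605312},
        {"screen_name": "nemui_oyasumi_y", "id": 811491671560974336},
    ],
}


def follow_pairs(document):
    """Return (source, target) pairs: source follows target."""
    source = document["screen_name"]
    return [
        (source, entry["screen_name"])
        for entry in document.get("followers_info", [])
    ]


pairs = follow_pairs(doc)
for source, target in pairs:
    print(source, "->", target)
```

Each pair corresponds to one edge that will later be registered in Gremlin, from the account's vertex to the followed account's vertex.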

Code

from gremlin_python import statics
from gremlin_python.structure.graph import Graph
from gremlin_python.process.graph_traversal import __
from gremlin_python.process.traversal import TraversalSideEffects
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
from mongo_dao import MongoDAO

mongo = MongoDAO("db", "followers_info")
graph = Graph()

# Gremlin connection creation
g = graph.traversal().withRemote(DriverRemoteConnection('ws://localhost:8182/gremlin','g'))

start_name = 'yurinaNECOPLA'

def addValueEdge(parent_name, depth):
    # Stop when the depth limit is reached
    if depth == 0:
        return False
    print(parent_name)
    result = mongo.find_one(filter={'screen_name': parent_name})
    # Accounts that are not registered in MongoDB are skipped
    if result is None or len(result) == 0:
        return False

    # Add vertices
    g.addV(parent_name).property('screen_name', parent_name).toSet()
    p = g.V().has('screen_name', parent_name).toList()[0]

    for follower in result['followers_info']:
        # Register the followed account first, then connect it
        if addValueEdge(follower['screen_name'], depth-1):
            cList = g.V().has('screen_name', follower['screen_name']).toList()
            if len(cList) != 0:
                # Add edge
                g.addE('follow').from_(p).to(cList[0]).toSet()
    return True

addValueEdge(start_name, 3)

Code commentary

  1. Decide on a top account
  2. Pass the account name to addValueEdge
  3. Get data from MongoDB
  4. Add vertices to Gremlin once the data is available
  5. Pass the account names you are following one by one (return to 2.)
  6. Add an edge

Data and edges are registered recursively in this way.
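The recursion can be tried without MongoDB or Gremlin. The following is a minimal sketch of the same control flow, assuming a plain dict (`FOLLOWS`, hypothetical sample data) in place of the MongoDB collection and two Python lists in place of the Gremlin vertices and edges.

```python
# Hypothetical stand-in for the MongoDB collection:
# screen_name -> list of followed screen_names.
# "E" has no entry, like an account missing from MongoDB.
FOLLOWS = {
    "A": ["B", "E"],
    "B": ["C"],
    "C": ["D"],
    "D": [],
}

vertices = []  # stand-in for g.addV(...)
edges = []     # stand-in for g.addE('follow')


def add_value_edge(parent_name, depth):
    # Same checks as addValueEdge: depth limit, then data lookup
    if depth == 0:
        return False
    followed = FOLLOWS.get(parent_name)
    if followed is None:
        return False
    vertices.append(parent_name)
    for child in followed:
        # The edge is added only after the child was registered
        if add_value_edge(child, depth - 1):
            edges.append((parent_name, child))
    return True


add_value_edge("A", 3)
print(vertices)
print(edges)
```

With a depth of 3, the starting account and accounts up to two hops away are registered. "D" is cut off by the depth limit and "E" is skipped because it has no data, so neither gets a vertex or an edge.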

Create a correlation diagram

The basic structure is the same as when the data was read from MongoDB in the previous article.

import json
import networkx as nx
import matplotlib.pyplot as plt
from gremlin_python import statics
from gremlin_python.structure.graph import Graph
from gremlin_python.process.graph_traversal import __
from gremlin_python.process.traversal import TraversalSideEffects
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

start_screen_name = 'yurinaNECOPLA'
graph = Graph()

# Gremlin connection creation
g = graph.traversal().withRemote(DriverRemoteConnection('ws://localhost:8182/gremlin','g'))

# Create a new graph
G = nx.Graph()
# Add node
G.add_node(start_screen_name)

def add_edge(screen_name, depth):
    if depth == 0:
        return
    name = g.V().has('screen_name', screen_name).toList()[0]
    follows_list = g.V(name).both().valueMap().toList()
    for follow in follows_list:
        print(follow['screen_name'][0])
        G.add_edge(screen_name, follow['screen_name'][0])
        add_edge(follow['screen_name'][0], depth-1)

add_edge(start_screen_name, 3)

# Creating a diagram. figsize is the size of the figure
plt.figure(figsize=(10, 8))

# Determine the layout of the figure. The smaller the value of k, the denser the figure
pos = nx.spring_layout(G, k=0.8)

# Drawing nodes and edges
# _color: Specify color
# alpha: Specifying transparency
nx.draw_networkx_edges(G, pos, edge_color='y')
nx.draw_networkx_nodes(G, pos, node_color='r', alpha=0.5)

# Add node name
nx.draw_networkx_labels(G, pos, font_size=10)

# Setting not to display X-axis and Y-axis
plt.axis('off')

plt.savefig("mutual_follow.png")
# Draw a diagram
plt.show()

Code commentary

The key points are as follows.

    name = g.V().has('screen_name', screen_name).toList()[0]
    follows_list = g.V(name).both().valueMap().toList()
    for follow in follows_list:
        print(follow['screen_name'][0])
        G.add_edge(screen_name, follow['screen_name'][0])
        add_edge(follow['screen_name'][0], depth-1)

The first line gets the vertex for the account from Gremlin. The second line follows the edges of that vertex in both directions (both()) and returns the properties of the adjacent vertices (valueMap()). The result is a list of dicts whose values are themselves lists, so the account name is obtained by taking the entries one by one and reading screen_name[0].
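As a small illustration of that data shape, here is a sketch that extracts the names from a hand-written list. The sample values are an assumption made to mimic what valueMap() returns (each property value wrapped in a list), not actual output from the server, and `screen_names` is a hypothetical helper.

```python
# Hand-written sample mimicking the list returned by valueMap().toList():
# every property value is wrapped in a list.
follows_list = [
    {"screen_name": ["Task_fuuka"]},
    {"screen_name": ["nemui_oyasumi_y"]},
]


def screen_names(value_maps):
    """Take the first screen_name of each map, skipping maps without one."""
    return [vm["screen_name"][0] for vm in value_maps if vm.get("screen_name")]


names = screen_names(follows_list)
print(names)
```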

Result

Execution result: mutual_follow.png

The result is rather cluttered, but I was able to create a correlation diagram.

Correlation diagram of another pattern

The above correlation diagram also shows the cyclical relationships such as account A → account B, account B → account C, and account C → account A.

If you want to prevent such cycles, add a check at registration time so that an account already registered in Gremlin is not added again.

def registCheck(screen_name):
    # True if a vertex with this screen_name is already registered in Gremlin
    check = g.V().has('screen_name', screen_name).toList()
    return len(check) != 0

def addValueEdge(parent_name, depth):
    # Stop at the depth limit, or if this account is already registered
    if depth == 0 or registCheck(parent_name):
        return False
    print(parent_name)
    result = mongo.find_one(filter={'screen_name': parent_name})
    if result is None or len(result) == 0:
        return False

    # Add vertices
    g.addV(parent_name).property('screen_name', parent_name).toSet()
    p = g.V().has('screen_name', parent_name).toList()[0]

    for follower in result['followers_info']:
        if addValueEdge(follower['screen_name'], depth-1):
            cList = g.V().has('screen_name', follower['screen_name']).toList()
            if len(cList) != 0:
                # Add edge
                g.addE('follow').from_(p).to(cList[0]).toSet()
    return True

By adding registCheck to check whether the data is already registered in Gremlin, the cyclic relationships could be excluded.
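The effect of the check can be compared on a small in-memory example. This is only a sketch: `FOLLOWS` is hypothetical sample data standing in for MongoDB, lists stand in for the Gremlin vertices and edges, and `register` is an illustrative function that runs the same recursion with or without the "already registered" check.

```python
# Hypothetical sample data containing a cycle: A -> B -> C -> A, plus A -> C.
FOLLOWS = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
}


def register(follows, start, depth, skip_registered):
    """Run the recursive registration and return (vertices, edges)."""
    vertices, edges = [], []

    def visit(name, remaining):
        if remaining == 0:
            return False
        # Counterpart of registCheck: skip accounts that already have a vertex
        if skip_registered and name in vertices:
            return False
        if name not in follows:
            return False
        vertices.append(name)
        for child in follows[name]:
            if visit(child, remaining - 1):
                edges.append((name, child))
        return True

    visit(start, depth)
    return vertices, edges


print(register(FOLLOWS, "A", 3, skip_registered=False))
print(register(FOLLOWS, "A", 3, skip_registered=True))
```

In this sketch, the version without the check registers "A" and "C" twice and keeps the edge back to "A". The version with the check registers each account once, but it also drops the edge from "A" to "C", because "C" was already registered through "B" when "A" tried to add it. Edges lost this way are one possible reason why related accounts can end up far apart in the second diagram.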

Result

mutual_follow1.png

Summary

In the figure with cycles, accounts that are closely related and follow each other are drawn close together. In the figure without cycles, the accounts followed by the starting account are drawn near it, but because the graph is built recursively, some accounts that mutually follow account A end up at a distant position. It seems that how the edges are added still needs more thought.

Registering data relationships is similar to an RDB, but my impression is that gremlinpython is hard to use intuitively. Once you have read the documentation and understand the mechanism to some extent, it should be useful for network analysis.
