[PYTHON] Behind the graph drawing algorithm and Networkx

0. How do you draw a graph?

In order to draw in two dimensions, it is necessary to give appropriate coordinates to each vertex, but the graph has only information on the vertices and edges. How do you arrange the vertices? ??

This article describes the ** Fruchterman-Reingold algorithm **, which arranges graphs nicely. Python makes it easy to use with a library called networkx. However, it's too easy and frustrating, so I'll follow the implementation of networkx on GitHub to see how it works.

The flow of this article is as follows.

  1. Try to move
  2. Algorithm description
  3. Follow the implementation of Networkx

1. Try to move

For those who are satisfied if it works, I will show an implementation example first. With Google colaboratory, networkx is already installed, so you can try it immediately by copying.

Random placement → random_layout ()

import networkx as nx
import matplotlib.pyplot as plt

N = 500 #Number of vertices
K = 5
P = 0.05

G = nx.watts_strogatz_graph(N,K,P) #Appropriate graph
pos = nx.random_layout(G, seed=0)

#Draw graph
plt.figure(figsize=(15,15))
nx.draw_networkx(G, pos, with_labels=False, node_shape='.')
plt.axis("off")
plt.show()

random.png Wow…. do not want to see…. It's unbearable to see in a random arrangement.

Arranged nicely → spring_layout ()

import networkx as nx
import matplotlib.pyplot as plt

N = 500 #Number of vertices
K = 5
P = 0.05

G = nx.watts_strogatz_graph(N,K,P)
pos = nx.spring_layout(G)

#Draw graph
plt.figure(figsize=(15,15))
nx.draw_networkx(G, pos, with_labels=False, node_shape='.')
plt.axis("off")
plt.show()

nx_spring.png

It's arranged nicely.

What is a "good arrangement"?

Think a little more strictly about "good placement". For example, suppose you now have the information for the graph $ G $ as follows.

G = (V, E) $ V $: Vertex set $ E $: Edge set

There are innumerable ways to arrange this graph in two dimensions. (You can arrange them randomly as before.) However, if possible, I want to arrange them so that they are easy to see. Let's call an arrangement that meets the following conditions a "good arrangement".

① The connection relationship of the sides is reflected in the distance

In other words, ** points connected by edges should be placed close to each other, and points not connected by edges should be placed far away **. In the case of an adjacency matrix, the unconnected part is set to ʻinf`, so it seems that all points can be treated in the same way regardless of the presence or absence of edges.

➁ Vertices do not overlap

If the vertices overlap, you can't see the sides, which is a problem.

I would like to find a graph layout that satisfies the above two conditions. The ** mechanical model (force-directed graph) ** makes this possible.

And in the previous networkx.spring_layout (), ** Fruchterman-Reingold algorithm ** is used. This time I will introduce this. Take a look at the gif below to get an idea of this technique.

力学モデル.gif From the random initial placement, it will gradually become cleaner! !! ([Borrowed from here] [Explanatory article on mechanical model])

Want to know the algorithm?(y/n)
y
Let's know.

2.Fruchterman-Reingold algorithm

Define force

In the mechanical model, the vertices are moved by applying force. The Fruchterman-Reingold algorithm gives you ** Attractive Force ** and ** Repulsive Force **.

fa_vs_fr.jpg

The attractive force works on the vertices connected by the sides. Repulsion, on the other hand, works on vertices that are not connected by edges. This corresponds to the previous condition (1) "The connection relationship of the sides is reflected in the distance".

You can define power in any way, but for now let's use the definition proposed in the paper.

Attraction: $ f_a = \ dfrac {d ^ 2} {k} $ Repulsion: $ f_r =-\ dfrac {k ^ 2} {d} $

$ d $: Distance between vertices k = C \sqrt{\dfrac{\text{area}}{|V|}}, C \in \mathbb{R} ,\ \ \text{area}Is the area of the screen to draw

It's like a spring formula, so it's intuitive. The graph of the relationship between force and distance is as follows.

力学モデル_force_vs_distance.jpg From the graphs and formulas we can see that:

--The farther the attraction is, the stronger it works ――The closer the repulsive force is, the stronger it works. --Attractive force and repulsive force become equal at a certain distance $ k $

If the distance $ k $ at which the forces antagonize is set to an appropriate size, the above condition ➁ "Vertex does not overlap" can be satisfied.

algorithm

Now let's look at the algorithm.

  1. Randomly initially place
  2. Calculate attractive and repulsive forces for each vertex
  3. Move the vertices according to the force. However, do not go out of the frame.
  4. Lower the temperature $ t $
  5. Repeat steps 2-4 a certain number of times

I'm getting an unfamiliar variable $ t $. Temperature $ t $ is a parameter that limits the amount of movement of vertices. In other words, the movement amount should be $ t $ or less with respect to the movement direction vector (let's call it $ v $) calculated in step 2.

v = \dfrac{v}{\| v\|} * \min(\| v\|, t)

We will reduce this temperature $ t $ appropriately. Then, although the amount of movement is large at the beginning, it will not move much at the end. (Guarantees convergence)

Also, in the paper, in step 3, the collision with the wall is treated as elastic reflection in order to place it on the screen. If you want to know more about this, please read [Original paper "Graph drawing by force-directed placement"] [Paper]. (Not considered in the networkx implementation ...)


~~ Now that we know the algorithm, let's implement it. ~~ ↑ It's annoying, so I'll leave the implementation to the reader.

In this article, we'll look at how to implement networkx without reinventing the wheel.

3. Take a look at the Networkx implementation

Let's learn by seeing the implementation of people. However, NetworkX 2.4 is the one at that time, and the parts that do not explain exception handling etc. are omitted.

Earlier in the article I used networkx.spring_layout () to draw the graph "good". If you take a look at GitHub,

spring_layout = fruchterman_reingold_layout

I'm just referring to it! Next, let's look at the reference destination fruchterman_reingold_layout.

def fruchterman_reingold_layout(G,
                                k=None,
                                pos=None,
                                #abridgement
                                seed=None):
    """Long docstring
    """
    import numpy as np

    G, center = _process_params(G, center, dim)

    """
Various initialized parts (omitted)
    """ 
    
    """
From here on down is important
    """
    try:
        # Sparse matrix
        if len(G) < 500:  # sparse solver for large graphs
            raise ValueError
        A = nx.to_scipy_sparse_matrix(G, weight=weight, dtype='f')
        if k is None and fixed is not None:
            # We must adjust k by domain size for layouts not near 1x1
            nnodes, _ = A.shape
            k = dom_size / np.sqrt(nnodes)
        pos = _sparse_fruchterman_reingold(A, k, pos_arr, fixed,
                                           iterations, threshold,
                                           dim, seed)
    except ValueError:
        A = nx.to_numpy_array(G, weight=weight)
        if k is None and fixed is not None:
            # We must adjust k by domain size for layouts not near 1x1
            nnodes, _ = A.shape
            k = dom_size / np.sqrt(nnodes)
        pos = _fruchterman_reingold(A, k, pos_arr, fixed, iterations,
                                    threshold, dim, seed)
    if fixed is None and scale is not None:
        pos = rescale_layout(pos, scale=scale) + center
    pos = dict(zip(G, pos))
    return pos

In the try-except part, the functions to be called are divided according to the size of the graph. (Isn't the if statement useless ...? Please tell me.)

-Call _fruchterman_reingold () if $ vertices <500 $ --Call _sparse_fruchterman_reingold () for $ vertices \ geq 500 $

** If you have a lot of vertices, it looks like you are using a sparse solver. ** We haven't reached the algorithm part so far, so let's take a look at _fruchterman_reingold ().

def _fruchterman_reingold(A, k=None, pos=None, fixed=None,
                                 iterations=50, threshold=1e-4, dim=2,
                                 seed=None):
    # Position nodes in adjacency matrix A using Fruchterman-Reingold
    # Entry point for NetworkX graph is fruchterman_reingold_layout()
    # Sparse version
    import numpy as np

    if pos is None:
        # random initial positions
        pos = np.asarray(seed.rand(nnodes, dim), dtype=A.dtype)
    else:
        # make sure positions are of same type as matrix
        pos = pos.astype(A.dtype)

    # optimal distance between nodes
    if k is None:
        k = np.sqrt(1.0 / nnodes)
    # the initial "temperature"  is about .1 of domain area (=1x1)
    # this is the largest step allowed in the dynamics.
    t = max(max(pos.T[0]) - min(pos.T[0]), max(pos.T[1]) - min(pos.T[1])) * 0.1
    # simple cooling scheme.
    # linearly step down by dt on each iteration so last iteration is size dt.
    dt = t / float(iterations + 1)

    displacement = np.zeros((dim, nnodes))
    for iteration in range(iterations):
        displacement *= 0
        # loop over rows
        for i in range(A.shape[0]):
            if i in fixed:
                continue
            # difference between this row's node position and all others
            delta = (pos[i] - pos).T
            # distance between points
            distance = np.sqrt((delta**2).sum(axis=0))
            # enforce minimum distance of 0.01
            distance = np.where(distance < 0.01, 0.01, distance)
            # the adjacency matrix row
            Ai = np.asarray(A.getrowview(i).toarray())
            # displacement "force"
            displacement[:, i] +=\
                (delta * (k * k / distance**2 - Ai * distance / k)).sum(axis=1)
        # update positions
        length = np.sqrt((displacement**2).sum(axis=0))
        length = np.where(length < 0.01, 0.1, length)
        delta_pos = (displacement * t / length).T
        pos += delta_pos
        # cool temperature
        t -= dt
        err = np.linalg.norm(delta_pos) / nnodes
        if err < threshold:
            break
    return pos

This is the implementation part of the algorithm. I will summarize how to handle attractive force and repulsive force.

-Optimal distance between cords:k = \sqrt{\dfrac{1}{|V|}} --Point-to-point distance: $ \ text {distance} = \ max (\ text {distance}, 0.01) $ --Point-Value of point adjacency matrix: $ A_i $ --Temperature $ t $: $ \ max (\ text {width of domain of initial placement}, \ text {width of range of initial placement}) * 0.01 $ --How to lower the temperature: t-= dt, but $ dt = \ dfrac {t} {\ text {iteration} + 1} $ --Direction vector: $ \ text {delta} $ --Attractive force: $ f_a = \ dfrac {k ^ 2} {\ text {distance} ^ 2} $ --Repulsive force: $ f_r = --A_i * \ dfrac {\ text {distance}} {k} $

The optimal distance formula is $ C = 1, \ text {area} = 1 $ when applied to the paper. Simple! The temperature $ t $ will drop significantly at first and then gradually as ʻiteration` progresses. $ \ Text {delta} $ is not exactly a direction vector, but I wrote it because I am adjusting the norm in the end.

What is very worrisome is the definition of power. ** Both attractive and repulsive forces are different from those in the paper. ** Since the paper is from 1991, is there an improved version after that? Another finding is that in the case of weighted graphs, the repulsive force should be multiplied by the value of the adjacency matrix.

When you try to implement it yourself, I think it is better to implement it with the same settings.

bonus I played with the source code a little to visualize the behavior of the algorithm. (I made 200 iterations by tweaking the parameters.) anim_iter200 (1).gif It's interesting that the pieces of disjointed dots gradually become like strings. I got something close to the reference site "[Algorithm for drawing graphs (networks) beautifully] Mechanical model commentary article" with almost no implementation by myself. You did it.

Summary

I explained ** Fruchterman-Reingold algorithm **, which is a graph drawing algorithm, to get a little behind the networkx that can handle graphs easily. After learning the algorithm, it is important to implement it, but it is also learning to see the implementation of a strong person. We look forward to your questions and suggestions.

References

[Algorithm for drawing graphs (networks) neatly] Mechanical model commentary article : A really good article!

[Graph drawing by force-directed placement, 1992] [Paper] : Fruchterman and Reingold's paper

networkx GitHub : What I explained a little this time

[Mechanical model (wikipedia)] [Mechanical model_wiki] : I want to see other methods

[[Python] Summary of basic usage of NetworkX 2.0 Qiita] [networkx2 usage Qiita] : Easy-to-understand articles

Recommended Posts

Behind the graph drawing algorithm and Networkx
wxPython: Draw animation and graph drawing at the same time
Understanding and implementing the Tonelli-Shanks algorithm (2)
Understanding and implementing the Tonelli-Shanks algorithm (1)
Calculate the shortest route of a graph with Dijkstra's algorithm and Python
How to use the graph drawing library Bokeh
Creating a graph using the plotly button and slider
Euclidean algorithm and extended Euclidean algorithm
NetworkX memo (directed graph)
Graph drawing using matplotlib
Graph drawing in python
NetworkX graph generator list
Rabbit and turtle algorithm
[pyqtgraph] Add region to the graph and link it with the graph region
Solve the spiral book (algorithm and data structure) with python!
Graph the Poisson distribution and the Poisson cumulative distribution in Python and Java, respectively.