[PYTHON] Network Analysis with NetworkX --- Community Detection Volume

I wanted to extract something like a "small group where all of each other is connected", that is, a community, from the entire network, so I did it with the python library "NetworkX".

https://networkx.github.io/

What is a community here?

In human terms, it is a "group of people who are friends with each other", and it is a set that is a complete graph (all connected) as a complement of the entire network. It seems to be called "Clique" in graph theory.

Make a graph structure

Experiment with a small set of 6 nodes.


from networkx import *
import matplotlib.pyplot as plt

#raw data
data = [["A","B"],["A","C"],["C","B"],["D","C"],["D","E"],["C","E"],["E","F"]]

#Building nodes
G = nx.Graph()
for p in list(set([r[0] for r in data] + [r[1] for r in data])):
    G.add_node(p)

#Edge construction
for d in data:
    G.add_edge(d[0],d[1])

#drawing
pos = nx.spring_layout(G)
nx.draw_networkx_nodes(G, pos, node_size=1200,node_color="white")
nx.draw_networkx_edges(G, pos)
nx.draw_networkx_labels(G, pos)
plt.show()

Visualized graph structure

スクリーンショット 2017-03-28 16.19.25.png

Extract clique from a pre-built graph structure

Now that we have a graph structure, let's detect the community.


cliques = nx.find_cliques(G)
#Standard output
print  [ c for c in cliques]

Standard output

[['C', 'A', 'B'], ['C', 'E', 'D'], ['F', 'E']]

came out! You can extract the community properly!

If it is small data, it feels like Hoon, but if you do it with large data, there are various discoveries.

performance

It is very fast because it can process about 100,000 data in a few seconds. However, clique detection seems to be a system processing whose processing time explodes as the number of nodes increases (probably a problem with NP), so if it is more big data, NetWork X may reach a plateau, but it is still a mystery.

https://en.wikipedia.org/wiki/Clique_problem

Perhaps serious people use Spark or something.

Recommended Posts

Network Analysis with NetworkX --- Community Detection Volume
Voice actor network analysis (using word2vec and networkx) (1/2)
Voice actor network analysis (using word2vec and networkx) (2/2)
Data analysis with python 2
Basket analysis with Spark (1)
Dependency analysis with CaboCha
Voice analysis with python
Voice analysis with python
Dynamic analysis with Valgrind
Regression analysis with NumPy
Data analysis with Python
Image analysis with Object Detection API to try in 1 hour