There are methods such as the Girvan-Newman algorithm for maximizing Modularity "Q" when the number of communities to be divided by the clustering algorithm is unknown. For clustering, see Social graph clustering by the Newman algorithm --slideshare is easy to understand.
Among them, the louvain algorithm is an algorithm that currently requires a relatively small amount of calculation.
Click here for the paper Blondel, Vincent D., et al. "Fast unfolding of communities in large networks." Journal of statistical mechanics: theory and experiment 2008.10 (2008): P10008.
Clustering networkx graphs using python-louvain, a library of louvain algorithms in python.
I was able to install with pip in my environment (Python 2.7.10, pip 8.1.2).
pip-install
sudo pip install python-louvain
Or Download the source code
install
python setup.py install
Library import
import
import community
Can be done with
This time, we will perform clustering with the graph of the indie artist created in Previous article. For the sake of simplicity, we targeted the 2015 data. The number of nodes is 8523 and the number of edges is 61,319, which is a weighted undirected graph. Since detailed cluster analysis has not been performed, it is not known what the clusters are composed of, so the optimum number of clusters is determined by the louvain algorithm. It seems to be best for clustering while seeking.
If you plot the graph as it is, you will get the following graph.
The basic clustering calculation of python-louvain can be done with community.best_partition (nxGraph). It will be returned in dict format as below.
In [1]: community.best_partition(G)
Out[1]:
{'nodelabel': int(community),
'nodelabel': int(community),
'nodelabel': int(community),
...
Cluster the graph and plot it. Create a function to plot by referring to the sample.
clustering.py
import networkx as nx
import community
def clusteringplot(G):
partition = community.best_partition(G)
size = float(len(set(partition.values())))
pos = nx.spring_layout(G)
count = 0.
for com in set(partition.values()):
count += 1.
list_nodes = [nodes for nodes in partition.keys() if partition[nodes] == com]
nx.draw_networkx_nodes(G, pos, list_nodes, node_size=150, node_color = str(count/size))
plt.show()
The output graph is below. Colored by cluster.
As stated in here, clustering is a kind of arbitrary thing, and what is important is what the cluster is composed of. There is no doubt that it is a meaning. After clarifying the number of clusters and the configuration by the louvain algorithm, we must not forget to analyze the components and make the meaning.
Clustering and visualization using Python and CytoScape-Qiita
Recommended Posts