Clustering with python-louvain

Clustering with louvain algorithm

There are methods such as the Girvan-Newman algorithm for maximizing Modularity "Q" when the number of communities to be divided by the clustering algorithm is unknown. For clustering, see Social graph clustering by the Newman algorithm --slideshare is easy to understand.

Among them, the louvain algorithm is an algorithm that currently requires a relatively small amount of calculation.

Click here for the paper Blondel, Vincent D., et al. "Fast unfolding of communities in large networks." Journal of statistical mechanics: theory and experiment 2008.10 (2008): P10008.

Clustering networkx graphs using python-louvain, a library of louvain algorithms in python.

Installation

I was able to install with pip in my environment (Python 2.7.10, pip 8.1.2).

pip-install


sudo pip install python-louvain

Or Download the source code

install


python setup.py install

Library import

import


import community

Can be done with

Clustering

This time, we will perform clustering with the graph of the indie artist created in Previous article. For the sake of simplicity, we targeted the 2015 data. The number of nodes is 8523 and the number of edges is 61,319, which is a weighted undirected graph. Since detailed cluster analysis has not been performed, it is not known what the clusters are composed of, so the optimum number of clusters is determined by the louvain algorithm. It seems to be best for clustering while seeking.

If you plot the graph as it is, you will get the following graph. nomal.png

The basic clustering calculation of python-louvain can be done with community.best_partition (nxGraph). It will be returned in dict format as below.

In [1]: community.best_partition(G)                                          
Out[1]: 
{'nodelabel': int(community),
'nodelabel': int(community),
'nodelabel': int(community),
...

Cluster the graph and plot it. Create a function to plot by referring to the sample.

clustering.py


import networkx as nx
import community
def clusteringplot(G):
  partition = community.best_partition(G)                                                                                                           
  size = float(len(set(partition.values())))                                                                                                 
  pos = nx.spring_layout(G)                                                                                                                  
  count = 0.                                                                                                                                 
  for com in set(partition.values()):                                                                                                        
    count += 1.                                                                                                                              
    list_nodes = [nodes for nodes in partition.keys() if partition[nodes] == com]                                                            
    nx.draw_networkx_nodes(G, pos, list_nodes, node_size=150, node_color = str(count/size))                                                                                                                                 
  plt.show()

The output graph is below. Colored by cluster. cluserring.png

Clustering then

As stated in here, clustering is a kind of arbitrary thing, and what is important is what the cluster is composed of. There is no doubt that it is a meaning. After clarifying the number of clusters and the configuration by the louvain algorithm, we must not forget to analyze the components and make the meaning.

reference

Clustering and visualization using Python and CytoScape-Qiita

Recommended Posts

Clustering with python-louvain
Clustering with scikit-learn (1)
Clustering with scikit-learn (2)
Clustering with scikit-learn + DBSCAN
DBSCAN (clustering) with scikit-learn
I tried clustering with PyCaret
Clustering ID-POS data with LDA
Deep Embedded Clustering with Chainer 2.0
Photo segmentation and clustering with DBSCAN
The most basic clustering analysis with scikit-learn
Clustering representative schools in summer 2016 with scikit-learn
Clustering books from Aozora Bunko with Doc2Vec
[Python] Clustering with an infinitely mixed Gaussian model