Clustering and visualization using Python and CytoScape

Overview

Clustering is a simple data analysis method that readily yields useful results. It works on data that already has a network structure. Data without one can also be turned into a network by defining a distance function, and then clustered. This entry introduces how to perform clustering and visualize the results.

What is a network?

What is a network structure in the first place? Generally, it is data composed of nodes and edges. Edges may or may not have a direction. A graph whose edges have directions is called a directed graph, and a graph whose edges have no direction is called an undirected graph. Edges may also carry weights, and a graph with weighted edges is called a weighted graph.
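As a rough illustration, these distinctions map onto NetworkX graph classes and edge attributes. NetworkX is the library used by the code later in this entry. The node names and the weight value below are arbitrary. "weight" is the edge attribute name that community.best_partition() reads by default.

```python
import networkx as nx

# Undirected graph: the edge A-B can be followed in both directions
G = nx.Graph()
G.add_edge("A", "B")

# Directed graph: the edge A->B has a direction
D = nx.DiGraph()
D.add_edge("A", "B")

# Weighted graph: the weight is stored as an edge attribute named "weight"
W = nx.Graph()
W.add_edge("A", "B", weight=2.5)

print(G.has_edge("B", "A"))   # True
print(D.has_edge("B", "A"))   # False
print(W["A"]["B"]["weight"])  # 2.5
```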

What is clustering?

Clustering means dividing a set of data into several groups (subsets) so that the members of each subset share some common characteristics. In network analysis it is also called community detection. A common way to understand the structure of data works in four steps:

- compute the relationships between items
- turn those relationships into a network
- cluster the network
- visualize the resulting groups

This approach makes it easy to gain insight from data.
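The "compute relationships, then build a network" step can be shown with a short plain-Python sketch for data that has no network structure of its own. The sketch rests on three illustrative choices, none of which this entry prescribes:

- Euclidean distance as the distance function
- a fixed distance threshold for drawing an edge
- an edge weight of 1 / (1 + distance)

```python
import math
from itertools import combinations


def build_network(points, threshold):
    """Connect each pair of points closer than `threshold` (Euclidean distance).

    Returns (i, j, weight) triples; weight = 1 / (1 + distance) is an
    illustrative choice so that closer pairs get heavier edges.
    """
    edges = []
    for (i, p), (j, q) in combinations(enumerate(points), 2):
        d = math.dist(p, q)  # requires Python 3.8+
        if d < threshold:
            edges.append((i, j, 1.0 / (1.0 + d)))
    return edges


# Hypothetical 2-D data with two obvious groups
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0)]
edges = build_network(points, threshold=2.0)
print([(i, j) for i, j, _ in edges])
```

You can pass the resulting (i, j, weight) triples to NetworkX's Graph.add_weighted_edges_from() and cluster the graph like any other network.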

About the algorithm introduced in this entry

The algorithm introduced in this entry partitions a network so as to maximize modularity. Modularity measures how much more densely the nodes within each community are connected than would be expected in a random network. In a 2010 comparative paper, this algorithm received the best evaluation for extracting communities from networks (http://arxiv.org/abs/0906.0612).
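For reference, the block below gives a standard textbook definition of modularity; the notation is not taken from this entry. It is for an undirected graph with m edges, adjacency matrix A, node degrees k_i, and community assignments c_i. The block also shows the equivalent per-community form. It ends with a worked toy example: two triangles joined by a single edge, split into the two triangles.

```latex
Q = \frac{1}{2m} \sum_{i,j} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j)
  = \sum_{c} \left[ \frac{L_c}{m} - \left( \frac{d_c}{2m} \right)^2 \right]

% \delta(c_i, c_j) = 1 if nodes i and j are in the same community, else 0
% L_c: number of edges inside community c;  d_c: sum of degrees of the nodes in c
% Toy example: m = 7, and each triangle has L_c = 3 and d_c = 7
Q = 2 \left[ \frac{3}{7} - \left( \frac{7}{14} \right)^2 \right]
  = 2 \left( \frac{3}{7} - \frac{1}{4} \right) = \frac{5}{14} \approx 0.357
```

A value near 0 means the communities are no denser than chance; higher values mean denser communities.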

A well-known clustering algorithm is K-means. Because K-means only reflects how close individual data points are to each other, it can be said to consider only first-order connections. The algorithm introduced here instead clusters based on the structure of the network itself, and that is the key difference from K-means.
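A deliberately simplified, plain-Python sketch illustrates the search for a high-modularity partition. It follows the first phase of the Louvain method described in the paper below: each node repeatedly moves to the neighboring community that raises modularity the most, until no move helps. It differs from the real algorithm in two ways:

- It recomputes modularity from scratch for every candidate move. This is slow but easy to check.
- It omits the second phase, which merges communities into super-nodes and repeats.

The function names are hypothetical; for real data, use the library introduced below.

```python
from collections import defaultdict


def modularity(edges, community):
    """Q = sum over communities c of L_c / m - (d_c / 2m)^2 (unweighted, undirected)."""
    m = len(edges)
    internal = defaultdict(int)  # L_c: number of edges with both ends in c
    degree = defaultdict(int)    # d_c: sum of degrees of the nodes in c
    for u, v in edges:
        degree[community[u]] += 1
        degree[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


def local_moving(edges):
    """Simplified Louvain phase 1: greedily move nodes between neighboring communities."""
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    community = {node: node for node in neighbors}  # start: every node on its own
    improved = True
    while improved:
        improved = False
        for node in sorted(neighbors):
            original = community[node]
            best_comm, best_q = original, modularity(edges, community)
            for candidate in sorted({community[n] for n in neighbors[node]}):
                community[node] = candidate
                q = modularity(edges, community)
                if q > best_q + 1e-12:  # require a strict gain, so ties keep the current best
                    best_comm, best_q = candidate, q
            community[node] = best_comm
            if best_comm != original:
                improved = True
    return community


# Toy graph: two triangles (0, 1, 2) and (3, 4, 5) joined by the edge 2-3
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
community = local_moving(edges)
groups = defaultdict(set)
for node, c in community.items():
    groups[c].add(node)
print(sorted(sorted(g) for g in groups.values()), round(modularity(edges, community), 3))
```

On this toy graph the sketch recovers the two triangles, with modularity 5/14.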

About the library

Method paper

Vincent D. Blondel, Jean-Loup Guillaume, Renaud Lambiotte, Etienne Lefebvre, "Fast unfolding of communities in large networks", Journal of Statistical Mechanics: Theory and Experiment, 2008(10), P10008

Installation

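Install the library from the following Bitbucket repository. It is also published on PyPI as python-louvain, so pip install python-louvain works as well. Either way, the module is imported as community.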

https://bitbucket.org/taynaud/python-louvain

Sample code

Let's visualize karate_club (Zachary's karate club), a classic social network data set.

karate_community.py


import community as community_louvain  # python-louvain installs its module as "community"
import matplotlib.pyplot as plt
import networkx as nx

G = nx.karate_club_graph()
# Louvain method: returns a dict {node: community ID}
partition = community_louvain.best_partition(G)
pos = nx.spring_layout(G)
# One color per community, sampled from a colormap
cmap = plt.get_cmap("viridis", max(partition.values()) + 1)
nx.draw_networkx_nodes(G, pos, nodelist=list(partition.keys()), node_size=40,
                       cmap=cmap, node_color=list(partition.values()))
# Draw the edges too, so the network structure is visible
nx.draw_networkx_edges(G, pos, alpha=0.5)
plt.show()

(Figure: the karate club network drawn with Matplotlib, nodes colored by community)
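If installing python-louvain is inconvenient, recent NetworkX releases also ship a Louvain implementation, nx.community.louvain_communities(). The sketch below assumes such a NetworkX version. It converts the result into the same {node: community ID} dict that community.best_partition() returns, then prints the modularity. The seed value is arbitrary; without it, the communities found can vary between runs.

```python
import networkx as nx

G = nx.karate_club_graph()
# NetworkX's own Louvain implementation; returns a list of node sets
communities = nx.community.louvain_communities(G, seed=42)
# Convert to the {node: community ID} dict format used by best_partition()
partition = {node: cid for cid, nodes in enumerate(communities) for node in nodes}

print(len(communities), "communities")
print("modularity:", nx.community.modularity(G, communities))
```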

Use graph visualization tools

As shown above, a graph can be drawn with Python alone. When the network becomes large, however, it is often convenient to work with the drawing interactively. For example, you may want to show only the largest connected component or hide edges below a certain weight. There are several graph visualization tools; this entry uses CytoScape as an example and shows how to reflect the clustering results in it.
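Both operations mentioned above can also be done in NetworkX before exporting. The sketch below shows keeping only the largest connected component and hiding light edges, for an undirected graph. The helper name prune_for_display and the toy weights are hypothetical. Edges without a weight attribute are treated as weight 1.

```python
import networkx as nx


def prune_for_display(G, min_weight):
    """Remove edges lighter than min_weight, then keep only the largest connected component."""
    H = G.copy()
    # Collect light edges first, then remove them (edges without "weight" count as 1)
    light = [(u, v) for u, v, w in H.edges(data="weight", default=1) if w < min_weight]
    H.remove_edges_from(light)
    largest = max(nx.connected_components(H), key=len)
    return H.subgraph(largest).copy()


# Hypothetical toy graph with edge weights
G = nx.Graph()
G.add_weighted_edges_from([(0, 1, 3.0), (1, 2, 3.0), (2, 3, 0.5), (3, 4, 2.0), (5, 6, 1.0)])
H = prune_for_display(G, min_weight=1.0)
print(sorted(H.nodes()), H.number_of_edges())
```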

The library introduced here works on NetworkX graphs. A NetworkX graph can be written out in several data formats that various visualization tools can read.

This time, let's write it out in GML format.

How to write out community information?

Nodes can carry arbitrary attributes, which are set by passing a dict keyed by node ID. Storing the community ID as a node attribute records which cluster each node belongs to. The graph is then exported in GML format.

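import community as community_louvain  # python-louvain installs its module as "community"
import networkx as nx

G = nx.karate_club_graph()
partition = community_louvain.best_partition(G)
# String label for each node, keyed by node ID
labels = {node: str(node) for node in G.nodes()}
# NetworkX 2.0+ argument order: set_node_attributes(G, values, name)
nx.set_node_attributes(G, labels, 'label')
nx.set_node_attributes(G, partition, 'community')
nx.write_gml(G, "community.gml")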
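CytoScape's network import also accepts GraphML files, and NetworkX writes them with nx.write_graphml(). The sketch below writes the same label and community attributes as the code above to GraphML. It then reads the file back to check that the attributes survived. The toy graph stands in for the karate club data, and the hand-written partition stands in for the output of best_partition(). The temporary file path is arbitrary.

```python
import os
import tempfile

import networkx as nx

# Hypothetical stand-ins: two triangles joined by one edge, and a hand-written
# partition in the {node: community ID} format returned by best_partition()
G = nx.Graph([(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)])
partition = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}

nx.set_node_attributes(G, {node: str(node) for node in G}, "label")
nx.set_node_attributes(G, partition, "community")

path = os.path.join(tempfile.mkdtemp(), "community.graphml")
nx.write_graphml(G, path)

# Read the file back; read_graphml returns node IDs as strings by default
H = nx.read_graphml(path)
print({node: data["community"] for node, data in H.nodes(data=True)})
```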

Read data using CytoScape

(Figure: the CytoScape start-up screen)

In the dialog shown after startup, choose From Network File and select the file to load. (Alternatively, use File -> Import -> Network -> File.)

(Figure: the imported network before a layout is applied)

You probably cannot make sense of this yet; that is because no layout has been specified. First, let's choose a layout algorithm. Selecting Layout -> yFiles Layouts -> Organic gives the following display.

(Figure: the network after applying the yFiles Organic layout)

Now the network is displayed in a readable form.

Add clustering results to visualization

Next, let's reflect the clustering result in this graph by changing each node's color according to its cluster. The color and size of graph elements are changed in the Style tab of the Control Panel. If the Control Panel is not displayed, open it with View -> Show Control Panel.

(Figure: the Style tab of the Control Panel)

Use Fill Color to change the node color. Each property can be mapped from node data, and you decide which data drives it. Click the Map cell of the property you want to change.

(Figure: the mapping settings for Fill Color)

The mapping settings for that property appear. In Column, choose the data column that drives the mapping; the attributes set with set_node_attributes are listed here. Select community.

(Figure: selecting community in Column)

Mapping Type selects how the column values are translated into the property. There are three types:

- Continuous Mapping: treats the values as continuous and varies the size or color gradient with the magnitude of the value.
- Discrete Mapping: treats the values as discrete and assigns a color (or other setting) to each value.
- PassThrough Mapping: uses the value as the property as-is. For example, a value of red turns the node red.
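The following conceptual plain-Python illustration shows how the three mapping types differ. These functions are hypothetical and only mimic the behavior described above; they are not CytoScape's API. With a continuous mapping, community 3 would be treated as "more" than community 1, which is meaningless for cluster IDs. That is why a discrete mapping suits them.

```python
# Conceptual illustration only: plain functions that mimic the three mapping
# types. They are hypothetical and not part of any CytoScape API.

def continuous_mapping(value, vmin, vmax, out_min, out_max):
    """Linearly interpolate a numeric value into an output range (e.g. node size)."""
    t = (value - vmin) / (vmax - vmin)
    return out_min + t * (out_max - out_min)


def discrete_mapping(value, table, default="#CCCCCC"):
    """Look up each distinct value in a table; values not in the table get a default."""
    return table.get(value, default)


def passthrough_mapping(value):
    """Use the column value itself as the visual property."""
    return value


# Hypothetical color table: one color per community ID
palette = {0: "#E41A1C", 1: "#377EB8", 2: "#4DAF4A", 3: "#984EA3"}

print(continuous_mapping(5, 0, 10, 20, 60))  # 40.0
print(discrete_mapping(2, palette))          # #4DAF4A
print(passthrough_mapping("red"))            # red
```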

This time there are only 4 clusters, and the magnitude of a community ID has no meaning, so select Discrete Mapping. A pull-down menu then appears for each value; choose a color for each one.

(Figure: choosing a color for each community value)

With this setting, the graph shown earlier is colored by community as follows. The clustering result is now visualized.

(Figure: the network with nodes colored by community)

Summary

In this entry, we introduced the workflow from network clustering to visualization. Network analysis is relatively easy to do, yet it readily produces interesting results, so if you are interested, please give it a try!
