[PYTHON] I tried to embed a protein-protein interaction network in hyperbolic space with gensim's Poincaré embedding

Introduction

I used gensim's Poincaré embedding to embed a protein-protein interaction network in the Poincaré ball.

  1. Hyperbolic space
    Hyperbolic space is a non-Euclidean space with negative curvature. Its volume grows rapidly toward the periphery, which is why it is considered well suited to embedding networks with a hierarchical structure.

  2. Poincaré ball
    The Poincaré ball is one of the models of hyperbolic space. On the Poincaré ball, the distance between two points $u$ and $v$ is
    $d(u,v) = \mathrm{arccosh}\left(1+2\frac{||u-v||^{2}}{(1-||u||^{2})(1-||v||^{2})}\right).$
    A loss function $\mathcal{L}(\Theta)$ is defined in terms of this distance, and the embedding is obtained by solving
    $\Theta'\leftarrow \mathrm{arg\,min}_{\Theta}\,\mathcal{L}(\Theta)\;\;\;\mathrm{s.t.}\;\forall\theta_{i}\in\Theta:||\theta_{i}||<1.$
    In other words, Poincaré embedding learns coordinates $\theta_{i}$ inside the unit ball so that this loss function is minimized.
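The distance formula above can be written directly in NumPy. The following is a minimal sketch, not gensim's implementation; the function name `poincare_distance` and the sample points are chosen here for illustration only. It compares the same Euclidean gap of 0.1 near the center and near the boundary of the ball, to show how distances stretch toward the periphery.

```python
import numpy as np

def poincare_distance(u, v):
    """Distance between two points inside the unit (Poincaré) ball."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    squared_gap = np.sum((u - v) ** 2)
    denominator = (1.0 - np.sum(u ** 2)) * (1.0 - np.sum(v ** 2))
    return np.arccosh(1.0 + 2.0 * squared_gap / denominator)

# Two pairs of points that are both 0.1 apart in Euclidean terms (illustrative values).
near_center = poincare_distance([0.0, 0.0], [0.1, 0.0])
near_boundary = poincare_distance([0.85, 0.0], [0.95, 0.0])

# The pair close to the boundary is much farther apart in hyperbolic terms.
print(near_center, near_boundary)
```

Because the denominator shrinks as either point approaches norm 1, points near the boundary have plenty of room between them, which is what lets many low-degree nodes spread out along the periphery.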

Protein-protein interaction networks

  1. Protein-protein interaction network
    Many kinds of proteins are at work in the body, and many of them function by binding to other proteins rather than acting alone. A protein-protein interaction network is a network built by connecting proteins that bind to each other.
  2. Features of the protein-protein interaction network
    Proteins bind to each other, but the number of binding partners is not uniform. Most proteins bind only to a few specific proteins, while a very small number bind to many. As a result, the network has a structure in which a small number of proteins act as hubs that connect the large majority of peripheral proteins. The elements of a network are called nodes, the links connecting the nodes are called edges, and the number of edges attached to a node is called its degree. In the protein-protein interaction network, the degree is unevenly distributed. Networks whose degree distribution follows a power law are called scale-free networks. Scale-free networks are often observed in naturally occurring networks, and the protein-protein interaction network is also known to be scale-free.
  3. Data set
    Information about protein-protein binding is taken from a database. I used the NURSA protein-protein interactions dataset. Download the Gene attribute Edge List from it. This file lists pairs of proteins that bind to each other. From this information, we build a network of binding proteins and embed it in the Poincaré ball.
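As a rough illustration of the terms node, edge, and degree, the sketch below counts degrees from a list of (source, target) pairs, which is the same shape as the edge list in the dataset. The protein names and the helper `node_degrees` are made up for this example, and the sketch assumes each binding pair appears only once in the list.

```python
from collections import Counter

# Toy edge list with hypothetical protein names (not taken from the NURSA data).
edges = [("HUB1", "P1"), ("HUB1", "P2"), ("HUB1", "P3"), ("HUB1", "P4"), ("P4", "P5")]

def node_degrees(edge_list):
    """Count how many edges touch each node."""
    degrees = Counter()
    for source, target in edge_list:
        degrees[source] += 1
        degrees[target] += 1
    return degrees

degrees = node_degrees(edges)

# Degree distribution: how many nodes have each degree.
degree_histogram = Counter(degrees.values())

print(degrees.most_common(1))            # the node with the most partners
print(sorted(degree_histogram.items()))  # (degree, number of nodes) pairs
```

Applied to the real edge list, a histogram like this with many low-degree nodes and a few very high-degree ones would be consistent with the hub structure described above.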

gensim Poincaré embedding

Poincaré embedding is implemented in gensim, a natural language processing library for Python, so I use it here. Given a list of word pairs, it embeds the network of words in the Poincaré ball. Here the "words" are protein names.

# Import libraries
import pandas as pd
from gensim.models.poincare import PoincareModel
from gensim.viz.poincare import poincare_2d_visualization
from plotly.offline import iplot

# Load the dataset
# Download gene_attribute_edges.txt.gz and unzip it first.
# The "source" and "target" columns list the pairs of proteins that bind to each other.
# skiprows=[1] skips the second line of the file.
dataset = pd.read_csv('gene_attribute_edges.txt', delimiter='\t', usecols=[0, 3], skiprows=[1])

# Create a list of (source, target) pairs of binding proteins
relations = list(zip(dataset['source'], dataset['target']))

# Train the Poincaré model on the protein pairs
# The result is visualized in two dimensions later, so set size=2.
model = PoincareModel(relations, size=2)
model.train(epochs=1000)

# Visualization
# Drawing every edge would make the figure too cluttered to read,
# so num_nodes=5 restricts the edges drawn to those of 5 nodes.
figure = poincare_2d_visualization(model=model, tree=relations, figure_title='PPI network', num_nodes=5)
iplot(figure)

Figures: embedding after 10, 100, and 1000 epochs (PPI_10epochs.png, PPI_100epochs.png, PPI_1000epochs.png)

Each point represents a protein, and a line connects proteins that bind to each other. As the number of epochs is increased, training progresses and the loss decreases. Proteins with few binding partners are pushed to the periphery, while proteins that serve as hubs stay near the center, so the embedding takes on a hierarchical structure.
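One way to check this impression numerically, rather than by eye, is to compare each node's degree with its distance from the origin. The sketch below is only an illustration: the coordinates are made-up values, not output from the trained model, and `degree_and_radius` is a helper defined here. With a trained gensim model, the vector for a node can presumably be read with `model.kv[node]`, but that step is left out so the example runs on its own.

```python
from collections import Counter
import numpy as np

def degree_and_radius(edge_list, coordinates):
    """Return {node: (degree, Euclidean norm of its embedding)}."""
    degrees = Counter()
    for source, target in edge_list:
        degrees[source] += 1
        degrees[target] += 1
    return {
        node: (degrees[node], float(np.linalg.norm(coordinates[node])))
        for node in degrees
    }

# Hypothetical edges and 2D coordinates, invented for this example.
edges = [("HUB1", "P1"), ("HUB1", "P2"), ("HUB1", "P3"), ("HUB1", "P4"), ("P4", "P5")]
coordinates = {
    "HUB1": [0.05, 0.00],
    "P1": [0.80, 0.10],
    "P2": [-0.75, 0.30],
    "P3": [0.10, -0.85],
    "P4": [-0.30, -0.25],
    "P5": [-0.60, -0.65],
}

summary = degree_and_radius(edges, coordinates)
node_degree = np.array([d for d, _ in summary.values()])
node_radius = np.array([r for _, r in summary.values()])

# A negative correlation means high-degree nodes sit closer to the center.
correlation = np.corrcoef(node_degree, node_radius)[0, 1]
print(correlation)
```

If the same comparison on the real embedding gives a clearly negative correlation that strengthens with more epochs, that would support the observation that hubs settle in the center.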

Conclusion

I used gensim's Poincaré embedding to embed a protein-protein interaction network in the Poincaré ball. As training progresses, proteins with few binding partners are placed in the periphery, while proteins that serve as hubs remain near the center, and a hierarchical structure emerges. Poincaré embedding seems to be a good way to embed a scale-free network.

References

Maximilian Nickel and Douwe Kiela, "Poincaré Embeddings for Learning Hierarchical Representations"
gensim
Nuclear Receptor Signaling Atlas
Malovannaya, A. et al., "Analysis of the human endogenous coregulator complexome"
