[PYTHON] Try deep learning of genomics with Kipoi

Deep learning is also used in genomics. Examples include DeepBind, which predicts how strongly a nucleotide sequence binds to DNA- and RNA-binding proteins, and DeepSEA, which predicts epigenetic modifications from a nucleotide sequence.

Kipoi is a platform (the paper calls it a repository) that gives access to a variety of deep learning models for genomics. It supports models written in Keras, TensorFlow, PyTorch, and scikit-learn. You can load a trained model and run it on your own sequence data to get predictions.

Home page: http://kipoi.org/
Paper: https://www.nature.com/articles/s41587-019-0140-0

Trying it out

The environment is Ubuntu 16.04 with Anaconda (Python 3.6).

Installation

pip install kipoi

Kipoi is now installed. The version I got was 0.6.25.

Error information

When running Kipoi after this, you may get the error `AttributeError: 'Table' object has no attribute 'purge'`. This seems to be caused by a newer version of tinydb. If so, reinstalling an older version as follows fixes it.

pip uninstall tinydb
pip install tinydb==3.13.0
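
To check ahead of time whether an installed tinydb is likely to trigger this error, a small helper like the following may be useful. It is only a sketch: the function names `lacks_purge` and `installed_version` are hypothetical, and it assumes that `purge` disappeared with the 4.x series of tinydb. That fits the fact that 3.13.0 works, but the exact cut-off is an assumption.

```python
def lacks_purge(version: str) -> bool:
    """Return True if this tinydb version is assumed to have no Table.purge().

    Assumption: purge() was dropped in the 4.x series, so only the major
    version number is inspected.
    """
    major = int(version.split(".")[0])
    return major >= 4


def installed_version(package: str):
    """Return the installed version string of a package, or None if unknown."""
    try:
        # importlib.metadata is only available on Python 3.8 and later.
        from importlib.metadata import PackageNotFoundError, version
    except ImportError:
        return None
    try:
        return version(package)
    except PackageNotFoundError:
        return None


found = installed_version("tinydb")
if found is None:
    print("tinydb version could not be determined")
else:
    print("tinydb", found, "-> downgrade suggested:", lacks_purge(found))
```

On the Python 3.6 setup used here the lookup returns None, so `pip show tinydb` is the simpler way to see the version.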

Running Kipoi

The tutorial is published on GitHub (https://github.com/kipoi/examples), so you can try it yourself. Here I go through the first part of the tutorial.

First, download the tutorial materials.

git clone https://github.com/kipoi/examples.git
cd examples

Load the kipoi model list.

kipoi ls

The models are now available. A folder containing them (~/.kipoi/models/) has been created in your home directory. Next, create a new virtual environment.

kipoi env create shared/envs/kipoi-py3-keras2

This creates a conda virtual environment named "kipoi-shared__envs__kipoi-py3-keras2". Activate it.

conda activate kipoi-shared__envs__kipoi-py3-keras2
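
Comparing the two commands above, the environment name looks like the path given to `kipoi env create` with a "kipoi-" prefix and each "/" replaced by a double underscore. The sketch below only encodes that observed pattern. The function name `kipoi_env_name` is hypothetical, and the rule is inferred from this one example rather than taken from the Kipoi documentation.

```python
def kipoi_env_name(env_path: str) -> str:
    """Guess the conda environment name for a path passed to `kipoi env create`.

    Assumed rule (inferred from a single example): prefix "kipoi-" and
    replace every "/" with "__".
    """
    return "kipoi-" + env_path.strip("/").replace("/", "__")


print(kipoi_env_name("shared/envs/kipoi-py3-keras2"))
# prints: kipoi-shared__envs__kipoi-py3-keras2
```

If the guess does not match, `conda env list` shows the name that was actually created.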

Let's run the tests for a few models.

kipoi test Basset --source=kipoi
kipoi test DeepSEA/predict --source=kipoi
kipoi test DeepBind/Homo_sapiens/TF/D00328.018_ChIP-seq_CTCF/ --source=kipoi

I confirmed that they work.

Example 1

Let's run the prediction in example 1.

cd 1-predict

The input folder contains sample data, so decompress it.

zcat input/hg19.chr22.fa.gz > input/hg19.chr22.fa

Now make a prediction. The DeepBind model used here takes a sequence of about 100 bases as input (in the output shown later, end minus start is 101 for each interval) and outputs the predicted binding strength for CTCF. fasta_file is the FASTA file containing the nucleotide sequence, and intervals_file specifies the positions (start and end) of the subsequences to extract from it.

kipoi predict DeepBind/Homo_sapiens/TF/D00328.018_ChIP-seq_CTCF \
   --dataloader_args='{"intervals_file": "input/enhancer-regions.hg19.chr22.bed.gz",
                       "fasta_file": "input/hg19.chr22.fa"}' \
  -o preds.tsv

The prediction ran successfully.
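
Conceptually, the two input files are combined by cutting the region named in each interval out of the FASTA sequence. The following is a minimal sketch of that step in plain Python, not Kipoi's actual dataloader. The toy sequence, the intervals and the function names are made up for illustration, and it assumes BED-style coordinates (0-based start, exclusive end).

```python
import io


def read_fasta(handle):
    """Parse FASTA text into a dict of {record name: sequence}."""
    records, name, chunks = {}, None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            # Keep only the first word of the header as the record name.
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def read_bed(handle):
    """Yield (chrom, start, end) from the first three BED columns."""
    for line in handle:
        fields = line.split()
        if len(fields) >= 3:
            yield fields[0], int(fields[1]), int(fields[2])


def extract(records, intervals):
    """Slice each interval out of its chromosome sequence."""
    return [records[chrom][start:end] for chrom, start, end in intervals]


# Toy inputs standing in for hg19.chr22.fa and the enhancer BED file.
fasta = io.StringIO(">chrToy\nACGTACGTAC\nGGGTTTAAAC\n")
bed = io.StringIO("chrToy\t0\t4\nchrToy\t8\t13\n")

seqs = extract(read_fasta(fasta), read_bed(bed))
print(seqs)  # ['ACGT', 'ACGGG']
```

In the real pipeline each extracted sequence would then be encoded numerically and passed to the model. That part is left out here.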

Let's take a look at the contents of the output (preds.tsv).

$ less preds.tsv
metadata/ranges/chr     metadata/ranges/end     metadata/ranges/id      metadata/ranges/start   metadata/ranges/strand  preds
chr22   17274293        0       17274192        *       -4.108346
chr22   17274548        1       17274447        *       -2.6903393
chr22   17275618        2       17275517        *       -5.2259955
chr22   17275863        3       17275762        *       -5.2259955
chr22   17287134        4       17287033        *       -4.2063684
chr22   17288718        5       17288617        *       -5.2259955
chr22   17570303        6       17570202        *       -4.93376
chr22   17597591        7       17597490        *       -4.4880404
chr22   17597800        8       17597699        *       -4.825454
chr22   17598104        9       17598003        *       -5.190316
...

The rightmost column (preds) is the predicted value.
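
To work with these values further, the file can be read as ordinary tab-separated text. Below is a small sketch using only the standard library. It embeds the first three rows from the output above so that it runs on its own. The function name `load_predictions` and the shortened keys are choices made for this example.

```python
import csv
import io

HEADER = ["metadata/ranges/chr", "metadata/ranges/end", "metadata/ranges/id",
          "metadata/ranges/start", "metadata/ranges/strand", "preds"]
ROWS = [
    ["chr22", "17274293", "0", "17274192", "*", "-4.108346"],
    ["chr22", "17274548", "1", "17274447", "*", "-2.6903393"],
    ["chr22", "17275618", "2", "17275517", "*", "-5.2259955"],
]
# Rebuild a small TSV document equivalent to the top of preds.tsv.
sample_tsv = "\n".join("\t".join(row) for row in [HEADER] + ROWS) + "\n"


def load_predictions(handle):
    """Read a Kipoi-style TSV and return a list of dicts with typed fields."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [
        {
            "chrom": row["metadata/ranges/chr"],
            "start": int(row["metadata/ranges/start"]),
            "end": int(row["metadata/ranges/end"]),
            "score": float(row["preds"]),
        }
        for row in reader
    ]


preds = load_predictions(io.StringIO(sample_tsv))
best = max(preds, key=lambda p: p["score"])  # highest predicted value
print(best["chrom"], best["start"], best["end"], best["score"])
# prints: chr22 17274447 17274548 -2.6903393
```

For the real file, replace the `io.StringIO(...)` argument with `open("preds.tsv", newline="")`.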

The rest of the tutorial uses a Snakefile, so it seems you need to install snakemake to run it. (snakemake is a useful workflow automation tool.)

How to run in Python

So far everything has been done with the CLI, but there are also APIs for Python and R. After activating the virtual environment, you can apparently run the following in Python (excerpt from the manual).

import kipoi

kipoi.list_models() # list available models
model = kipoi.get_model("Basset") # load the model
model = kipoi.get_model(  # load the model from a past commit
    "https://github.com/kipoi/models/tree/<commit>/<model>",
    source='github-permalink'
)

# main attributes
model.model # wrapped model (say keras.models.Model)
model.default_dataloader # dataloader
model.info # description, authors, paper link, ...

# main methods
model.predict_on_batch(x) # implemented by all the models regardless of the framework
model.pipeline.predict(dict(fasta_file="hg19.fa", intervals_file="intervals.bed"))
# runs: raw files -[dataloader]-> numpy arrays -[model]-> predictions 
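
Putting the excerpt together with the earlier CLI example, the CTCF prediction could presumably be run from Python along the following lines. This is a sketch under assumptions, not a tested recipe. It uses only `kipoi.get_model` and `model.pipeline.predict` from the excerpt above, the helper names `dataloader_args` and `predict_ctcf` are hypothetical, and it assumes the working directory is `1-predict` inside the activated Kipoi environment.

```python
import json

MODEL_NAME = "DeepBind/Homo_sapiens/TF/D00328.018_ChIP-seq_CTCF"


def dataloader_args(input_dir="input"):
    """Build the same arguments that were passed to --dataloader_args."""
    return {
        "intervals_file": f"{input_dir}/enhancer-regions.hg19.chr22.bed.gz",
        "fasta_file": f"{input_dir}/hg19.chr22.fa",
    }


def predict_ctcf(input_dir="input"):
    """Run the DeepBind CTCF model on the tutorial files (needs Kipoi)."""
    import kipoi  # imported lazily: only available inside the Kipoi environment

    model = kipoi.get_model(MODEL_NAME)
    return model.pipeline.predict(dataloader_args(input_dir))


# The same dictionary, serialised as JSON for the CLI's --dataloader_args.
print(json.dumps(dataloader_args()))
```

Calling `predict_ctcf()` should then return the predictions that the CLI wrote to preds.tsv, though the exact return type is not something this article confirms.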

Impressions

I found Kipoi very interesting as a platform for genomics analysis. The virtual environments it creates can be handled like any other Anaconda environment. Besides downloading and using existing models, you can also upload and share models you have trained yourself. I want to keep using it.
