[PYTHON] Try using scanpy's data integration function (sc.tl.ingest)

What is scanpy

scanpy is a tool for analyzing scRNA-seq data with Python. Many people use R's Seurat, but I think a fair number of people want to analyze scRNA-seq data with Python, and scanpy is the tool for them. Recently (well, about half a year ago) a tutorial on data integration was added to the scanpy documentation. (As of 2020/04/18)

Data integration implemented in Scanpy

As mentioned above, since version 1.4.5 scanpy has provided a function called sc.tl.ingest for integrating newly acquired data with reference data, and there is already a tutorial for it. (Integrating data using ingest and BBKNN: https://scanpy-tutorials.readthedocs.io/en/latest/integrating-data-using-ingest.html)

Wow! I want to use it!

I think many people will say that. However, when I try to install it with conda as shown below,

$ conda install -c bioconda scanpy
Collecting package metadata (repodata.json): done

(Omission)

The following NEW packages will be INSTALLED:
  scanpy             bioconda/noarch::scanpy-1.4.3-py_0

This is what gets displayed. If you install it as is, you get scanpy-1.4.3 and cannot use the data integration function (even though there is a tutorial!).

I expect the bioconda channel will eventually be updated to scanpy 1.4.5 or later, but since we have come this far, let's use the function ahead of time.

pip install the latest version of scanpy

[Caution!] Some people claim that you should not mix conda and pip! (Others argue that mixing them is fine ...) Follow the steps below at your own risk.

According to the scanpy home page (https://scanpy.readthedocs.io/en/latest), the latest version seems to be 1.4.6. So let's pip install scanpy 1.4.6.

$ pip install scanpy=="1.4.6"
Collecting scanpy==1.4.6

(Omission)

Successfully installed anndata-0.7.1 h5py-2.10.0 matplotlib-3.2.1 scanpy-1.4.6

The pip install worked fine.

Check version

However, it is too early to relax. Just in case, let's check from Python that 1.4.6 is really installed.

import scanpy as sc
sc.logging.print_versions()
>scanpy==1.4.6 anndata==0.7.1 umap==0.3.10 numpy==1.17.4 scipy==1.4.1  pandas==1.0.3 scikit-learn==0.22 statsmodels==0.10.1 python-igraph==0.7.1 louvain==0.6.1

It seems that scanpy version 1.4.6 is installed successfully.
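The same check can also be written as a small script. The following is only a minimal sketch using the standard library: the helper names (version_tuple, supports_ingest, installed_scanpy_version) are hypothetical and not part of scanpy, and the 1.4.5 threshold is simply the version stated earlier in this article.

```python
import re
from importlib import metadata

# Assumption taken from this article: sc.tl.ingest exists since scanpy 1.4.5.
MIN_INGEST_VERSION = (1, 4, 5)


def version_tuple(version):
    """Turn '1.4.6' or '1.4.5.post1' into (1, 4, 6) or (1, 4, 5).

    Parsing stops at the first component that does not start with a digit.
    """
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def supports_ingest(version):
    """Return True if the given scanpy version string is 1.4.5 or later."""
    return version_tuple(version) >= MIN_INGEST_VERSION


def installed_scanpy_version():
    """Return the installed scanpy version string, or None if it is missing."""
    try:
        return metadata.version("scanpy")
    except metadata.PackageNotFoundError:
        return None


installed = installed_scanpy_version()
if installed is None:
    print("scanpy is not installed in this environment")
elif supports_ingest(installed):
    print(f"scanpy {installed}: sc.tl.ingest should be available")
else:
    print(f"scanpy {installed}: too old for sc.tl.ingest, upgrade to 1.4.5 or later")
```

Running this in the conda environment before and after the pip install should show the difference between 1.4.3 and 1.4.6.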

However, an error could still occur when you actually call the integration function, so let's follow the tutorial (https://scanpy-tutorials.readthedocs.io/en/latest/integrating-data-using-ingest.html).

(Omission)
sc.tl.ingest(adata, adata_ref, obs='leiden')
>running ingest
    finished (0:00:06)

The data integration function also worked! (Although not covered in this article, the data can also be integrated using the BBKNN method.)
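To get a feel for what happens inside, here is a toy sketch of the idea. As I understand the scanpy documentation, ingest projects the new data onto a PCA fitted on the reference and maps the labels with a kNN classifier. The code below imitates only that idea with NumPy on synthetic data. It is not scanpy's implementation, it leaves out the UMAP embedding step, and all function names (fit_pca, project, knn_transfer_labels) are hypothetical.

```python
import numpy as np


def fit_pca(X_ref, n_components):
    """Fit a PCA on the reference only; return the mean and principal axes."""
    mean = X_ref.mean(axis=0)
    # Rows of Vt are the principal axes of the centered reference data.
    _, _, Vt = np.linalg.svd(X_ref - mean, full_matrices=False)
    return mean, Vt[:n_components]


def project(X, mean, components):
    """Project data into the PCA space that was fitted on the reference."""
    return (X - mean) @ components.T


def knn_transfer_labels(Z_new, Z_ref, ref_labels, k=5):
    """Give each new point the majority label of its k nearest reference points."""
    ref_labels = np.asarray(ref_labels)
    # Squared Euclidean distances, shape (n_new, n_ref).
    d2 = ((Z_new[:, None, :] - Z_ref[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]
    predicted = []
    for idx in nearest:
        values, counts = np.unique(ref_labels[idx], return_counts=True)
        predicted.append(values[np.argmax(counts)])
    return np.array(predicted)


# Synthetic stand-in for expression data: two well-separated "cell types".
rng = np.random.default_rng(0)
centers = {"A": np.zeros(10), "B": np.full(10, 5.0)}
X_ref = np.vstack([rng.normal(c, 1.0, size=(50, 10)) for c in centers.values()])
ref_labels = np.repeat(list(centers), 50)
X_new = np.vstack([rng.normal(c, 1.0, size=(20, 10)) for c in centers.values()])

mean, comps = fit_pca(X_ref, n_components=2)
Z_ref = project(X_ref, mean, comps)
Z_new = project(X_new, mean, comps)  # new data is placed in the reference's PCA space
pred = knn_transfer_labels(Z_new, Z_ref, ref_labels)
print(pred)
```

The point of the design is that the new data never changes the reference: the PCA is fitted on the reference alone, and the new cells are only placed into that space and labeled from their neighbors.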

If you visualize the data after integrating it with this function, you can see how the reference data and the new data overlap across batches.
(Figure: image.png. Source: https://scanpy-tutorials.readthedocs.io/en/latest/integrating-data-using-ingest.html)

Finally

Whenever I do an analysis, I am reminded that bioinformatics is a field where new tools keep coming out, and simply using them gives you plausible-looking results for the time being. This time I did not verify how reliable the integration is, but I would like to learn more about that. If you notice any mistakes or have advice, I would greatly appreciate your guidance.

References

1. Seurat: https://satijalab.org/seurat/
2. Scanpy: https://scanpy-tutorials.readthedocs.io/en/latest/integrating-data-using-ingest.html
3. About package management with conda and pip: https://qiita.com/ynakayama/items/29efebeb38604d10acef
