[PYTHON] Convert genbank file to gff file

For those who searched for "convert gbff to gff" and were at a loss after reading discussions such as "Question: Converting Gbff To Gff3"

For those who want to know about tools that can convert life science data between various formats

Background (it is okay to skip it)


When you obtain genomic data from NCBI, you get:

- a fasta file (.fasta)
- a genbank file (.gbff)

The base sequence (.fasta) together with an annotation file that adds gene information to the sequence (**.gbff**) makes it possible to decipher the genome.

However, if you want to use this data as a reference genome in IGV or a similar tool, IGV will not read the annotation unless the file is **.gff** (gff3) or **.gtf** (gff2). (An explanation of the file formats is omitted.)
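Although the format explanation is omitted here, it may help to see what a gff3 record looks like. A GFF3 feature line has nine tab-separated columns (seqid, source, type, start, end, score, strand, phase, attributes). The following is a minimal sketch of splitting one such line; the `Gff3Record` class, the `parse_gff3_line` helper, and the sample line are made up for illustration and do not come from bioconvert. It does not decode percent-escaped characters in attribute values.

```python
from typing import Dict, NamedTuple


class Gff3Record(NamedTuple):
    """One GFF3 feature line (hypothetical helper type for illustration)."""
    seqid: str
    source: str
    type: str
    start: int
    end: int
    score: str
    strand: str
    phase: str
    attributes: Dict[str, str]


def parse_gff3_line(line: str) -> Gff3Record:
    """Split a single GFF3 feature line into its nine columns."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
    # Column 9 holds semicolon-separated key=value pairs.
    attributes = {}
    for pair in cols[8].split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attributes[key] = value
    return Gff3Record(cols[0], cols[1], cols[2], int(cols[3]), int(cols[4]),
                      cols[5], cols[6], cols[7], attributes)


# Made-up sample line, not taken from a real genome.
record = parse_gff3_line("scaffold1\tGenbank\tgene\t100\t200\t.\t+\t.\tID=gene0;Name=abc")
print(record.type, record.start, record.end, record.attributes["ID"])
```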

Both gbff (GenBank Flat File) and gff (General Feature Format) are, in short, annotation files, so can one be converted to the other? When I searched for "convert gbff to gff", I found records of past discussions on the topic, but no concrete solution.

After researching various things, I managed to find a conversion method, so I will introduce it here.

The key to the solution was in the Biostars discussion introduced at the beginning: a mysterious script called "bp_genbank2gff3.pl". It can apparently be used with BioPerl, but judging from the conversation it seems to have a bug. Wondering whether there was a similar tool for Python, I looked and found one.

Q. If you use Ensembl, you can get the annotation file as a gff (gtf) file, so why go to all this trouble in the first place?

A. I only had data on NCBI, probably because I wanted to work with a minor organism...

Tool installation

Usage environment

Python module to install

**1. Install bioconvert**

pip install bioconvert

This method installs Bioconvert and its Python dependencies. Note, however, that bioconvert may use (depending on the conversion you want to use) external dependencies not available on Pypi. You will need to install those third-party dependencies yourself. An alternative is to install bioconvert using conda as explained here after. https://bioconvert.readthedocs.io/en/master/installation.html

Installing with pip resolves dependencies on Python modules managed by PyPI, but apparently not dependencies on third-party packages. **In short, installing with pip limits functionality.** Using conda would apparently resolve those dependencies as well, but since only the gbff → gff3 conversion is needed here, I ignore that and do not use conda this time.

**2. If the installation fails partway through (probably because the mappy installation failed), also install the python3-devel package**

yum install python3-devel

**3. Install biocode**

pip install biocode

Check that the tool runs:

bioconvert --help

As mentioned above, functionality is limited, so warnings are displayed saying that some methods are not available.

WARNING [bioconvert.core.base]:  converter 'FASTQ2FASTA': method seqtk is not available
WARNING [bioconvert.core.base]:  converter 'GENBANK2EMBL': method squizz is not available
WARNING [bioconvert.core.base]:  converter 'GENBANK2FASTA': method squizz is not available
WARNING [bioconvert.core.base]:  converter 'GZ2BZ2': method pigz_pbzip2 is not available
WARNING [bioconvert.core.base]:  converter 'GZ2DSRC': method pigzdsrc is not available
genbank2gff3        genbank to-> gff3 (1 methods)

No warning is displayed for genbank2gff3, the converter used this time, so you can rest assured. ~~Be prepared for the log to get cluttered every time you run the script~~
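If the warning list is long, checking it by eye gets tedious. Here is a minimal sketch that summarizes which converters have unavailable methods, assuming the warnings follow the line format shown above. The `unavailable_methods` helper is a made-up name for illustration, not part of bioconvert, and the sample text is simply pasted from the log above.

```python
import re
from collections import defaultdict

# Matches lines such as:
# WARNING [bioconvert.core.base]:  converter 'FASTQ2FASTA': method seqtk is not available
WARNING_PATTERN = re.compile(
    r"converter '(?P<converter>[^']+)': method (?P<method>\S+) is not available"
)


def unavailable_methods(log_text):
    """Map each converter name to the list of its methods reported as unavailable."""
    missing = defaultdict(list)
    for line in log_text.splitlines():
        match = WARNING_PATTERN.search(line)
        if match:
            missing[match["converter"]].append(match["method"])
    return dict(missing)


sample_log = """\
WARNING [bioconvert.core.base]:  converter 'FASTQ2FASTA': method seqtk is not available
WARNING [bioconvert.core.base]:  converter 'GENBANK2EMBL': method squizz is not available
WARNING [bioconvert.core.base]:  converter 'GENBANK2FASTA': method squizz is not available
"""

missing = unavailable_methods(sample_log)
# The converter needed here should not appear among the affected ones.
print("GENBANK2GFF3 affected:", "GENBANK2GFF3" in missing)
```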

Run

bioconvert genbank2gff3 foo.gbff foo.gff3

This converts foo.gbff to foo.gff3.
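If you want to run the same conversion from a Python script (for example, over several files), one option is to call the command through `subprocess`. This is only a sketch: `build_command` and `convert_genbank_to_gff3` are hypothetical helper names, and the sketch assumes that bioconvert exits with a non-zero status when the conversion fails.

```python
import shutil
import subprocess
from pathlib import Path


def build_command(src, dst):
    """Build the same command line as shown above: bioconvert genbank2gff3 SRC DST."""
    return ["bioconvert", "genbank2gff3", str(src), str(dst)]


def convert_genbank_to_gff3(src, dst):
    """Run the conversion and return the output path.

    Assumption: bioconvert signals failure through a non-zero exit status,
    in which case check=True raises subprocess.CalledProcessError.
    """
    if shutil.which("bioconvert") is None:
        raise RuntimeError("bioconvert was not found on PATH")
    if not Path(src).is_file():
        raise FileNotFoundError(src)
    subprocess.run(build_command(src, dst), check=True)
    return Path(dst)


if __name__ == "__main__":
    print(" ".join(build_command("foo.gbff", "foo.gff3")))
```

Calling `convert_genbank_to_gff3("foo.gbff", "foo.gff3")` would then perform the conversion, provided bioconvert is installed as described above.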

During conversion, warnings such as the following are displayed:

WARNING: The following feature was skipped:
type: assembly_gap
location: [96782:96838](+)
qualifiers:
    Key: estimated_length, Value: ['56']
    Key: gap_type, Value: ['within scaffold']
    Key: linkage_evidence, Value: ['paired-ends']

Information that gff3 does not support, such as the feature above, is not carried over to the gff3 file.
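To get a rough idea of which feature types a GenBank file contains before converting it (and therefore how many entries such as assembly_gap might be skipped), you can count the feature keys in the FEATURES table. This sketch relies only on the GenBank flat file layout, in which a feature key starts in column 6 and qualifier lines are indented further. `count_feature_keys` is a made-up helper, and the sample lines are invented for illustration.

```python
import re
from collections import Counter

# A feature key line has exactly five leading spaces, then the key, then the location.
FEATURE_KEY = re.compile(r"^ {5}(\S+)\s+\S")


def count_feature_keys(lines):
    """Count feature keys (gene, CDS, assembly_gap, ...) in GenBank flat file lines."""
    counts = Counter()
    in_features = False
    for line in lines:
        if line.startswith("FEATURES"):
            in_features = True
            continue
        if in_features:
            # A line starting in column 1 (e.g. ORIGIN, CONTIG, //) ends the table.
            if line and not line[0].isspace():
                in_features = False
                continue
            match = FEATURE_KEY.match(line)
            if match:
                counts[match.group(1)] += 1
    return counts


# Invented sample lines, not taken from a real record.
sample = [
    "FEATURES             Location/Qualifiers",
    " " * 5 + "gene            1..50",
    " " * 21 + '/gene="a"',
    " " * 5 + "assembly_gap    60..70",
    " " * 21 + "/estimated_length=11",
    "ORIGIN",
]
print(count_feature_keys(sample))
```

For a real file, the lines can be passed in directly, for example `count_feature_keys(open("foo.gbff"))`.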

Other file formats that can be converted

See the bioconvert README: https://github.com/bioconvert/bioconvert
