In some databases, the FASTA file contains only the ID, and the description corresponding to each ID is kept in a separate file.
That layout is not well suited to optimizing annotations with AHRD (https://github.com/groupschoof/AHRD/blob/master/README.textile), so I wrote the following script that merges the two files.
fasta_annotation_merge.py
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Replace the annotation in each FASTA header, using the record ID as the key
import sys
from Bio import SeqIO

fasta_in = sys.argv[1]    # First argument: the FASTA file whose headers you want to change
changer_in = sys.argv[2]  # Second argument: tab-separated file with one "ID<TAB>annotation" pair per line

with open(changer_in, "r") as annotations:  # Open the annotation file
    for q in annotations:
        change = q.rstrip('\r\n').split('\t')  # Drop the trailing newline, then split on tabs
        if len(change) < 2:  # Skip blank or malformed lines
            continue
        for record in SeqIO.parse(fasta_in, 'fasta'):  # Parse the FASTA file with SeqIO, one record at a time
            id_part = record.id    # ID part of the FASTA header
            seq = str(record.seq)  # Sequence part of the FASTA record
            if id_part == change[0]:  # If the FASTA ID matches the ID in the annotation file...
                fasta_seq = '>' + id_part + ' ' + change[1] + '\n' + seq  # Arrange in FASTA format
                print(fasta_seq)  # Write the record to standard output
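The script above re-parses the FASTA file once for every line of the annotation file, which can get slow with large inputs. As a rough sketch of an alternative (not part of the original script), the annotations could be loaded into a dictionary first so that the FASTA file is read only once. The function names `load_annotations` and `merge_records` and the demo data are hypothetical, made up for this illustration. Note two differences in behavior: the output follows the order of the FASTA file rather than the annotation file, and if an ID appears twice in the annotation file, the last entry wins.

```python
import csv
from types import SimpleNamespace


def load_annotations(lines):
    """Build an {ID: annotation} dict from tab-separated "ID<TAB>annotation" lines."""
    annotations = {}
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) >= 2:  # skip blank or malformed lines
            annotations[row[0]] = row[1]
    return annotations


def merge_records(records, annotations):
    """Yield FASTA text for each record whose ID has an annotation.

    `records` can be any iterable of objects with `.id` and `.seq`,
    for example the records produced by SeqIO.parse(path, "fasta").
    """
    for record in records:
        description = annotations.get(record.id)
        if description is not None:
            yield ">" + record.id + " " + description + "\n" + str(record.seq)


# Hypothetical in-memory data standing in for the two input files.
demo_annotations = load_annotations(["seq1\tputative kinase\n", "seq3\ttransporter\n"])
demo_records = [
    SimpleNamespace(id="seq1", seq="ATGC"),
    SimpleNamespace(id="seq2", seq="GGCC"),  # no annotation, so it is skipped
]
for entry in merge_records(demo_records, demo_annotations):
    print(entry)

# Wiring it to real files (requires Biopython, as in the script above):
#   from Bio import SeqIO
#   with open(changer_in) as handle:
#       annotations = load_annotations(handle)
#   for entry in merge_records(SeqIO.parse(fasta_in, "fasta"), annotations):
#       print(entry)
```

The demo prints a single record, `>seq1 putative kinase` followed by `ATGC`, because `seq2` has no annotation and `seq3` has no sequence. Keeping the merge logic separate from file handling also makes it easy to try out on a few records before running it on a whole database.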