Biopython Tutorial and Cookbook: Japanese translation (4.8)

4.8 Adding SeqRecord objects

You can add SeqRecord objects together, giving a new SeqRecord. What matters here is that any per-letter annotations common to both records are also added, all the features are preserved (with their locations adjusted), and any other annotation the two records share (such as the id, name and description) is kept.
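
The rules above can be sketched with a toy model in plain Python. This is only an illustration under stated assumptions: `MiniRecord` is a hypothetical stand-in, not Biopython's `SeqRecord`, features are simplified to `(start, end, label)` tuples, and the `"<unknown id>"` placeholder is an assumption about what happens when the ids differ.

```python
from dataclasses import dataclass, field


@dataclass
class MiniRecord:
    """Toy stand-in for a SeqRecord (hypothetical, for illustration only)."""

    seq: str
    id: str = "<unknown id>"
    letter_annotations: dict = field(default_factory=dict)
    features: list = field(default_factory=list)  # (start, end, label) tuples

    def __add__(self, other):
        offset = len(self.seq)
        # Keep only the per-letter annotations present in both records.
        common = {
            key: values + other.letter_annotations[key]
            for key, values in self.letter_annotations.items()
            if key in other.letter_annotations
        }
        # Features of the right-hand record move right by len(left).
        moved = [(s + offset, e + offset, label) for s, e, label in other.features]
        # The id survives only when both records agree on it.
        new_id = self.id if self.id == other.id else "<unknown id>"
        return MiniRecord(self.seq + other.seq, new_id, common, self.features + moved)


left = MiniRecord("ACGT", "demo", {"phred_quality": [30, 30, 20, 20]}, [(0, 2, "a")])
right = MiniRecord("TTGG", "demo", {"phred_quality": [10, 10, 40, 40]}, [(1, 3, "b")])
combined = left + right
print(combined.seq)
print(combined.letter_annotations["phred_quality"])
print(combined.features)
```

The feature `"b"` started at position 1 of the right-hand record and ends up at position 5 of the combined one, because four letters now precede it.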

For an example with per-letter annotation, we’ll use the first record in a FASTQ file. Chapter 5 will explain the SeqIO functions:

>>> from Bio import SeqIO
>>> record = next(SeqIO.parse("example.fastq", "fastq"))
>>> len(record)
25
>>> print(record.seq)
CCCTTCTTGTCTTCAGCGTTTCTCC

>>> print(record.letter_annotations["phred_quality"])
[26, 26, 18, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 22, 26, 26, 26, 26,
26, 26, 26, 23, 23]

Let’s suppose this was Roche 454 data, and that from other information you think the TTT should be only TT. We can make a new edited record by first slicing the SeqRecord before and after the “extra” third T (the letter at index 20):

>>> left = record[:20]
>>> print(left.seq)
CCCTTCTTGTCTTCAGCGTT
>>> print(left.letter_annotations["phred_quality"])
[26, 26, 18, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 22, 26, 26, 26, 26]
>>> right = record[21:]
>>> print(right.seq)
CTCC
>>> print(right.letter_annotations["phred_quality"])
[26, 26, 23, 23]

Now add the two parts together:

>>> edited = left + right
>>> len(edited)
24
>>> print(edited.seq)
CCCTTCTTGTCTTCAGCGTTCTCC

>>> print(edited.letter_annotations["phred_quality"])
[26, 26, 18, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 22, 26, 26, 26, 26,
26, 26, 23, 23]

Easy and intuitive? We hope so! You can make this shorter with just:

>>> edited = record[:20] + record[21:]
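
The index arithmetic behind `record[:20] + record[21:]` can be checked without Biopython. The sketch below applies the same two slices to a plain string and a plain list holding the sequence and quality scores shown above. `drop_position` is a hypothetical helper name, not part of Biopython.

```python
def drop_position(seq, quals, index):
    """Remove one position from a sequence and its per-letter scores."""
    if len(seq) != len(quals):
        raise ValueError("sequence and scores must have the same length")
    # Everything before `index`, then everything after it.
    return seq[:index] + seq[index + 1:], quals[:index] + quals[index + 1:]


# Sequence and Phred scores copied from the FASTQ record shown earlier.
seq = "CCCTTCTTGTCTTCAGCGTTTCTCC"
quals = [26, 26, 18] + [26] * 12 + [22] + [26] * 7 + [23, 23]

new_seq, new_quals = drop_position(seq, quals, 20)
print(new_seq)
print(len(new_seq), len(new_quals))
```

Because the sequence and its scores are cut at the same index, they stay aligned letter for letter, which is the bookkeeping that adding two sliced SeqRecord objects does for you.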

Now, for an example with features, we’ll use a GenBank file. Suppose you have a circular genome:

>>> from Bio import SeqIO
>>> record = SeqIO.read("NC_005816.gb", "genbank")

>>> record
SeqRecord(seq=Seq('TGTAACGAACGGTGCAATAGTGATCCACACCCAACGCCTGAAATCAGATCCAGG...CTG',
IUPACAmbiguousDNA()), id='NC_005816.1', name='NC_005816',
description='Yersinia pestis biovar Microtus str. 91001 plasmid pPCP1, complete sequence.',
dbxrefs=['Project:10638'])

>>> len(record)
9609
>>> len(record.features)
41
>>> record.dbxrefs
['Project:10638']

>>> record.annotations.keys()
['comment', 'sequence_version', 'source', 'taxonomy', 'keywords', 'references',
'accessions', 'data_file_division', 'date', 'organism', 'gi']

You can shift the origin like this:

>>> shifted = record[2000:] + record[:2000]

>>> shifted
SeqRecord(seq=Seq('GATACGCAGTCATATTTTTTACACAATTCTCTAATCCCGACAAGGTCGTAGGTC...GGA',
IUPACAmbiguousDNA()), id='NC_005816.1', name='NC_005816',
description='Yersinia pestis biovar Microtus str. 91001 plasmid pPCP1, complete sequence.',
dbxrefs=[])

>>> len(shifted)
9609
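
What happens to feature coordinates during such an origin shift can be sketched in plain Python. The sketch assumes, in line with the slicing behaviour described in this section, that a slice keeps only the features lying entirely inside it. `rotate_features` is a hypothetical helper, features are simplified to `(start, end, label)` tuples, and the coordinates are invented for illustration rather than taken from NC_005816.

```python
def rotate_features(features, length, cut):
    """Re-map (start, end, label) features after moving the origin to `cut`."""
    # Features fully inside seq[cut:] shift left by `cut`.
    tail = [
        (start - cut, end - cut, label)
        for start, end, label in features
        if start >= cut and end <= length
    ]
    # Features fully inside seq[:cut] now follow the tail part.
    head = [
        (start + length - cut, end + length - cut, label)
        for start, end, label in features
        if end <= cut
    ]
    # Anything straddling the cut fits in neither slice and is dropped.
    return tail + head


length, cut = 9609, 2000
features = [
    (0, 9609, "source"),     # spans the whole sequence
    (100, 900, "gene A"),    # entirely before the cut
    (1900, 2100, "gene B"),  # straddles the cut
    (4000, 4500, "gene C"),  # entirely after the cut
]
rotated = rotate_features(features, length, cut)
print(rotated)
```

In this toy data both the whole-sequence `"source"` entry and the straddling `"gene B"` disappear, since neither fits inside a single slice.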

Note that this isn’t perfect: some annotation, such as the database cross references and the annotations dictionary, and one of the features (the source feature) have been lost:

>>> len(shifted.features)
40
>>> shifted.dbxrefs
[]
>>> shifted.annotations.keys()
[]

This is because the SeqRecord slicing step is cautious in what annotation it preserves (erroneously propagating annotation can cause major problems). If you want to keep the database cross references or the annotations dictionary, this must be done explicitly:

>>> shifted.dbxrefs = record.dbxrefs[:]
>>> shifted.annotations = record.annotations.copy()
>>> shifted.dbxrefs
['Project:10638']
>>> shifted.annotations.keys()
['comment', 'sequence_version', 'source', 'taxonomy', 'keywords', 'references',
'accessions', 'data_file_division', 'date', 'organism', 'gi']
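
The `[:]` and `.copy()` calls above make shallow copies, which is worth keeping in mind if you go on to edit the restored annotation. The sketch below uses plain Python objects with placeholder values (only the `'Project:10638'` string comes from the session above) to show what is and is not shared with the original. Whether you need `copy.deepcopy` depends on whether you plan to modify nested values.

```python
import copy

# Placeholder stand-ins for record.dbxrefs and record.annotations.
dbxrefs = ["Project:10638"]
annotations = {"accessions": ["NC_005816"], "source": "placeholder"}

shallow_refs = dbxrefs[:]                  # new list, same string items
shallow_ann = annotations.copy()           # new dict, same value objects
deep_ann = copy.deepcopy(annotations)      # fully independent copy

shallow_refs.append("Example:1")           # the original list is untouched
shallow_ann["source"] = "edited"           # rebinding a key: original untouched
shallow_ann["accessions"].append("X")      # nested list is shared with the original

print(dbxrefs)
print(annotations)
print(deep_ann)
```

Replacing a top-level value in the copy is safe, but changing a nested list in place (here `"accessions"`) also changes it in the original dictionary.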

Also note that in an example like this, you should probably change the record identifiers, since the NCBI references refer to the original unmodified sequence.