Biopython Tutorial and Cookbook: Japanese Translation (4.8)

4.8 Adding SeqRecord objects

You can add SeqRecord objects together, giving a new SeqRecord. What is important here is that any common per-letter annotations are also added, all the features are preserved (with their locations adjusted), and any other common annotation is also kept (like the id, name and description).
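As a minimal sketch of these rules, the following builds two small records in memory and adds them. The sequences, the id `demo`, the names, and the quality values are made up for illustration; only `Seq`, `SeqRecord`, `SeqFeature`, and `FeatureLocation` come from Biopython.

```python
from Bio.Seq import Seq
from Bio.SeqFeature import FeatureLocation, SeqFeature
from Bio.SeqRecord import SeqRecord

# Two hypothetical records sharing an id and one per-letter annotation key.
left = SeqRecord(Seq("ACGT"), id="demo", name="left_part")
left.letter_annotations["phred_quality"] = [30, 31, 32, 33]

right = SeqRecord(Seq("GGCC"), id="demo", name="right_part")
right.letter_annotations["phred_quality"] = [20, 21, 22, 23]
# A feature covering positions 1..3 of the right-hand record.
right.features.append(SeqFeature(FeatureLocation(1, 3), type="misc_feature"))

combined = left + right

print(combined.seq)                                  # concatenated sequence
print(combined.id)                                   # shared id is kept
print(combined.name)                                 # names differ, so not kept
print(combined.letter_annotations["phred_quality"])  # concatenated qualities
print(combined.features[0].location)                 # moved right by len(left)
```

The feature from the right-hand record keeps its length but its coordinates are offset by the length of the left-hand record.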

For an example with per-letter annotation, we’ll use the first record in a FASTQ file. Chapter 5 will explain the SeqIO functions:

>>> from Bio import SeqIO
>>> record = next(SeqIO.parse("example.fastq", "fastq"))
>>> len(record)
25
>>> print(record.seq)
CCCTTCTTGTCTTCAGCGTTTCTCC

>>> print(record.letter_annotations["phred_quality"])
[26, 26, 18, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 22, 26, 26, 26, 26,
26, 26, 26, 23, 23]
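If you do not have `example.fastq` at hand, a comparable record can be built in memory. This sketch assumes the Sanger FASTQ encoding (each quality stored as the ASCII character for quality plus 33), which is what Biopython's `"fastq"` format reads. The read name `example_read` is hypothetical; the sequence and qualities are copied from the output above.

```python
from io import StringIO

from Bio import SeqIO

sequence = "CCCTTCTTGTCTTCAGCGTTTCTCC"
qualities = [26, 26, 18, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26,
             22, 26, 26, 26, 26, 26, 26, 26, 23, 23]

# Encode the PHRED scores as a Sanger-style quality string (offset 33).
quality_string = "".join(chr(q + 33) for q in qualities)
fastq_text = f"@example_read\n{sequence}\n+\n{quality_string}\n"

# Parse from an in-memory handle instead of a file on disk.
record = next(SeqIO.parse(StringIO(fastq_text), "fastq"))

print(len(record))
print(record.seq)
print(record.letter_annotations["phred_quality"])
```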

Let’s suppose this was Roche 454 data, and that from other information you think the TTT should be only TT. We can make a new edited record by first slicing the SeqRecord before and after the “extra” third T:

>>> left = record[:20]
>>> print(left.seq)
CCCTTCTTGTCTTCAGCGTT
>>> print(left.letter_annotations["phred_quality"])
[26, 26, 18, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 22, 26, 26, 26, 26]
>>> right = record[21:]
>>> print(right.seq)
CTCC
>>> print(right.letter_annotations["phred_quality"])
[26, 26, 23, 23]

Now add the two parts together:

>>> edited = left + right
>>> len(edited)
24
>>> print(edited.seq)
CCCTTCTTGTCTTCAGCGTTCTCC

>>> print(edited.letter_annotations["phred_quality"])
[26, 26, 18, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 22, 26, 26, 26, 26,
26, 26, 23, 23]

Easy and intuitive? We hope so! You can make this shorter with just:

>>> edited = record[:20] + record[21:]
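The slice-and-add idiom generalises to any single position. The helper `drop_position` below is a hypothetical convenience function, not part of Biopython, and the record is an in-memory stand-in (the id `example_read` is made up) carrying the sequence and qualities shown earlier.

```python
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord


def drop_position(record, index):
    """Return a new SeqRecord without the letter at 0-based ``index``."""
    if not 0 <= index < len(record):
        raise IndexError(f"index {index} is outside the record")
    # Slicing keeps the per-letter annotations; adding joins them again.
    return record[:index] + record[index + 1:]


read = SeqRecord(Seq("CCCTTCTTGTCTTCAGCGTTTCTCC"), id="example_read")
read.letter_annotations["phred_quality"] = [
    26, 26, 18, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26,
    22, 26, 26, 26, 26, 26, 26, 26, 23, 23,
]

edited = drop_position(read, 20)  # remove the third T of the TTT run
print(len(edited))
print(edited.seq)
print(edited.letter_annotations["phred_quality"])
```

Because both slices come from the same record, they share an id, so the edited record keeps it.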

Now, for an example with features, we’ll use a GenBank file. Suppose you have a circular genome:

>>> from Bio import SeqIO
>>> record = SeqIO.read("NC_005816.gb", "genbank")

>>> record
SeqRecord(seq=Seq('TGTAACGAACGGTGCAATAGTGATCCACACCCAACGCCTGAAATCAGATCCAGG...CTG',
IUPACAmbiguousDNA()), id='NC_005816.1', name='NC_005816',
description='Yersinia pestis biovar Microtus str. 91001 plasmid pPCP1, complete sequence.',
dbxrefs=['Project:10638'])

>>> len(record)
9609
>>> len(record.features)
41
>>> record.dbxrefs
['Project:10638']

>>> record.annotations.keys()
['comment', 'sequence_version', 'source', 'taxonomy', 'keywords', 'references',
'accessions', 'data_file_division', 'date', 'organism', 'gi']

You can shift the origin like this:

>>> shifted = record[2000:] + record[:2000]

>>> shifted
SeqRecord(seq=Seq('GATACGCAGTCATATTTTTTACACAATTCTCTAATCCCGACAAGGTCGTAGGTC...GGA',
IUPACAmbiguousDNA()), id='NC_005816.1', name='NC_005816',
description='Yersinia pestis biovar Microtus str. 91001 plasmid pPCP1, complete sequence.',
dbxrefs=[])

>>> len(shifted)
9609

Note that this isn’t perfect: some annotation, such as the database cross references and the annotations dictionary, and one of the features (the source feature) have been lost:

>>> len(shifted.features)
40
>>> shifted.dbxrefs
[]
>>> shifted.annotations.keys()
[]
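The same loss can be reproduced on a small in-memory record. Everything in this sketch is made up for illustration (the 16-letter sequence, the feature ids, the `Project:toy` cross reference); it only shows that a feature spanning the whole sequence fits inside neither slice, while features lying entirely within one slice survive with shifted coordinates.

```python
from Bio.Seq import Seq
from Bio.SeqFeature import FeatureLocation, SeqFeature
from Bio.SeqRecord import SeqRecord

toy = SeqRecord(Seq("AAAACCCCGGGGTTTT"), id="toy_plasmid")
toy.dbxrefs = ["Project:toy"]
toy.features.append(SeqFeature(FeatureLocation(0, 16), type="source", id="whole"))
toy.features.append(SeqFeature(FeatureLocation(1, 5), type="gene", id="geneA"))
toy.features.append(SeqFeature(FeatureLocation(9, 15), type="gene", id="geneB"))

# Move the origin to position 8.
shifted = toy[8:] + toy[:8]

for feature in shifted.features:
    print(feature.type, feature.id, feature.location)
print(shifted.dbxrefs)
```

Only the two gene features remain, and the cross reference list comes back empty.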

This is because the SeqRecord slicing step is cautious about which annotation it preserves (erroneously propagating annotation can cause major problems). If you want to keep the database cross references or the annotations dictionary, you must copy them explicitly:

>>> shifted.dbxrefs = record.dbxrefs[:]
>>> shifted.annotations = record.annotations.copy()
>>> shifted.dbxrefs
['Project:10638']
>>> shifted.annotations.keys()
['comment', 'sequence_version', 'source', 'taxonomy', 'keywords', 'references',
'accessions', 'data_file_division', 'date', 'organism', 'gi']
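These steps can be wrapped in one function. `shift_origin` below is a hypothetical helper, not a Biopython API, and the record it is applied to is a made-up stand-in. Note that `dict.copy()` is a shallow copy, so mutable values such as a references list are still shared with the original record.

```python
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord


def shift_origin(record, new_start):
    """Rotate a circular record so that ``new_start`` becomes position 0.

    The database cross references and the annotations dictionary are
    copied over explicitly, since slicing does not carry them along.
    """
    if not 0 <= new_start < len(record):
        raise ValueError("new_start must lie within the record")
    shifted = record[new_start:] + record[:new_start]
    shifted.dbxrefs = record.dbxrefs[:]
    shifted.annotations = record.annotations.copy()  # shallow copy
    return shifted


plasmid = SeqRecord(Seq("AAAACCCCGGGGTTTT"), id="toy_plasmid")
plasmid.dbxrefs = ["Project:toy"]
plasmid.annotations = {"organism": "toy organism", "topology": "circular"}

rotated = shift_origin(plasmid, 8)
print(rotated.seq)
print(rotated.dbxrefs)
print(sorted(rotated.annotations))
```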

Also note that in an example like this you should probably change the record identifiers, since the NCBI references refer to the original, unmodified sequence.
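A minimal sketch of relabelling a rotated record follows. The identifiers and description are invented for illustration; choose whatever naming scheme suits your own pipeline.

```python
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

original = SeqRecord(
    Seq("AAAACCCCGGGGTTTT"),
    id="toy_plasmid.1",
    name="toy_plasmid",
    description="toy plasmid, complete sequence",
)

# Both slices share id, name and description, so the sum inherits them.
shifted = original[8:] + original[:8]

# Relabel so the record is no longer mistaken for the reference entry.
shifted.id = "toy_plasmid.1_shifted8"
shifted.name = "toy_shift8"
shifted.description = original.description + " (origin moved to position 8)"

print(original.id, "->", shifted.id)
```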