Biopython Tutorial and Cookbook Japanese translation (4.8)

4.8 Adding SeqRecord objects To 4.7

You can add SeqRecord objects together, giving a new SeqRecord. What is important here is that any common per-letter annotations are also added, all the features are preserved (with their locations adjusted), and any other common annotation is also kept (like the id, name and description). ** Addition of SeqRecords is possible and a new SeqRecord is returned. The important thing is to combine the per-letter annotations as well, all features are pending, and other annotations are also pending. (Id, name, description, etc.) **

For an example with per-letter annotation, we’ll use the first record in a FASTQ file. Chapter 5 will explain the SeqIO functions: ** Use the first recording of a FASTQ file, as described in the per-letter annotation example. Chapter 5 describes SeqIO. ** **

>>> from Bio import SeqIO
>>> record = next(SeqIO.parse("example.fastq", "fastq"))
>>> len(record)
25
>>> print(record.seq)
CCCTTCTTGTCTTCAGCGTTTCTCC

>>> print(record.letter_annotations["phred_quality"])
[26, 26, 18, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 22, 26, 26, 26, 26,
26, 26, 26, 23, 23]

Let’s suppose this was Roche 454 data, and that from other information you think the TTT should be only TT. We can make a new edited record by first slicing the SeqRecord before and after the “extra” third T: ** Suppose this is data from Roche 454, other sources have shown that TTT should be TT. You can create a new record by slicing the SeqRecord before and after the third T. ** **

>>> left = record[:20]
>>> print(left.seq)
CCCTTCTTGTCTTCAGCGTT
>>> print(left.letter_annotations["phred_quality"])
[26, 26, 18, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 22, 26, 26, 26, 26]
>>> right = record[21:]
>>> print(right.seq)
CTCC
>>> print(right.letter_annotations["phred_quality"])
[26, 26, 23, 23]

Now add the two parts together: ** Combine the two sliced pieces. ** **

>>> edited = left + right
>>> len(edited)
24
>>> print(edited.seq)
CCCTTCTTGTCTTCAGCGTTCTCC

>>> print(edited.letter_annotations["phred_quality"])
[26, 26, 18, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 22, 26, 26, 26, 26,
26, 26, 23, 23]

Easy and intuitive? We hope so! You can make this shorter with just: ** Easy and intuitive, right? It can be further simplified as follows: **

>>> edited = record[:20] + record[21:]

Now, for an example with features, we’ll use a GenBank file. Suppose you have a circular genome: ** I will explain an example of features using a GenBank file. Suppose it is a circular genome. ** **

>>> from Bio import SeqIO
>>> record = SeqIO.read("NC_005816.gb", "genbank")

>>> record
SeqRecord(seq=Seq('TGTAACGAACGGTGCAATAGTGATCCACACCCAACGCCTGAAATCAGATCCAGG...CTG',
IUPACAmbiguousDNA()), id='NC_005816.1', name='NC_005816',
description='Yersinia pestis biovar Microtus str. 91001 plasmid pPCP1, complete sequence.',
dbxrefs=['Project:10638'])

>>> len(record)
9609
>>> len(record.features)
41
>>> record.dbxrefs
['Project:58037']

>>> record.annotations.keys()
['comment', 'sequence_version', 'source', 'taxonomy', 'keywords', 'references',
'accessions', 'data_file_division', 'date', 'organism', 'gi']

You can shift the origin like this: ** You can change the starting point as follows. ** **

>>> shifted = record[2000:] + record[:2000]

>>> shifted
SeqRecord(seq=Seq('GATACGCAGTCATATTTTTTACACAATTCTCTAATCCCGACAAGGTCGTAGGTC...GGA',
IUPACAmbiguousDNA()), id='NC_005816.1', name='NC_005816',
description='Yersinia pestis biovar Microtus str. 91001 plasmid pPCP1, complete sequence.',
dbxrefs=[])

>>> len(shifted)
9609

Note that this isn’t perfect in that some annotation like the database cross references and one of the features (the source feature) have been lost: ** Note: This approach is not perfect, you will lose dbxrefs and source features. ** **

>>> len(shifted.features)
40
>>> shifted.dbxrefs
[]
>>> shifted.annotations.keys()
[]

This is because the SeqRecord slicing step is cautious in what annotation it preserves (erroneously propagating annotation can cause major problems). **If you want to keep the database cross references or the annotations dictionary, this must be done explicitly: The reason for the loss is to be cautious about holding annotations when slicing SeqRecords (wrong annotations can be very problematic). If you want to reserve the dbxrefs and annotations dictionaries, you must specify them. ** **

>>> shifted.dbxrefs = record.dbxrefs[:]
>>> shifted.annotations = record.annotations.copy()
>>> shifted.dbxrefs
['Project:10638']
>>> shifted.annotations.keys()
['comment', 'sequence_version', 'source', 'taxonomy', 'keywords', 'references',
'accessions', 'data_file_division', 'date', 'organism', 'gi']

Also note that in an example like this, you should probably change the record identifiers since the NCBI references refer to the original unmodified sequence. ** note: In such an example, the record identifiers should also be adjusted. (Because NCBI references refer to the original array that hasn't changed) **

Recommended Posts

Biopython Tutorial and Cookbook Japanese translation (4.3)
Biopython Tutorial and Cookbook Japanese translation (4.1)
Biopython Tutorial and Cookbook Japanese translation (4.5)
Biopython Tutorial and Cookbook Japanese translation (4.8)
Biopython Tutorial and Cookbook Japanese translation (4.7)
Biopython Tutorial and Cookbook Japanese translation (4.9)
Biopython Tutorial and Cookbook Japanese translation (4.6)
Biopython Tutorial and Cookbook Japanese translation (4.2)
Biopython Tutorial and Cookbook Japanese translation (4.4)
Biopython Tutorial and Cookbook Japanese translation (Chapter 1, 2)
streamlit tutorial Japanese translation
sosreport Japanese translation
[Translation] hyperopt tutorial
man systemd Japanese translation
streamlit explanation Japanese translation
man systemd.service Japanese translation
man nftables Japanese translation
Dockerfile reference Japanese translation
docker-compose --help Japanese translation
docker help Japanese translation
SymPy tutorial Japanese notebook
[PyTorch] Tutorial (Japanese version) ② ~ AUTOGRAD ~
docker build --help Japanese translation
Chainer, RNN and machine translation
Japanese translation of sysstat manual
[PyTorch] Tutorial (Japanese version) ① ~ Tensor ~
Japanese translation of Linux manual
docker run --help Japanese translation
Pandas User Guide "Table Formatting and PivotTables" (Official Document Japanese Translation)