- Learning biology
- Studying Biopython
- Reviewing English

I want to contribute to this field as much as I can.
I am a complete beginner at both biology and Biopython, so please point out anything that does not make sense. I will translate at a pace of one section every two days. The original English text is included so that mistakes are easier to spot. After finishing a chapter, I will compile it into a single article.
Chapter 4 Sequence annotation objects
Chapter 3 introduced the sequence classes. **Chapter 3 introduced the sequence classes.**
Immediately “above” the Seq class is the Sequence Record or SeqRecord class, defined in the Bio.SeqRecord module. **Immediately "above" the Seq class (not through inheritance: a SeqRecord holds a Seq object as its .seq attribute) is the Sequence Record, or SeqRecord, class, defined in the Bio.SeqRecord module.**
This class allows higher level features such as identifiers and features (as SeqFeature objects) to be associated with the sequence, and is used throughout the sequence input/output interface Bio.SeqIO described fully in Chapter 5. **This class associates higher-level information such as identifiers and features (as SeqFeature objects) with a sequence, and is used throughout the sequence input/output interface Bio.SeqIO, described fully in Chapter 5.**
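To see what "above" means in practice, here is a minimal sketch, assuming Biopython is installed and using a made-up two-entry FASTA snippet, that parses FASTA text with Bio.SeqIO and checks what comes back. Each item is a SeqRecord whose .seq attribute is a Seq; the record holds the sequence rather than inheriting from Seq.

```python
from io import StringIO

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Made-up FASTA text (hypothetical data, for illustration only)
fasta_text = ">seq1 first example\nATGGCCATTGTAATG\n>seq2 second example\nATGCCCTAA\n"

# SeqIO.parse yields one SeqRecord per FASTA entry
records = list(SeqIO.parse(StringIO(fasta_text), "fasta"))

for record in records:
    # The record carries the identifiers; the sequence itself is in record.seq
    print(type(record).__name__, record.id, type(record.seq).__name__, record.seq)

# "Above" means composition: SeqRecord is not a subclass of Seq
print(issubclass(SeqRecord, Seq))
```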
If you are only going to be working with simple data like FASTA files, you can probably skip this chapter for now. **If you only deal with simple data such as FASTA files, you can probably skip this chapter for now.**
While this chapter should cover most things to do with the SeqRecord and SeqFeature objects, you may also want to read the SeqRecord wiki page (http://biopython.org/wiki/SeqRecord), and the built in documentation (also online – SeqRecord and SeqFeature): **This chapter should cover most of what you need to do with SeqRecord and SeqFeature objects, but you may also want to read the SeqRecord wiki page (http://biopython.org/wiki/SeqRecord) and the built-in documentation (also available online for SeqRecord and SeqFeature):**
```python
>>> from Bio.SeqRecord import SeqRecord
>>> help(SeqRecord)
```
4.1 The SeqRecord object
The SeqRecord (Sequence Record) class is defined in the Bio.SeqRecord module. **The SeqRecord (Sequence Record) class is defined in the Bio.SeqRecord module.**
This class allows higher level features such as identifiers and features to be associated with a sequence (see Chapter 3), and is the basic data type for the Bio.SeqIO sequence input/output interface (see Chapter 5). **This class lets you associate higher-level information such as identifiers and features with a sequence (see Chapter 3), and it is the basic data type of the Bio.SeqIO sequence input/output interface (see Chapter 5).**
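Because SeqRecord is the basic data type of Bio.SeqIO, a record you build by hand can be written out directly. A minimal sketch, with a hypothetical ID and description, that writes one record to an in-memory FASTA handle and reads it back:

```python
from io import StringIO

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Hypothetical record built by hand
record = SeqRecord(Seq("ATGGCCATTGTA"), id="demo1", description="a made-up example")

# SeqIO.write takes SeqRecord objects and returns how many records it wrote
handle = StringIO()
count = SeqIO.write([record], handle, "fasta")
fasta_out = handle.getvalue()
print(fasta_out)

# Reading the text back with SeqIO.read gives a SeqRecord again
round_trip = SeqIO.read(StringIO(fasta_out), "fasta")
print(round_trip.id, round_trip.seq)
```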
The SeqRecord class itself is quite simple, and offers the following information as attributes: **The SeqRecord class itself is quite simple and provides the following information as attributes:**
- .seq – The sequence itself, typically a Seq object.
- .id – The primary ID used to identify the sequence – a string. In most cases this is something like an accession number.
- .name – A “common” name/id for the sequence – a string. In some cases this will be the same as the accession number, but it could also be a clone name. I think of this as being analogous to the LOCUS id in a GenBank record.
- .description – A human readable description or expressive name for the sequence – a string.
- .letter_annotations – Holds per-letter-annotations using a (restricted) dictionary of additional information about the letters in the sequence. The keys are the name of the information, and the information is contained in the value as a Python sequence (i.e. a list, tuple or string) with the same length as the sequence itself. This is often used for quality scores (e.g. Section 20.1.6) or secondary structure information (e.g. from Stockholm/PFAM alignment files).
- .annotations – A dictionary of additional information about the sequence. The keys are the name of the information, and the information is contained in the value. This allows the addition of more “unstructured” information to the sequence.
- .features – A list of SeqFeature objects with more structured information about the features on a sequence (e.g. position of genes on a genome, or domains on a protein sequence). The structure of sequence features is described below in Section 4.3.
- .dbxrefs – A list of database cross-references as strings.
- **.seq – The sequence itself, usually a Seq object.**
- **.id – The primary ID used to identify the sequence – a string. In most cases this is something like an accession number.**
- **.name – A "common" name/ID for the sequence – a string. Sometimes it is the same as the accession number, but it can also be a clone name (a clone whose clone name starts with 'IR' is a full-length clone). It can be thought of as analogous to the LOCUS ID in a GenBank record.**
- **.description – A human-readable description or expressive name for the sequence – a string.**
- **.letter_annotations – Holds per-letter annotations in a (restricted) dictionary of additional information about the letters in the sequence. The keys are the names of the information, and each value is a Python sequence (i.e. a list, tuple or string) with the same length as the sequence itself. Often used for quality scores (e.g. Section 20.1.6) or secondary structure information (e.g. from Stockholm/PFAM alignment files).**
- **.annotations – A dictionary of additional information about the sequence. The keys are the names of the information, and the values hold the information itself. This lets you attach more "unstructured" information to the sequence.**
- **.features – A list of SeqFeature objects holding more structured information about features on the sequence (e.g. gene positions on a genome, or domains on a protein sequence). The structure of sequence features is described in Section 4.3.**
- **.dbxrefs – A list of database cross-references as strings.**
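All of these attributes can be filled in when the SeqRecord is constructed or afterwards. A minimal sketch, assuming FeatureLocation from Bio.SeqFeature (newer Biopython releases also call it SimpleLocation); every ID, name, annotation and cross-reference value here is made up for illustration. It also shows the "restricted" part of .letter_annotations: a value whose length does not match the sequence is refused.

```python
from Bio.Seq import Seq
from Bio.SeqFeature import FeatureLocation, SeqFeature
from Bio.SeqRecord import SeqRecord

# All identifiers and values below are hypothetical
record = SeqRecord(
    Seq("ATGGCCATTGTAATGTAA"),
    id="EXAMPLE0001.1",                      # .id: primary ID, e.g. an accession number
    name="EXAMPLE0001",                      # .name: "common" name, like a GenBank LOCUS id
    description="made-up example sequence",  # .description: human-readable text
    dbxrefs=["Project:000000"],              # .dbxrefs: cross-references as strings
)

# .annotations: a plain dictionary for "unstructured" extra information
record.annotations["molecule_type"] = "DNA"
record.annotations["comment"] = "illustration only"

# .letter_annotations: one value per letter, so the length must match the sequence
record.letter_annotations["phred_quality"] = [40] * len(record.seq)

# A value with the wrong length is rejected by the restricted dictionary
try:
    record.letter_annotations["too_short"] = [1, 2, 3]
    rejected = False
except (TypeError, ValueError):
    rejected = True

# .features: structured SeqFeature objects, each with a location on the sequence
record.features.append(SeqFeature(FeatureLocation(0, 18, strand=1), type="CDS"))

print(record)
print("wrong-length letter annotation rejected:", rejected)
```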