[PYTHON] A super-easy way to build molecular phylogenetic trees that I would rather not teach anyone

Phylogenetic tree estimation

Introducing an ultra-simple molecular phylogenetic tree creation method using the ETE Toolkit.

Figure: NUP62.aa.fa.final_tree.png (from http://etetoolkit.org/documentation/ete-build/)

Tools to use

- BLAST (sequence similarity search)
- MAFFT (alignment) https://mafft.cbrc.jp/alignment/server/
- trimAl (cleans up the alignment for phylogenetic tree construction; optional) http://trimal.cgenomics.org/
- RAxML (phylogenetic tree calculation) https://sco.h-its.org/exelixis/web/software/raxml/index.html
- FigTree or iTOL (phylogenetic tree drawing)

Rough flow

  1. Take the 16S rRNA sequence (or similar) of the target species whose relationship to related species you want to see, and run a BLAST sequence similarity search against a sequence database (16S rRNA, RefSeq, nr, etc.). Based on the results, fetch the sequences of other species, for example those in the same genus as the target. * I have arbitrarily assumed bacterial 16S here; use whatever sequences fit your own work.
  2. Fetch the 16S rRNA sequence of a species outside that genus (to serve as the outgroup of the phylogenetic tree).
  3. Collect all the sequences from steps 1 and 2 into a single FASTA file and run a multiple alignment with MAFFT L-INS-i.
  4. Use trimAl to clean up the alignment for tree construction (it removes regions, such as gap-rich columns, that are unsuitable for calculating the tree). This step is optional.
  5. Construct a maximum likelihood phylogenetic tree with RAxML, calculating bootstrap values as well.
  6. Load the resulting tree into FigTree and draw it. Alternatively, draw it with iTOL (web-only, but the output looks nicer).
  7. Fun (✿╹◡╹) v
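Steps 3 to 5 of the flow above can be sketched as command lines assembled in Python. This is only a rough illustration, not the author's own procedure: the executable names (`mafft`, `trimal`, `raxmlHPC`), the `-automated1` trimming mode, the GTRGAMMA model, the seed, the bootstrap count, and the `run1` file prefix are all assumptions that you would adapt to your own installation and data. The function only builds the commands; nothing is executed.

```python
import shlex


def classic_pipeline_commands(fasta, prefix="run1", bootstraps=100,
                              seed=12345, use_trimal=True):
    """Build (but do not run) the commands for steps 3-5.

    Returns a list of dicts: "cmd" is the argument list and "stdout" is the
    file that standard output should be redirected to (or None).
    All file names derived from `prefix` are hypothetical.
    """
    aligned = f"{prefix}.aligned.fasta"
    trimmed = f"{prefix}.trimmed.fasta"
    steps = []

    # Step 3: MAFFT L-INS-i. MAFFT writes the alignment to stdout.
    steps.append({
        "cmd": ["mafft", "--localpair", "--maxiterate", "1000", fasta],
        "stdout": aligned,
    })

    # Step 4 (optional): trimAl removes poorly aligned / gap-rich columns.
    tree_input = aligned
    if use_trimal:
        steps.append({
            "cmd": ["trimal", "-in", aligned, "-out", trimmed, "-automated1"],
            "stdout": None,
        })
        tree_input = trimmed

    # Step 5: RAxML rapid bootstrap plus ML search in one run (-f a).
    # The binary may be named differently on your system (assumption).
    steps.append({
        "cmd": ["raxmlHPC", "-f", "a", "-m", "GTRGAMMA",
                "-p", str(seed), "-x", str(seed), "-#", str(bootstraps),
                "-s", tree_input, "-n", prefix],
        "stdout": None,
    })
    return steps


for step in classic_pipeline_commands("input.fasta"):
    line = shlex.join(step["cmd"])
    if step["stdout"]:
        line += " > " + shlex.quote(step["stdout"])
    print(line)
```

To actually run a step, each `cmd` list could be passed to `subprocess.run`, opening the `stdout` file for writing where one is given.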

Keywords

Types of phylogenetic trees: unrooted tree, rooted tree
Phylogenetic tree calculation methods: NJ method, maximum likelihood method, Bayesian method

Back route

Here is how to build a phylogenetic tree in an instant using the ETE Toolkit. It was so easy that it honestly scared me. I really wanted to turn this into a paid information product, and to be honest I agonized over it. After thinking about it long and hard, I decided to make a special exception and publish it this time.

This assumes that Python and Anaconda are already installed. If they are not, or you are not sure, see:

- macOS: Notes from installing Homebrew to building an Anaconda environment for Python with pyenv
- Linux: Note on building Python's Anaconda environment with pyenv in Linux environment

Environment

$ pip install ete3  # Install the ETE Toolkit
$ conda install -c etetoolkit ete_toolchain  # Install the external tools it needs
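After installing, a quick sanity check from Python can confirm that both the `ete3` module and the `ete3` command are visible in the current environment. This is a minimal sketch using only the standard library; the function name `check_ete_environment` is made up for illustration. It does not verify the external tools (MAFFT, RAxML, and so on); for that, ETE's own `ete3 build check` command should be the more direct route.

```python
import importlib.util
import shutil


def check_ete_environment():
    """Report whether ete3 is importable and its CLI is on PATH.

    Hypothetical helper: it only looks for the package and the executable,
    and does not run them or check the external toolchain.
    """
    return {
        "ete3_module": importlib.util.find_spec("ete3") is not None,
        "ete3_command": shutil.which("ete3") is not None,
    }


status = check_ete_environment()
for name, found in status.items():
    print(f"{name}: {'found' if found else 'MISSING'}")
```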

Phylogenetic tree construction

The syntax is:

$ ete3 build -w <workflow name> -n <input sequence file (unaligned)> -o <output directory name> --clearall

(-n is for nucleotide sequences; use -a instead for amino-acid sequences.)

Example:

$ ete3 build -w mafft_linsi-none-none-raxml_default -n input.fasta -o output_tree --clearall

That's it! This one command takes care of everything: multiple alignment of the sequences, trimming of gap-rich regions (when the workflow names a trimmer), and phylogeny estimation!!
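If you want to launch the same command from a Python script (for example, to loop over several FASTA files), a thin wrapper around `subprocess` is enough. This is a sketch under assumptions: the helper names `ete_build_command` and `run_ete_build` are made up here, and it expects the `ete3` executable to be on your PATH. Only the options shown in the example above are used.

```python
import shlex
import subprocess


def ete_build_command(workflow, fasta, outdir, clearall=True):
    """Return the `ete3 build` argument list for a nucleotide FASTA file."""
    cmd = ["ete3", "build", "-w", workflow, "-n", fasta, "-o", outdir]
    if clearall:
        cmd.append("--clearall")
    return cmd


def run_ete_build(workflow, fasta, outdir, clearall=True):
    """Run the command; raises CalledProcessError if ete3 exits non-zero."""
    cmd = ete_build_command(workflow, fasta, outdir, clearall)
    return subprocess.run(cmd, check=True)


# Show the command that would be executed, without running it.
cmd = ete_build_command(
    "mafft_linsi-none-none-raxml_default", "input.fasta", "output_tree"
)
print(shlex.join(cmd))
```

Calling `run_ete_build(...)` with the same arguments would actually start the analysis, so it is left out of the demonstration.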

A workflow name has the form aligner-trimmer-tester-builder, with the four parts separated by hyphens. For example, the workflow mafft_linsi-none-none-raxml_default aligns the sequences with MAFFT's L-INS-i algorithm and builds the phylogenetic tree with RAxML.

- If a senior member of your lab insists on Clustal for alignment, use clustalo_default-none-none-raxml_default, which aligns with Clustal Omega (the successor to ClustalW).
- To include trimming of gap-rich regions with trimAl, use mafft_linsi-trimal01-none-raxml_default.
- To include bootstrap calculation when estimating the tree, use mafft_linsi-none-none-raxml_default_bootstrap (this takes a while).
- If in doubt, mafft_linsi-none-none-raxml_default_bootstrap is probably a safe choice (sorry, sorry).
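Because a workflow name is just four hyphen-separated parts, it is easy to take apart and reassemble in Python. The sketch below is illustrative only: the `Workflow` class and `parse_workflow` function are hypothetical helpers, not part of ETE, and they do not check that each part is a tool configuration ETE actually knows about.

```python
from typing import NamedTuple


class Workflow(NamedTuple):
    """The four slots of an aligner-trimmer-tester-builder workflow name."""
    aligner: str
    trimmer: str
    tester: str
    builder: str

    def name(self):
        # Join the four parts back into the hyphen-separated form.
        return "-".join(self)


def parse_workflow(name):
    """Split a workflow name into its four parts."""
    parts = name.split("-")
    if len(parts) != 4:
        raise ValueError(
            f"expected aligner-trimmer-tester-builder, got {name!r}"
        )
    return Workflow(*parts)


wf = parse_workflow("mafft_linsi-none-none-raxml_default")
print(wf)

# Swap in trimAl trimming and the bootstrap builder from the text above.
wf2 = wf._replace(trimmer="trimal01", builder="raxml_default_bootstrap")
print(wf2.name())
```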

To see which tools are available, run:

$ ete3 build apps

Reference: Composing custom workflows

Result file

Running the tree-building command from the example above writes its results to ./output_tree/mafft_linsi-none-none-raxml_default. Among the files in that directory:

・ input.fasta.final_tree.png → A figure showing the phylogenetic tree together with a schematic of the aligned sequences.
・ input.fasta.final_tree.fa → Aligned FASTA file.
・ input.fasta.final_tree.nw → Estimated phylogenetic tree file (Newick format). Load this into FigTree or iTOL to get a polished figure.
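The result files described above can also be located and read from Python. This is a sketch under assumptions: it assumes the output file names are the input file name plus the `.final_tree.*` suffixes listed above, the helper names are made up, and `"OUTGROUP_ID"` in the usage comment is a placeholder for whatever sequence ID you chose as the outgroup. `load_and_root` needs ete3 installed and is not called in the demonstration.

```python
from pathlib import Path


def expected_outputs(outdir, workflow, fasta_name):
    """Return the paths of the main result files for one ete3 build run."""
    base = Path(outdir) / workflow
    stem = f"{fasta_name}.final_tree"
    return {
        "image": base / f"{stem}.png",
        "alignment": base / f"{stem}.fa",
        "newick": base / f"{stem}.nw",
    }


def load_and_root(newick_path, outgroup_name):
    """Load the Newick tree with ete3 and root it on the named leaf."""
    from ete3 import Tree  # imported lazily so the path helper works without ete3

    tree = Tree(str(newick_path))
    tree.set_outgroup(tree & outgroup_name)  # `tree & name` looks up a node by name
    return tree


paths = expected_outputs(
    "output_tree", "mafft_linsi-none-none-raxml_default", "input.fasta"
)
for kind, path in paths.items():
    print(f"{kind}: {path}")

# With ete3 installed and the run finished, something like this prints the tree:
# print(load_and_root(paths["newick"], "OUTGROUP_ID"))
```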

Reference: The ETE Cookbook
