(2017/2/22, CentOS x86_64)

Introduction

I used OrthoFinder to perform orthology analysis based on the genomic information of multiple species. OrthoFinder uses MCL (Markov Cluster algorithm) to estimate orthologs. According to the paper, OrthoFinder is faster than other methods (such as OrthoMCL) in benchmarking tests using OrthoBench, and its own normalization step makes it an excellent method for classifying orthologs.

References

http://www.stevekellylab.com/software/orthofinder
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4531804/

The idea of OrthoFinder

Orthologs are understood under various definitions these days, but OrthoFinder defines them as

"genes descended from the Last Common Ancestor (LCA), in many-to-many relationships, including those that have undergone gene duplication".

Furthermore, it defines an orthologous group under its own concept called "OrthoGroup", which extends this definition to multiple species. The authors mention that an OrthoGroup includes paralogs as well as orthologs, which makes the definition incomplete, but they conclude that this should not be a major obstacle for general orthology analysis. In any case, be careful if you want to analyze paralogs separately.

What you can do with OrthoFinder

1. OrthoGroup (OG) estimation
2. Estimating the orthologous gene set for each pair of species
3. Creating phylogenetic trees
4. Selection of single-copy genes

It does the above four things automatically. Regarding 3, it creates a phylogenetic tree of the species and a phylogenetic tree for each OG. If you want to create a phylogenetic tree of the species using only single-copy genes, you will have to do it yourself.

Installation

OrthoFinder depends on Python 2.7, so if you are using Python 3.x, please build a virtual environment with pyenv, anaconda, etc. (see items/5b62d31cb7e6ed50f02c). In addition to OrthoFinder itself, you need to install BLAST+, MCL, FastME, and DLCpar.

1. OrthoFinder

Download the package with git clone and extract the archive.

$ git clone https://github.com/davidemms/OrthoFinder.git
$ tar xzf OrthoFinder-1.1.2.tar.gz

Add the orthofinder directory to your PATH.

2. MCL, FastME

There are no particular points to note. If you have root privileges you can install them easily with sudo etc.; if you do not, download the sources from their respective websites and build them yourself. Please refer to the OrthoFinder Manual for the installation steps.

3. DLCpar

This one needs a little care. You can install it in the same way as 2., but when building with setup.py, you need to install into the directory whose bin contains the python you use (you can check with which python). Either copy the source into that directory and run setup.py there, or use the --prefix option to specify the install directory. If you don't do this, the Python module dlcpar cannot be imported from Python and OrthoFinder will not work.
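Before the first run, it may help to confirm that the dependencies are actually visible. The following is only a sketch: the executable names (blastp, makeblastdb, mcl, fastme) and the module name dlcpar are what I assume these tools install under, and the helper functions are hypothetical names of my own. Run it with the same python that will run OrthoFinder.

```python
import os


def find_on_path(program):
    """Return the full path of `program` if it is an executable on PATH, else None."""
    for directory in os.environ.get("PATH", "").split(os.pathsep):
        candidate = os.path.join(directory, program)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def module_available(name):
    """Return True if `import name` succeeds in the current Python."""
    try:
        __import__(name)
        return True
    except ImportError:
        return False


def check_dependencies(programs=("blastp", "makeblastdb", "mcl", "fastme"),
                       modules=("dlcpar",)):
    """Map each dependency name to True (found) or False (missing).

    The default names are assumptions; adjust them to your installation.
    """
    status = {}
    for program in programs:
        status[program] = find_on_path(program) is not None
    for module in modules:
        status[module] = module_available(module)
    return status


if __name__ == "__main__":
    for name, found in sorted(check_dependencies().items()):
        print("%-12s %s" % (name, "OK" if found else "MISSING"))
```

If dlcpar is reported as missing even though you built it, the install prefix most likely does not match the python found by which python.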

How to use

Preparation

1. Prepare the Fasta files (.fa, .faa) you want to analyze
2. Put all the Fasta files into one directory

Specify the directory containing the Fasta files you want to analyze. If you extract the OrthoFinder package, you will find an ExampleData directory containing Fasta files directly underneath, so it is a good idea to do a test run with it first.

$ python orthofinder.py -f your_fasta_dir -t 5  # -f: directory containing the Fasta files; -t: number of threads to use

You can also use the -a option to specify the number of parallel jobs for the OrthoFinder algorithm itself. Take memory usage into account and choose a value that does not crash the run, using the following guideline:

0.02 GB per species for small genomes (e.g. bacteria)
0.04 GB per species for larger genomes (e.g. vertebrates)
0.2 GB per species for even larger genomes (e.g. plants)
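As a rough aid for choosing -a, the guideline can be turned into simple arithmetic. The sketch below assumes that each parallel job needs about (GB per species) x (number of species) of RAM. That reading of the guideline, the function name, and the genome class labels are my own assumptions, not something OrthoFinder provides.

```python
# GB of RAM per species, taken from the guideline above.
GB_PER_SPECIES = {
    "small": 0.02,    # e.g. bacteria
    "larger": 0.04,   # e.g. vertebrates
    "largest": 0.2,   # e.g. plants
}


def max_parallel_jobs(n_species, genome_class, available_gb):
    """Estimate the largest -a value that fits in `available_gb` of RAM.

    Assumption: one job needs GB_PER_SPECIES[genome_class] * n_species GB.
    """
    if n_species <= 0:
        raise ValueError("n_species must be positive")
    gb_per_job = GB_PER_SPECIES[genome_class] * n_species
    # Never return less than one job, since the run needs at least one.
    return max(1, int(available_gb // gb_per_job))


if __name__ == "__main__":
    # Hypothetical example: 10 plant genomes on a machine with 17 GB free.
    print(max_parallel_jobs(10, "largest", 17))
```

Treat the result as an upper bound and leave some headroom for the operating system and other processes.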

When the analysis is finished, a Results_Date directory is created directly under your_fasta_dir.

Check the result

The following files are generated in this directory:

1. Orthogroups.csv
2. Orthogroups.txt
3. Orthogroups_SpeciesOverlaps.csv
4. Orthogroups_UnassignedGenes.csv
5. Orthologues_Date (directory) → contains the Tree directory and the Orthologue directory directly underneath
6. Statistics_Overall.csv
7. Statistics_PerSpecies.csv

Orthogroups.csv file

The estimated Orthogroups are stored in file 1 (Orthogroups.csv) as follows. Species are separated by tabs and genes are separated by commas. File 2 (Orthogroups.txt) holds the same data in OrthoMCL format.

OG	Species1	Species2	Species3
OG000001	gene_s1_1, gene_s1_3	gene_s2_1, gene_s2_2	gene_s3_2
OG000002	gene_s1_2, gene_s1_4	gene_s2_3	gene_s3_1, gene_s3_3
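A table in this layout is easy to load with a short script. The following sketch assumes a header row, tab-separated species columns, and comma-separated genes, as in the example above. The function names are hypothetical, and the OG000003 row in the demo is made up for illustration. It also picks out the single-copy Orthogroups (exactly one gene in every species), which is the starting point if you want to build a species tree from single-copy genes yourself.

```python
import csv
import io


def read_orthogroups(handle):
    """Parse an Orthogroups table into {og_id: {species: [genes]}}."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    species = header[1:]  # the first column holds the Orthogroup ID
    orthogroups = {}
    for row in reader:
        if not row:
            continue
        # Pad short rows so that missing trailing species columns become empty.
        cells = row[1:] + [""] * (len(species) - len(row) + 1)
        orthogroups[row[0]] = {
            name: [g.strip() for g in cell.split(",") if g.strip()]
            for name, cell in zip(species, cells)
        }
    return orthogroups


def single_copy_orthogroups(orthogroups):
    """Return the sorted IDs of Orthogroups with exactly one gene per species."""
    return sorted(
        og for og, genes in orthogroups.items()
        if all(len(g) == 1 for g in genes.values())
    )


if __name__ == "__main__":
    example = (
        "OG\tSpecies1\tSpecies2\tSpecies3\n"
        "OG000001\tgene_s1_1, gene_s1_3\tgene_s2_1, gene_s2_2\tgene_s3_2\n"
        "OG000003\tgene_s1_5\tgene_s2_4\tgene_s3_4\n"  # made-up row
    )
    groups = read_orthogroups(io.StringIO(example))
    print(single_copy_orthogroups(groups))
```

To read a real result, pass an open file instead of the in-memory string, for example open("Orthogroups.csv", newline="") under Python 3.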

Statistics file

File 6 (Statistics_Overall.csv) contains information such as 1) the total number of genes used, 2) the estimated total number of OGs, and 3) the percentage of genes assigned to an OG. File 7 (Statistics_PerSpecies.csv) contains the same data for each species.

Tree directory, Orthologue directory

A tree file for each OG is created in the Tree directory, and the phylogenetic tree of the species is placed in the directory directly above it. In the Orthologue directory, a table of orthologous genes for each pair of species is created for each species used.

Useful features

1. After the analysis is completed, add a new species and re-analyze.

Thankfully, OrthoFinder lets you add species to a finished analysis. The procedure is as follows:

1) Create a new directory and put the Fasta files you want to add into it.
2) Run the analysis, specifying the WorkingDirectory directly under the Results_Date directory of the original analysis as shown below. For this WorkingDirectory, specify the one that contains SpeciesIDs.txt.

$ python orthofinder.py -b previous_working_dir -f new_fasta_dir

2. After the analysis is completed, exclude the species and re-analyze.

You can also exclude species.

1) Open SpeciesIDs.txt in the WorkingDirectory directly under the Results_Date directory of the original analysis with an editor.
2) Comment out the species you want to exclude by adding # at the start of their lines.
3) Run the analysis as follows.

$ python orthofinder.py -b previous_working_dir
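If there are many species to exclude, the commenting-out step can be scripted instead of done by hand. This sketch assumes each line of the species ID file has the form "<id>: <fasta file name>", which is my understanding of the format, so check your own file first. The function name and the example file names are hypothetical.

```python
def comment_out_species(lines, species_to_exclude):
    """Return a copy of `lines` with '#' prepended to the excluded species.

    Assumption: each line looks like '<id>: <fasta file name>'.
    """
    exclude = set(species_to_exclude)
    result = []
    for line in lines:
        stripped = line.strip()
        # Leave blank lines and already commented lines untouched.
        if stripped and not stripped.startswith("#"):
            _, _, filename = stripped.partition(":")
            if filename.strip() in exclude:
                line = "#" + line
        result.append(line)
    return result


if __name__ == "__main__":
    # Hypothetical file contents for illustration.
    ids = ["0: species_a.fa\n", "1: species_b.fa\n", "2: species_c.fa\n"]
    print("".join(comment_out_species(ids, ["species_b.fa"])), end="")
```

The function works on a list of lines rather than on the file itself, so you can inspect the result before writing it back. Keep a backup copy of the original file either way.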

3. Add and exclude at the same time

Of course, you can add and exclude species at the same time. Prepare the Fasta files you want to add, edit SpeciesIDs.txt, and run the same command used above for adding new Fasta files.

4. Other

It is also possible to run individual steps such as BLAST on their own. You can also create the phylogenetic trees using MAFFT and FastTree. See the OrthoFinder Manual (https://github.com/davidemms/OrthoFinder/blob/master/OrthoFinder-manual.pdf) for more information.

[PYTHON] Orthologous analysis using OrthoFinder