PhytoMine: I tried getting plant gene information with Python

I learned that PhytoMine lets you retrieve Phytozome data from Python, so I tried it. Phytozome is a site well known to plant researchers, and it is convenient for looking up genomic and gene information for a wide range of plant species.

PhytoMine is one of the mines listed in the registry of InterMine, a data warehouse system.

"InterMine is an open source data warehouse system licensed under LGPL 2.1. InterMine is used to create a database of biological data accessed by advanced web query tools. You can use InterMine to create a database from a single dataset or integrate multiple data sources. Support for some common biological formats is provided and there is a framework for adding other data. InterMine includes a user-friendly web interface that works "out of the box" and is easy to customize." (From Wikipedia, "InterMine")

InterMine provides client libraries for a variety of programming languages, including Python. See API and Client Libraries for more information.

I tried using PhytoMine from Python, following the InterMine-Python Tutorial. I installed the client library with pip.

$ pip install intermine

Using gene function and plant species as query conditions, I retrieved a list of genes in Python and collected the results into a pandas DataFrame. The source code is as follows.

import pandas as pd
from intermine.webservice import Service

size = 20  # Number of rows to retrieve

# Create a Service instance by specifying the PhytoMine URL
service = Service("https://phytozome.jgi.doe.gov/phytomine/service")

query = service.new_query("Gene")  # Query gene information
# Condition A: gene function
query.add_constraint("briefDescription", "CONTAINS", "transcription factor")
# Condition B: Eucalyptus grandis, selected by the "Eucgr" prefix of its gene names
query.add_constraint("name", "CONTAINS", "Eucgr")
# Condition C: poplar, selected by the "Potri" prefix of its gene names
query.add_constraint("name", "CONTAINS", "Potri")
# Condition A and (condition B or condition C): genes from either species
query.set_logic("A & (B | C)")

dfs = []  # Empty list to hold one single-row DataFrame per result row
for row in query.rows(size=size):
    dfs.append(pd.DataFrame(row.values(), index=row.keys()).T)

dfs = pd.concat(dfs)  # Combine the list into one DataFrame (every row keeps index 0)
dfs.to_csv("Tree_TFs_Top20.csv")  # Save the DataFrame in CSV format

The resulting table is as follows.

| | Gene.briefDescription | Gene.cytoLocation | Gene.description | Gene.genomicOrder | Gene.id | Gene.length | Gene.name | Gene.primaryIdentifier | Gene.score | Gene.scoreType | Gene.secondaryIdentifier | Gene.symbol |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | (1 of 102) PF00319 - SRF-type transcription fa... | None | None | None | 49560540 | 186 | Potri.010G098100 | Potri.010G098100 | None | None | PAC:26981244 | None |
| 0 | (1 of 102) PF00319 - SRF-type transcription fa... | None | None | None | 303626540 | 186 | Potri.010G098100 | Potri.010G098100 | None | None | PAC:37221527 | None |
| 0 | (1 of 10) PTHR31657:SF9 - ETHYLENE-RESPONSIVE ... | None | None | None | 48348276 | 2263 | Potri.007G090600 | Potri.007G090600 | None | None | PAC:27016559 | None |
| 0 | (1 of 10) PTHR31657:SF9 - ETHYLENE-RESPONSIVE ... | None | None | None | 48359640 | 1853 | Potri.003G139300 | Potri.003G139300 | None | None | PAC:26998891 | None |
| 0 | (1 of 10) PTHR31657:SF9 - ETHYLENE-RESPONSIVE ... | None | None | None | 48837989 | 1051 | Potri.005G168700 | Potri.005G168700 | None | None | PAC:27030760 | None |
| 0 | (1 of 10) PTHR31657:SF9 - ETHYLENE-RESPONSIVE ... | None | None | None | 49691741 | 1649 | Potri.017G055400 | Potri.017G055400 | None | None | PAC:26983926 | None |
| 0 | (1 of 10) PTHR31657:SF9 - ETHYLENE-RESPONSIVE ... | None | None | None | 50099858 | 2177 | Potri.005G077300 | Potri.005G077300 | None | None | PAC:27029242 | None |
| 0 | (1 of 10) PTHR31657:SF9 - ETHYLENE-RESPONSIVE ... | None | None | None | 50216626 | 2401 | Potri.013G135600 | Potri.013G135600 | None | None | PAC:26993814 | None |
| 0 | (1 of 10) PTHR31657:SF9 - ETHYLENE-RESPONSIVE ... | None | None | None | 50231866 | 2179 | Potri.019G102200 | Potri.019G102200 | None | None | PAC:27025339 | None |
| 0 | (1 of 10) PTHR31657:SF9 - ETHYLENE-RESPONSIVE ... | None | None | None | 303511172 | 2177 | Potri.005G077300 | Potri.005G077300 | None | None | PAC:37265642 | None |
| 0 | (1 of 10) PTHR31657:SF9 - ETHYLENE-RESPONSIVE ... | None | None | None | 303527050 | 1051 | Potri.005G168700 | Potri.005G168700 | None | None | PAC:37263387 | None |
| 0 | (1 of 10) PTHR31657:SF9 - ETHYLENE-RESPONSIVE ... | None | None | None | 303695561 | 2263 | Potri.007G090600 | Potri.007G090600 | None | None | PAC:37252859 | None |
| 0 | (1 of 10) PTHR31657:SF9 - ETHYLENE-RESPONSIVE ... | None | None | None | 303799992 | 2401 | Potri.013G135600 | Potri.013G135600 | None | None | PAC:37233326 | None |
| 0 | (1 of 10) PTHR31657:SF9 - ETHYLENE-RESPONSIVE ... | None | None | None | 303940612 | 2179 | Potri.019G102200 | Potri.019G102200 | None | None | PAC:37260937 | None |
| 0 | (1 of 10) PTHR31657:SF9 - ETHYLENE-RESPONSIVE ... | None | None | None | 304098097 | 1649 | Potri.017G055400 | Potri.017G055400 | None | None | PAC:37223899 | None |
| 0 | (1 of 10) PTHR31657:SF9 - ETHYLENE-RESPONSIVE ... | None | None | None | 304255554 | 1853 | Potri.003G139300 | Potri.003G139300 | None | None | PAC:37236557 | None |
| 0 | (1 of 11) K08064 - nuclear transcription facto... | None | None | None | 49458724 | 4801 | Potri.011G098400 | Potri.011G098400 | None | None | PAC:27000615 | None |
| 0 | (1 of 11) KOG4282 - Transcription factor GT-2 ... | None | None | None | 174786351 | 2903 | Eucgr.J01012 | Eucgr.J01012 | None | None | PAC:32033046 | None |
| 0 | (1 of 11) KOG4282 - Transcription factor GT-2 ... | None | None | None | 174819386 | 2316 | Eucgr.J02994 | Eucgr.J02994 | None | None | PAC:32035652 | None |
| 0 | (1 of 11) KOG4282 - Transcription factor GT-2 ... | None | None | None | 175094637 | 2197 | Eucgr.G03225 | Eucgr.G03225 | None | None | PAC:32071912 | None |
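
To make the constraint logic "A & (B | C)" concrete, here is a small offline sketch that applies the same three conditions to a handful of records in plain Python. It does not call PhytoMine. The sample records, their descriptions, and the helper names `contains` and `matches` are made up for illustration, and the case-insensitive matching is an assumption about how CONTAINS behaves rather than something confirmed here.

```python
def contains(value, needle):
    """Stand-in for a CONTAINS constraint (case-insensitive by assumption)."""
    return value is not None and needle.lower() in value.lower()


def matches(gene):
    """Evaluate the same logic as query.set_logic("A & (B | C)")."""
    a = contains(gene.get("briefDescription"), "transcription factor")  # Condition A
    b = contains(gene.get("name"), "Eucgr")  # Condition B
    c = contains(gene.get("name"), "Potri")  # Condition C
    return a and (b or c)


# Hypothetical records: the descriptions are invented, not PhytoMine data.
sample_genes = [
    {"name": "Eucgr.J01012", "briefDescription": "hypothetical transcription factor"},
    {"name": "Potri.011G098400", "briefDescription": "hypothetical transcription factor"},
    {"name": "Potri.001G000001", "briefDescription": "hypothetical kinase"},
    {"name": "AT1G01010", "briefDescription": "hypothetical transcription factor"},
]

selected = [g["name"] for g in sample_genes if matches(g)]
print(selected)
```

Only the first two records are selected: the third fails condition A, and the fourth satisfies A but neither B nor C. Without the parentheses, "A & B | C" would typically be read as "(A & B) | C", which would let every poplar gene through regardless of function.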

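In the table above, several poplar identifiers appear twice with different Gene.id and Gene.secondaryIdentifier values (for example Potri.010G098100). The reason is not stated here; it may be that the mine holds more than one annotation version, but that is only a guess. If one row per gene is enough, a minimal sketch like the following keeps the first record seen for each primary identifier. The helper name `first_per_identifier` is hypothetical, and the records are a few rows copied from the table.

```python
def first_per_identifier(records, key="Gene.primaryIdentifier"):
    """Keep only the first record seen for each value of `key`, preserving order."""
    seen = {}
    for rec in records:
        seen.setdefault(rec[key], rec)  # later duplicates are ignored
    return list(seen.values())


# A few rows taken from the output table (only three columns kept)
records = [
    {"Gene.primaryIdentifier": "Potri.010G098100", "Gene.id": 49560540, "Gene.secondaryIdentifier": "PAC:26981244"},
    {"Gene.primaryIdentifier": "Potri.010G098100", "Gene.id": 303626540, "Gene.secondaryIdentifier": "PAC:37221527"},
    {"Gene.primaryIdentifier": "Potri.007G090600", "Gene.id": 48348276, "Gene.secondaryIdentifier": "PAC:27016559"},
    {"Gene.primaryIdentifier": "Potri.007G090600", "Gene.id": 303695561, "Gene.secondaryIdentifier": "PAC:37252859"},
    {"Gene.primaryIdentifier": "Eucgr.J01012", "Gene.id": 174786351, "Gene.secondaryIdentifier": "PAC:32033046"},
]

unique = first_per_identifier(records)
print([r["Gene.primaryIdentifier"] for r in unique])
```

On a pandas DataFrame, the equivalent would be `drop_duplicates(subset="Gene.primaryIdentifier", keep="first")`. Whether the first or the last record is the one to keep depends on what the duplicates actually represent, so it is worth checking that before discarding rows.
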
Looking at PhytoMine's Query Builder page, there appear to be various data types besides genes that can be used in queries, so I would like to try them out little by little.
