[PYTHON] How to plot the distribution of bacterial composition from Qiime2 analysis data in a box plot

Purpose

This article shows how to visualize the distribution of the relative abundance of a specific bacterial taxon, using the results of a 16S rRNA microbiome analysis with Qiime2. Previously, we compared the gut microbiota of the CD (Crohn's disease), UC (ulcerative colitis), and nonIBD (non-inflammatory bowel disease) groups. Here, that comparison is drawn as a box-and-whisker plot. By following this article, you will be able to create box plots like the one below.

visualization-3.png

Environment

Package

This article uses Altair, which can draw many kinds of charts from a pandas DataFrame. Chart types other than box plots are introduced here.

About data

To create a box plot, you need two inputs: count data that gives the number of reads per bacterial taxon for each sample, and the sample metadata. For details, refer to the previous section.

Acquisition of count data

table.qza and taxonomy.qza are required to get the count data. For how to create each file, refer to here. This article uses phylum-level count data, so run the following commands, making sure to pass --p-level 2.

Terminal (in Qiime2 virtual environment)


qiime taxa collapse --i-table table.qza --i-taxonomy taxonomy.qza --p-level 2 --o-collapsed-table L2_table.qza

qiime tools export --input-path L2_table.qza --output-path L2

biom convert -i L2/feature-table.biom -o L2/table.tsv --to-tsv

If you get a file like the following, the export succeeded.

スクリーンショット 2020-10-31 14.20.16.png
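
Before plotting, it can help to check that the exported table loads as expected. The following is a minimal sketch, not part of the original workflow: the two-sample table embedded in it is invented for illustration (the sample IDs and counts are hypothetical) and only mimics the layout that biom convert writes, which is one comment line followed by a "#OTU ID" header row.

```python
import io

import pandas as pd

# Hypothetical miniature of L2/table.tsv (sample IDs and counts are invented).
example_tsv = (
    "# Constructed from biom file\n"
    "#OTU ID\t1001\t1002\n"
    "D_0__Bacteria;D_1__Firmicutes\t120\t30\n"
    "D_0__Bacteria;D_1__Bacteroidetes\t80\t70\n"
)

# header=1 skips the first comment line and uses the "#OTU ID" row as the header.
# .T transposes the table so that rows are samples and columns are taxa.
count = pd.read_table(io.StringIO(example_tsv), sep="\t", index_col=0, header=1).T

print(count.shape)             # (number of samples, number of taxa)
print(count.columns.tolist())  # taxon names usable as plot fields
```

With a real export, replace the StringIO object with the path to L2/table.tsv. The printed column names are the strings to pass to Altair later.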

Get metadata

Create metadata like the following in TSV format.

スクリーンショット 2020-10-31 14.25.20.png
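
As a rough illustration of the expected layout, the sketch below builds a tiny metadata table and reads it back the same way the plotting script does. The column names diagnosis and biopsy_location are the ones used by the script in this article; the sample IDs and the "sample-id" header name are assumptions made for this example.

```python
import io

import pandas as pd

# Hypothetical metadata: sample IDs and the "sample-id" header are invented.
md_example = pd.DataFrame(
    {
        "diagnosis": ["CD", "UC", "nonIBD"],
        "biopsy_location": ["Ileum", "Rectum", "Ileum"],
    },
    index=pd.Index(["1001", "1002", "1003"], name="sample-id"),
)

# Write the table as tab-separated text (an in-memory stand-in for metadata.tsv).
buffer = io.StringIO()
md_example.to_csv(buffer, sep="\t")
buffer.seek(0)

# Read it back with the sample ID column as the index.
md = pd.read_table(buffer, sep="\t", index_col=0, header=0)

# Numeric-looking IDs are parsed as integers, so convert them back to strings.
md.index = md.index.astype(str)
print(md)
```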

Run Altair

Running the following script produces the box plot.

alt_comp_plot.py


import os
import altair as alt
import pandas as pd

# Taxonomic level to use. Phylum is level 2.
l_select = 'L2' 

# Get the current directory
cwd = os.getcwd()

# Load the count data
count_path = [l_select,'table.tsv'] 
count_file = os.path.join(cwd, *count_path)
count = pd.read_table(count_file, sep='\t', index_col=0, header=1).T  # header=1: skip the first comment line and use the second line as the header

# Convert counts to relative abundance (each sample's row sums to 1)
comp = count.apply(lambda x: x/sum(x), axis=1)

# Load the metadata
md_path = ['metadata.tsv']
md_file = os.path.join(cwd, *md_path)
md = pd.read_table(md_file, sep='\t', index_col=0 ,header=0)

# Convert the index (sample IDs) to str. The sample IDs here are numbers, so they may have been read as int.
comp.index = comp.index.astype(str)
md.index = md.index.astype(str)

# Combine the composition data and the metadata. (Rows are not matched unless both indexes are str.)
df = pd.concat([comp,md], axis=1)

# Only the ileum and rectum samples are examined here, because the other sites had too few samples.
df = df[df['biopsy_location'].isin(['Ileum','Rectum'])]

# Build the box plot with Altair
boxplot = alt.Chart(df).mark_boxplot(size=100,ticks=alt.MarkConfig(width=30), median=alt.MarkConfig(color='black',size=100)).encode(
	    alt.X('diagnosis',sort = alt.Sort(['CD','UC','nonIBD']), axis=alt.Axis(labelFontSize=15, ticks=True, titleFontSize=18, title='Diagnosis')),
	    alt.Y('D_0__Bacteria;D_1__Firmicutes', axis=alt.Axis(format='%', labelFontSize=15, ticks=True, titleFontSize=18, grid=False,domain=True, title='Firmicutes'), scale=alt.Scale(domain=[0,0.02])),
	    alt.Color('diagnosis'),
	    alt.Column('biopsy_location', header=alt.Header(labelFontSize=15, titleFontSize=18), sort = alt.Sort(['Ileum','Rectum']), title='Biopsy')
	).properties(
		width=600,
		height=500,
	)

# Display the figure
boxplot.show()
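
Two steps in the script above are easy to get wrong: the conversion to relative abundance and the index type conversion before pd.concat. The following sketch illustrates both on invented data (sample IDs and counts are hypothetical). It uses DataFrame.div, which is an alternative to the apply call in the script and gives the same result.

```python
import pandas as pd

# Hypothetical counts: rows are samples (string IDs), columns are phyla.
count = pd.DataFrame(
    {
        "D_0__Bacteria;D_1__Firmicutes": [120, 30],
        "D_0__Bacteria;D_1__Bacteroidetes": [80, 70],
    },
    index=["1001", "1002"],
)

# Hypothetical metadata whose numeric sample IDs were parsed as integers.
md = pd.DataFrame(
    {"diagnosis": ["CD", "UC"], "biopsy_location": ["Ileum", "Rectum"]},
    index=[1001, 1002],
)

# Divide each row by its total so that every sample sums to 1.
comp = count.div(count.sum(axis=1), axis=0)
row_sums = comp.sum(axis=1)

# With mismatched index types, "1001" and 1001 are treated as different rows.
mismatched = pd.concat([comp, md], axis=1)

# After converting both indexes to str, each sample lines up on a single row.
comp.index = comp.index.astype(str)
md.index = md.index.astype(str)
df = pd.concat([comp, md], axis=1)

print(mismatched.shape)  # more rows than samples, padded with NaN
print(df.shape)          # one row per sample
```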

About Altair

Here is a brief introduction to working with Altair.

Save figure

You can save the figure as PNG or SVG from the "..." menu at the upper right of the chart.

スクリーンショット 2020-10-31 16.39.36.png
