[Linux] eQTL analysis with genetic statistics software PLINK


Recently, I started using PLINK, a genetic statistics software package. I am learning how to use PLINK with reference to this book.

Practice from scratch genetic statistics seminar

The book is written for Windows, so I am keeping notes on how to do the same thing on a Mac. This time, I tried eQTL analysis with PLINK.


The basic usage of PLINK and the GWAS analysis method were covered in earlier posts:

[Linux] I tried using the genetic statistics software PLINK
[Linux] GWAS with genetic statistics software PLINK

Data to use

bed/bim/fam format files: the SNP-filtered files (SNP_QC)

Phenotype file: BLK gene expression level data (Exp_BLK.txt)

eQTL analysis

eQTL analysis treats the gene expression level as a quantitative trait, so a linear regression is performed. The options used are:

--pheno: specify the phenotype file used for the GWAS (Exp_BLK.txt this time)
--linear: perform linear regression
--ci 0.95: output the 95% confidence interval

GWAS (Linear Analysis)

$ ./plink --bfile SNP_QC --out SNP_QC_Exp_BLK --pheno Exp_BLK.txt --linear --ci 0.95
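Before running the regression, it can help to check that the samples in the phenotype file line up with those in the .fam file. The following is only a sketch: it assumes the phenotype file has no header row and holds family ID, individual ID, and expression value in its first three columns, and it uses two small made-up files (`example.fam`, `example_pheno.txt`) in place of SNP_QC.fam and Exp_BLK.txt.

```shell
# Made-up stand-ins for SNP_QC.fam and Exp_BLK.txt (IDs and values are illustrative only).
cat > example.fam <<'EOF'
FAM1 IND1 0 0 1 -9
FAM2 IND2 0 0 2 -9
FAM3 IND3 0 0 1 -9
EOF
cat > example_pheno.txt <<'EOF'
FAM1 IND1 7.25
FAM2 IND2 6.80
EOF

# First file (NR == FNR): remember every family ID / individual ID pair.
# Second file: count the .fam samples and how many of them have a stored pair.
awk 'NR == FNR { seen[$1 FS $2] = 1; next }
     { total++; if (($1 FS $2) in seen) matched++ }
     END { printf "%d of %d samples have a phenotype\n", matched, total }' \
    example_pheno.txt example.fam
```

With the real files, a count lower than expected would suggest that the IDs are written differently in the two files.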

Confirm that a file named **SNP_QC_Exp_BLK.assoc.linear** has been written to the working directory, and open it with a text editor. The first column is the chromosome number, the second column is the SNP ID, the third column is the chromosome position, and the twelfth column is the *p* value.
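As an alternative to a text editor, the header row can be numbered from the shell to confirm which column holds which field. This is a minimal sketch: `example.assoc.linear` is a made-up two-line stand-in, and its header layout is my assumption of what PLINK writes for `--linear --ci 0.95`, so check it against your own output.

```shell
# Made-up stand-in for SNP_QC_Exp_BLK.assoc.linear (SNP ID and values are illustrative only).
cat > example.assoc.linear <<'EOF'
CHR SNP BP A1 TEST NMISS BETA SE L95 U95 STAT P
8 rs_example1 10000 A ADD 100 0.5000 0.1000 0.3040 0.6960 5.000 2.500e-06
EOF

# Print every header field together with its column number.
awk 'NR == 1 { for (i = 1; i <= NF; i++) print i, $i }' example.assoc.linear
```

On the real file, replace `example.assoc.linear` with `SNP_QC_Exp_BLK.assoc.linear` and skip the `cat` step.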

Extraction of elements by AWK

From the GWAS result, use the AWK command to extract the "chromosome number", "SNP ID", "chromosome position", and "*p* value" columns. The input file is **SNP_QC_Exp_BLK.assoc.linear** and the output is the text file **SNP_QC_Exp_BLK.assoc.linear.P.txt**. In AWK, the program is enclosed in single quotes (''). With {print $2 "\t" $1 "\t" $3 "\t" $12}, the output columns become: 1st column [SNP ID, from input column 2], 2nd column [chromosome number, from input column 1], 3rd column [chromosome position, from input column 3], 4th column [*p* value, from input column 12]. The "\t" inserts a tab between the columns. The > redirects the output to the specified text file.

Extract elements from GWAS results with AWK command and output text file

$ awk '{print $2"\t"$1"\t"$3"\t"$12}' SNP_QC_Exp_BLK.assoc.linear > SNP_QC_Exp_BLK.assoc.linear.P.txt
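A variant of the same extraction is sketched below. It sets the output separator once with `OFS` instead of repeating "\t", and it drops rows whose *p* value is NA, on the assumption that PLINK writes NA when a test could not be computed. The input is a made-up stand-in file, not the real result.

```shell
# Made-up stand-in for the .assoc.linear file (SNP IDs and values are illustrative only).
cat > example.assoc.linear <<'EOF'
CHR SNP BP A1 TEST NMISS BETA SE L95 U95 STAT P
8 rs_example1 10000 A ADD 100 0.50 0.10 0.30 0.70 5.00 2.5e-06
8 rs_example2 20000 G ADD 100 NA NA NA NA NA NA
EOF

# OFS makes print join its comma-separated arguments with a tab.
# The header row (NR == 1) is kept; data rows with P equal to NA are skipped.
awk 'BEGIN { OFS = "\t" } NR == 1 || $12 != "NA" { print $2, $1, $3, $12 }' \
    example.assoc.linear > example.assoc.linear.P.txt

cat example.assoc.linear.P.txt
```

Whether to drop the NA rows at this stage is a matter of preference; keeping them does no harm as long as the plotting step ignores them.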

Draw a Manhattan plot using this GWAS result.

Manhattan plot drawing

When I drew a Manhattan plot, it looked like this.

[Figure: Manhattan plot (1KG_EUR_QC_Exp_BLK.assoc.linear.P_position_P_15.png)]

Identification of SNP

Let's extract the SNPs that showed the eQTL effect.

Extract SNP with AWK

$ awk '$4<=10^-12 {print $0}' 1KG_EUR_QC_Exp_BLK.assoc.linear.P.txt


rs13255193	8	11309192	4.539e-13
rs13257831	8	11332964	6.545e-13
rs2736345	8	11352485	1.707e-14
rs1478898	8	11395079	6.882e-16
rs2244894	8	11448659	1.497e-13
rs2244648	8	11450422	2.068e-14
rs13273172	8	11461111	1.188e-14

The SNP with the smallest *p* value was **rs1478898**.
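The smallest *p* value can also be picked out with AWK rather than by eye. The sketch below saves the seven rows listed above to a file and keeps the row with the lowest value in the fourth column; the file name `hits.txt` is an assumption made for illustration.

```shell
# The seven rows from the extraction above, saved under an assumed file name.
cat > hits.txt <<'EOF'
rs13255193 8 11309192 4.539e-13
rs13257831 8 11332964 6.545e-13
rs2736345 8 11352485 1.707e-14
rs1478898 8 11395079 6.882e-16
rs2244894 8 11448659 1.497e-13
rs2244648 8 11450422 2.068e-14
rs13273172 8 11461111 1.188e-14
EOF

# Adding 0 forces a numeric comparison; the row with the smallest P seen so far is remembered.
awk 'NR == 1 || $4 + 0 < min { min = $4 + 0; best = $0 } END { print best }' hits.txt
```

The printed row is the one for rs1478898, matching the result read from the list.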
