[Linux] I tried using the genetic statistics software PLINK


I expect to need statistical genetics at work, so I tried out PLINK, a software package for statistical genetics. A good book on the subject was published recently, so I worked through it hands-on.

Practice from scratch genetic statistics seminar

However, the book is written for Windows, so this is my memo on how to do the same things on a Mac. I recommend the book.

Download PLINK

Download the macOS version of PLINK from the PLINK 1.9 website (www.cog-genomics.org/plink/1.9/, the address printed in the program banner).


Launch Terminal on your Mac and move to your working directory with the cd command.

Specifying the working directory

$ cd /path/to/working/directory/

Move the downloaded plink file (the PLINK executable) to your working directory.


On the terminal, type ./plink.


$ ./plink

Execution result

PLINK v1.90b6.16 64-bit (19 Feb 2020)          www.cog-genomics.org/plink/1.9/
(C) 2005-2020 Shaun Purcell, Christopher Chang   GNU General Public License v3

  plink <input flag(s)...> [command flag(s)...] [other flag(s)...]
  plink --help [flag name(s)...]

Commands include --make-bed, --recode, --flip-scan, --merge-list,
--write-snplist, --list-duplicate-vars, --freqx, --missing, --test-mishap,
--hardy, --mendel, --ibc, --impute-sex, --indep-pairphase, --r2, --show-tags,
--blocks, --distance, --genome, --homozyg, --make-rel, --make-grm-gz,
--rel-cutoff, --cluster, --pca, --neighbour, --ibs-test, --regress-distance,
--model, --bd, --gxe, --logistic, --dosage, --lasso, --test-missing,
--make-perm-pheno, --tdt, --qfam, --annotate, --clump, --gene-report,
--meta-analysis, --epistasis, --fast-epistasis, and --score.

"plink --help | more" describes all functions (warning: long).

PLINK is executed as ./plink --(command) (argument).

Read file

The commands for reading files are --file and --bfile. --file reads genotype data in ped|map format. --bfile reads genotype data in bed|bim|fam format. NGS data is basically in vcf format, so use it after converting it to ped|map format, or to bed|bim|fam format, which is the binary version of ped|map.

--out specifies the name of the output file.
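The conversion step described above can be sketched as follows. This is only an illustration: to_bfile_cmd is a hypothetical helper and mydata is a placeholder file name, neither of which comes from the book. It prints the commands rather than running them. As far as I know, PLINK 1.9 can also read vcf directly with --vcf, so the sketch goes straight to bed|bim|fam instead of passing through ped|map.

```shell
# Hypothetical helper: print (without running) a PLINK 1.9 command that converts
# input data to bed|bim|fam. "prefix" is the file name without its extension.
to_bfile_cmd() {
    format=$1
    prefix=$2
    case "$format" in
        vcf) echo "./plink --vcf $prefix.vcf --make-bed --out $prefix" ;;
        ped) echo "./plink --file $prefix --make-bed --out $prefix" ;;
        *)   echo "unsupported format: $format" >&2; return 1 ;;
    esac
}

# mydata is a placeholder name, not a file from this article.
to_bfile_cmd vcf mydata
to_bfile_cmd ped mydata
```

Copy the printed line into the terminal once the file name matches your own data.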

This time, the data was stored in the working directory as SNP.bed, SNP.bim, and SNP.fam, that is, in bed|bim|fam format. Therefore, the argument of --bfile is SNP, the part of the file name before the extension.

Read file

$ ./plink --bfile SNP --out test

Doing this will generate a file called test.log. Open this file with a text editor or the following command.

$ less test.log
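Because --bfile takes only the prefix, it is easy to overlook one of the three files. A minimal sketch of a check to run before PLINK, assuming the three files share one prefix (check_bfile is a hypothetical helper name, not part of PLINK):

```shell
# Hypothetical helper: succeed only if prefix.bed, prefix.bim and prefix.fam all exist.
check_bfile() {
    prefix=$1
    for ext in bed bim fam; do
        if [ ! -f "$prefix.$ext" ]; then
            echo "missing: $prefix.$ext" >&2
            return 1
        fi
    done
    echo "ok: $prefix.bed $prefix.bim $prefix.fam"
}

# Usage in the working directory:
#   check_bfile SNP && ./plink --bfile SNP --out test
```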

Allele frequency calculation

You can calculate the allele frequency of each SNP with --freq.

Calculate SNP allele frequency

$ ./plink --bfile SNP --out test1 --freq

Open the output file with a text editor or the following command.

$ less test1.frq
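Instead of scrolling through the file, the rare SNPs can be pulled out with awk. A minimal sketch, assuming the PLINK 1.9 .frq column order CHR, SNP, A1, A2, MAF, NCHROBS with one header line (low_maf_snps is a hypothetical helper name):

```shell
# Hypothetical helper: print the IDs of SNPs whose MAF is below a threshold.
# Assumes column 2 is the SNP ID and column 5 is the MAF, with one header line.
# Rows whose MAF is NA are skipped.
low_maf_snps() {
    awk -v t="$2" 'NR > 1 && $5 != "NA" && $5 + 0 < t + 0 { print $2 }' "$1"
}

# Usage: list SNPs with MAF below 1%
#   low_maf_snps test1.frq 0.01
```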

SNP filtering

Prior to analysis, genomic data is filtered to exclude SNPs whose minor allele frequency (MAF) is below a threshold such as 1% or 0.5%. This is standard practice in GWAS. --maf (numerical value) excludes SNPs with a MAF below that value. Adding --make-bed writes the filtered data out as a new set of bed|bim|fam files. This time, SNPs with a MAF below 1% are excluded.

Filter SNPs by minor allele frequency

$ ./plink --bfile SNP --out test2 --maf 0.01 --make-bed
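To see how many SNPs the filter removed, the .bim files can be compared, since a .bim file lists one variant per line. A minimal sketch (count_variants is a hypothetical helper name):

```shell
# Hypothetical helper: count the variants listed in a .bim file (one line per variant).
count_variants() {
    awk 'END { print NR }' "$1"
}

# Usage after the filtering run:
#   echo "before: $(count_variants SNP.bim)  after: $(count_variants test2.bim)"
```

The same counts should also appear in test2.log, so this is mainly a quick cross-check.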
