I expect to need statistical genetics at work, so I tried PLINK, a software package for statistical genetics. A nice book on the subject was published recently, so I worked through it hands-on.
Practice from scratch genetic statistics seminar
However, the book is written for Windows, so I am writing down how to do the same thing on a Mac as a memo. The book itself is good.
Download the macOS version of PLINK from the following page.
Launch your Mac terminal.
In the terminal, move to the working directory with the cd command.
Specifying the working directory
$ cd /Working directory path/
Move the downloaded plink file (the PLINK executable) to your working directory.
In the terminal, type ./plink.
Start PLINK
$ ./plink
Execution result
PLINK v1.90b6.16 64-bit (19 Feb 2020) www.cog-genomics.org/plink/1.9/
(C) 2005-2020 Shaun Purcell, Christopher Chang GNU General Public License v3
plink <input flag(s)...> [command flag(s)...] [other flag(s)...]
plink --help [flag name(s)...]
Commands include --make-bed, --recode, --flip-scan, --merge-list,
--write-snplist, --list-duplicate-vars, --freqx, --missing, --test-mishap,
--hardy, --mendel, --ibc, --impute-sex, --indep-pairphase, --r2, --show-tags,
--blocks, --distance, --genome, --homozyg, --make-rel, --make-grm-gz,
--rel-cutoff, --cluster, --pca, --neighbour, --ibs-test, --regress-distance,
--model, --bd, --gxe, --logistic, --dosage, --lasso, --test-missing,
--make-perm-pheno, --tdt, --qfam, --annotate, --clump, --gene-report,
--meta-analysis, --epistasis, --fast-epistasis, and --score.
"plink --help | more" describes all functions (warning: long).
PLINK is run in the form ./plink --(command) (argument).
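As a small check before going further, the sketch below confirms that the executable is in the current directory and runs one simple command. It assumes the file was saved under the name plink; the variable plink_status is only a name I made up for this example.

```shell
# Sketch: confirm that ./plink exists in the working directory and can run.
# Assumes the downloaded executable is named "plink".
if [ -f ./plink ]; then
    chmod +x ./plink      # make sure the execute bit is set
    ./plink --version     # prints only the version line
    plink_status="found"
else
    echo "plink not found in $(pwd); move the downloaded executable here first"
    plink_status="missing"
fi
echo "plink status: $plink_status"
```

If the message says plink is missing, check the cd path and the location of the downloaded file again.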
The commands for reading files are --file and --bfile.
--file reads genotype data in ped|map format.
--bfile reads genotype data in bed|bim|fam format.
NGS data is basically in vcf format; here, use data converted to ped|map format, or to bed|bim|fam format, which is the binary version of ped|map.
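To get a feel for the text-based ped|map format, here is a minimal sketch that writes a toy data set and checks its shape. The samples, SNP names, and genotypes are all made up for illustration, and the files go into a temporary directory so nothing in the working directory is overwritten.

```shell
# Toy example with made-up data: 2 samples, 2 SNPs in ped|map format.
workdir=$(mktemp -d "${TMPDIR:-/tmp}/plink_toy.XXXXXX")

# .ped: family ID, individual ID, father ID, mother ID, sex, phenotype,
# followed by two allele columns per SNP.
cat > "$workdir/toy.ped" <<'EOF'
FAM1 IND1 0 0 1 1 A A G T
FAM2 IND2 0 0 2 2 A C G G
EOF

# .map: chromosome, SNP ID, genetic distance, base-pair position
# (one line per SNP, in the same order as the allele columns in .ped).
cat > "$workdir/toy.map" <<'EOF'
1 snp1 0 1000
1 snp2 0 2000
EOF

n_snps=$(wc -l < "$workdir/toy.map" | tr -d ' ')
n_cols=$(awk 'NR == 1 { print NF }' "$workdir/toy.ped")
echo "SNPs in .map: $n_snps, columns in .ped: $n_cols"

# With PLINK in the working directory, the text files could be converted
# to the binary bed|bim|fam format like this (not run here):
#   ./plink --file "$workdir/toy" --make-bed --out "$workdir/toy"
```

The .ped file should have 6 + 2 × (number of SNPs) columns, which is 10 for this toy data.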
--out specifies the name of the output files.
This time, the files stored in the working directory are SNP.bed, SNP.bim, and SNP.fam, in bed|bim|fam format.
Therefore, the argument of --bfile is SNP, the part of the file name before the extension.
Read file
$ ./plink --bfile SNP --out test
Doing this will generate a file called test.log. Open this file with a text editor or the following command.
$ less test.log
You can calculate the allele frequency of each SNP with --freq.
Calculate SNP allele frequency
$ ./plink --bfile SNP --out test1 --freq
Open the output file with a text editor or the following command.
$ less test1.frq
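To see what kind of number the MAF column is, the following sketch computes a minor allele frequency by hand with awk for a single SNP. The genotypes are made up and the file one_snp.ped is hypothetical; this is not how PLINK itself is implemented, just the same counting idea, assuming both alleles are observed at least once.

```shell
# Made-up genotypes for a single SNP in four samples (ped format:
# six leading columns, then the two alleles of the SNP in columns 7 and 8).
workdir=$(mktemp -d "${TMPDIR:-/tmp}/plink_freq.XXXXXX")
cat > "$workdir/one_snp.ped" <<'EOF'
F1 I1 0 0 1 1 A A
F2 I2 0 0 2 1 A G
F3 I3 0 0 1 2 A A
F4 I4 0 0 2 2 A G
EOF

# Count each allele, skipping "0" (missing), and report the smaller frequency.
maf=$(LC_ALL=C awk '
    { for (i = 7; i <= 8; i++) if ($i != "0") { count[$i]++; total++ } }
    END {
        min = 1
        for (a in count) { f = count[a] / total; if (f < min) min = f }
        printf "%.3f", min
    }' "$workdir/one_snp.ped")
echo "MAF of the toy SNP: $maf"
```

Here allele A appears 6 times and G 2 times out of 8 chromosomes, so the minor allele is G and the MAF is 0.250.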
Prior to analysis, it is customary in GWAS to filter the genomic data to exclude SNPs with a minor allele frequency (MAF) of 1% or 0.5% or less.
--maf (numerical value) excludes SNPs whose MAF is below the numerical value.
--make-bed writes the filtered data out as new files in bed|bim|fam format.
This time, SNPs with a MAF below 1% are excluded.
Filter SNPs by minor allele frequency
$ ./plink --bfile SNP --out test2 --maf 0.01 --make-bed
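Before running the filter, it can help to preview which SNPs fall under the threshold. The sketch below does this with awk on a file laid out like the .frq output of --freq (columns CHR, SNP, A1, A2, MAF, NCHROBS). The file toy.frq and its values are made up, and the strict "below" comparison reflects my understanding of how --maf treats the threshold.

```shell
# Made-up .frq-style table: CHR, SNP, A1, A2, MAF, NCHROBS.
workdir=$(mktemp -d "${TMPDIR:-/tmp}/plink_maf.XXXXXX")
cat > "$workdir/toy.frq" <<'EOF'
 CHR  SNP   A1   A2     MAF  NCHROBS
   1 snp1    A    G    0.25      200
   1 snp2    C    T   0.005      200
   1 snp3    G    A    0.01      200
   1 snp4    T    C       0      200
EOF

threshold=0.01

# List the SNPs whose MAF is strictly below the threshold (skip the header).
removed=$(awk -v t="$threshold" 'NR > 1 && $5 < t { print $2 }' "$workdir/toy.frq")
n_kept=$(awk -v t="$threshold" 'NR > 1 && $5 >= t { n++ } END { print n + 0 }' "$workdir/toy.frq")

echo "SNPs below MAF $threshold:"
echo "$removed"
echo "SNPs kept: $n_kept"
```

With this toy table, snp2 and snp4 fall below the threshold, while snp3, which sits exactly at 0.01, is kept. On real data, the same awk line can be pointed at test1.frq.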