PLINK in Python

I want to use PLINK binary data files (.bed, .bim, .fam) in Python. The pyplink module reads them.

Installation

pyplink can be installed with pip.

shell


pip install pyplink

How to use

Suppose your current directory contains a set of files that share a prefix, such as foo.bed, foo.fam and foo.bim. Pass that prefix to PyPlink.

python3


from pyplink import PyPlink
pyp = PyPlink("foo")

This creates an object called pyp that combines the .bed, .fam and .bim files. Each piece of information is available through a method of this object.

python3


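pyp.get_fam()         # sample table (.fam) as a pandas DataFrame
pyp.get_nb_samples()  # number of samples
pyp.get_bim()         # marker table (.bim) as a pandas DataFrame
pyp.get_nb_markers()  # number of markers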
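For orientation, the two tables come from whitespace-delimited text files. The sketch below parses one line of each with the standard library only. The data lines are invented, `parse_line` is a hypothetical helper, and the field names follow the labels I understand pyplink to use for its columns; treat them as an assumption and check `pyp.get_fam().columns` and `pyp.get_bim().columns` on your own data.

```python
# Column order of PLINK's .fam and .bim text files.
# The field names are assumed labels; the data lines below are invented.
FAM_FIELDS = ("fid", "iid", "father", "mother", "gender", "status")
BIM_FIELDS = ("chrom", "snp", "cm", "pos", "a1", "a2")


def parse_line(line, fields):
    """Split one whitespace-delimited line into a dict keyed by field name."""
    values = line.split()
    if len(values) != len(fields):
        raise ValueError(
            "expected {} fields, got {}".format(len(fields), len(values))
        )
    return dict(zip(fields, values))


fam_row = parse_line("FAM1 IND1 0 0 1 -9", FAM_FIELDS)
bim_row = parse_line("1 rs0000001 0 12345 A G", BIM_FIELDS)

print(fam_row["iid"], fam_row["gender"])
print(bim_row["snp"], bim_row["pos"])
```

Every value stays a string here; a real reader would convert the numeric columns.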

python3


# Marker names are the index of the .bim table, not one of its columns
markerNames = pyp.get_bim().index

Get the genotypes of a marker by specifying its name. get_geno_marker returns numeric codes (-1 marks a missing call), and get_acgt_geno_marker returns the alleles as letters.

python3


pyp.get_geno_marker(markerNames[0])
pyp.get_acgt_geno_marker(markerNames[0])
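To make the relationship between the two calls concrete, here is a small standard-library sketch that turns numeric codes into letter pairs. It assumes the coding I understand pyplink to use: the code counts copies of the `a1` allele, `-1` is missing, and a missing call is written as `00`. The helper `to_acgt` and the example alleles are hypothetical, not part of pyplink.

```python
def to_acgt(codes, a1, a2):
    """Map additive genotype codes to two-letter genotypes.

    Assumed coding: 2 = a1/a1, 1 = a1/a2, 0 = a2/a2, -1 = missing.
    """
    table = {2: a1 + a1, 1: a1 + a2, 0: a2 + a2, -1: "00"}
    return [table[code] for code in codes]


letters = to_acgt([0, 1, 2, -1], a1="A", a2="G")
print(letters)
# ['GG', 'AG', 'AA', '00']
```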

You can also iterate over pairs of marker ID and genotypes.

python3


markers = ["rs7092431", "rs9943770", "rs1578483"]
for marker_id, genotypes in pyp.iter_geno_marker(markers):
    print(marker_id)
    print(genotypes, end="\n\n")

Sample script

Get all male sample genotypes for markers on chromosome 23

python3


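all_samples = pyp.get_fam()
all_markers = pyp.get_bim()

# Assumed definitions (not in the original snippet). PLINK codes X as 23 and
# Y as 24; this comparison expects numeric chromosome codes in the .bim file.
y_markers = all_markers[all_markers.chrom == 23].index.values
males = (all_samples.gender == 1).values  # sex code 1 = male

for marker_ID, genotypes in pyp.iter_geno_marker(y_markers):
    male_genotypes = genotypes[males]
    print("{:d} total genotypes".format(len(genotypes)))
    print("{:d} genotypes for {:,d} males ({} on chr{} and position {:,d})".format(
        len(male_genotypes),
        males.sum(),
        marker_ID,
        all_markers.loc[marker_ID, "chrom"],
        all_markers.loc[marker_ID, "pos"],
    ))
    break  # show the first marker only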

Get the minor allele frequency and genotypes of the specified markers

python3


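all_samples = pyp.get_fam()

# Founders are samples whose father and mother are both unknown ("0")
founders = (all_samples.father == "0") & (all_samples.mother == "0")
markers = ["rs7092431", "rs9943770", "rs1587483"]

for marker_ID, genotypes in pyp.iter_geno_marker(markers):
    # Keep founders with a non-missing genotype (-1 marks a missing call)
    valid_genotypes = genotypes[founders.values & (genotypes != -1)]
    # Two alleles per sample, hence the factor 2
    maf = valid_genotypes.sum() / (len(valid_genotypes) * 2)
    print(marker_ID, round(maf, 6), sep="\t")
    print(genotypes)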
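The frequency computed in the loop above is the sum of the genotype codes divided by the number of chromosomes (two per sample). Here is a worked example with made-up values, using plain lists instead of NumPy arrays. Note that this is the frequency of the allele being counted, which equals the minor allele frequency only when that allele is the rarer one.

```python
# Made-up data: additive codes for six samples, -1 marks a missing call.
genotypes = [0, 1, 2, -1, 1, 0]
# Made-up founder flags (True = both parents unknown).
founders = [True, True, True, True, False, True]

# Keep founders with a non-missing genotype, as in the loop above.
valid = [g for g, f in zip(genotypes, founders) if f and g != -1]

# Each sample carries two alleles, hence the factor 2.
freq = sum(valid) / (len(valid) * 2)
print(valid, freq)
# [0, 1, 2, 0] 0.375
```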
