Tips: [Python] Random sampling with replacement from a FASTA file

Sometimes you need to draw a random sample from a large set of sequences. This post introduces code for that case.

The method is simple: open the file, pick a sequence using a random number, and read it.

Pick a sequence with a random number and read it


import linecache
import random

# Pick a record (infn, outfn and seqnumMAX are set beforehand)
seqnum = random.randint(1, seqnumMAX)
name = linecache.getline(infn, seqnum*2-1)  # Header line
seq = linecache.getline(infn, seqnum*2)     # Sequence line
# Write the record
outfdl = open(outfn, 'w')
outfdl.write(name+seq)
outfdl.close()
linecache.clearcache()  # Clear the cache

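Reading is done with `linecache.getline(filename, lineno)`, where line numbers start at 1. `linecache` keeps the lines of the file in an in-memory cache, so repeated lookups do not re-read the file from disk. This can be handy in bioinformatics, where the same sequence file is often accessed many times, but note that the cache holds the whole file, so memory use grows with file size.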
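As a minimal sketch of how this lookup behaves, the snippet below writes a small FASTA file with two lines per record to a temporary location and fetches one record by number. The file contents and the helper name `get_record` are made up for illustration. Note that `linecache.getline` returns an empty string, rather than raising an error, when the requested line does not exist.

```python
import linecache
import os
import tempfile

def get_record(path, seqnum):
    """Return (header, sequence) for the 1-based record number seqnum.

    Assumes every record occupies exactly two lines: header, then sequence.
    """
    name = linecache.getline(path, seqnum * 2 - 1)
    seq = linecache.getline(path, seqnum * 2)
    return name, seq

# Hypothetical demo file with three records
with tempfile.NamedTemporaryFile('w', suffix='.fa', delete=False) as tmp:
    tmp.write('>1\nCCGT\n>2\nAGGA\n>3\nATCC\n')
    demo_path = tmp.name

second = get_record(demo_path, 2)    # lines 3 and 4 of the file
missing = get_record(demo_path, 10)  # past the end of the file

linecache.clearcache()
os.remove(demo_path)
```

Because a missing line comes back as an empty string, a record number that is out of range silently yields an empty record, so it is worth making sure the upper bound passed to the random number generator matches the file.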

To sample with replacement, repeat this step once for each sequence you want to draw.
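The repetition can be sketched roughly as follows. This is an illustration under the same assumption of two lines per record; the function name `sample_with_replacement`, its `seed` parameter, and the demo file are hypothetical and not part of the original script. Since each draw is independent, the same record can be picked more than once.

```python
import linecache
import os
import random
import tempfile

def sample_with_replacement(path, seqnum_max, n, seed=None):
    """Draw n records (header + sequence) with replacement.

    seqnum_max is the number of records in the file; the file is assumed
    to hold exactly two lines per record.
    """
    rng = random.Random(seed)  # private generator; seed=None means unseeded
    records = []
    for _ in range(n):
        seqnum = rng.randint(1, seqnum_max)  # both ends are inclusive
        name = linecache.getline(path, seqnum * 2 - 1)
        seq = linecache.getline(path, seqnum * 2)
        records.append(name + seq)
    linecache.clearcache()
    return records

# Hypothetical demo file with three records
with tempfile.NamedTemporaryFile('w', suffix='.fa', delete=False) as tmp:
    tmp.write('>1\nCCGT\n>2\nAGGA\n>3\nATCC\n')
    demo_path = tmp.name

# Drawing 10 records from only 3 necessarily repeats some of them
picked = sample_with_replacement(demo_path, seqnum_max=3, n=10, seed=0)
os.remove(demo_path)
```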

Wrapped up as a complete script, the above looks like this.

randsampleFasta.py


#!/usr/local/bin/python3
# -*- coding: utf-8 -*-

"""
Randomly sample sequences with replacement from a FASTA file.
"""
__author__  = "Kazuki Nakamae <[email protected]>"
__version__ = "0.00"
__date__    = "29 May 2017"

import sys
import mmap
import linecache
import random

def randsampleFasta(infn,outfn,n):
    """
    Randomly sample sequences with replacement from a FASTA file.
    @param infn name of the FASTA file to read {string}
    @param outfn name of the output FASTA file {string}
    @param n number of sequences to sample {int}
    """

    infdl     =   open(infn, 'r')
    #Count the sequences in the input file (two lines per record assumed)
    print('Checking how many sequences are in a FASTA format sequence file....(1/2)')
    buf = mmap.mmap(infdl.fileno(), 0, access=mmap.ACCESS_READ)
    seqnumMAX= 0
    readline = buf.readline
    while readline():
        seqnumMAX += 1
    seqnumMAX = seqnumMAX // 2
    buf.close()
    infdl.close()
    print(str(seqnumMAX)+' sequences')
    #Sample sequences with replacement
    print('random sampling from a sequence with replacement....(2/2)')
    outfdl    =   open(outfn, 'w')
    random.seed(a='hoge', version=2)    #Fixed seed: every run draws the same sample
    seqi=1
    while seqi<=n:
        seqnum=random.randint(1,seqnumMAX)
        name = linecache.getline(infn, seqnum*2-1)    #Header line
        seq = linecache.getline(infn, seqnum*2)       #Sequence line
        outfdl.write(name+seq)
        seqi += 1
    outfdl.close()
    linecache.clearcache()    #Clear the cache
    print('done.')

if __name__ == '__main__':
    argvs = sys.argv  #Command line arguments
    argc = len(argvs) #Number of arguments

    if (argc != 4):   #Argument check
        print("USAGE : python3 randsampleFasta.py <INPUT.fa> <OUTPUT.fa> <NUMBER_OF_SEQUENCES>")
        sys.exit(1)

    randsampleFasta(argvs[1],argvs[2],int(argvs[3]))
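The script assumes that every record takes exactly two lines: a header followed by a single sequence line. A FASTA file whose sequences wrap over several lines would be sampled incorrectly. A rough pre-check for that assumption might look like the sketch below; the names `is_two_line_fasta` and `_write_tmp` and the demo contents are hypothetical.

```python
import os
import tempfile

def is_two_line_fasta(path):
    """Return True if odd-numbered lines start with '>', even-numbered
    lines do not, and the file has an even, non-zero number of lines."""
    count = 0
    with open(path) as handle:
        for count, line in enumerate(handle, start=1):
            is_header = line.startswith('>')
            # Odd lines must be headers, even lines must be sequences
            if (count % 2 == 1) != is_header:
                return False
    return count > 0 and count % 2 == 0

def _write_tmp(text):
    """Write text to a temporary .fa file and return its path."""
    with tempfile.NamedTemporaryFile('w', suffix='.fa', delete=False) as tmp:
        tmp.write(text)
        return tmp.name

good_path = _write_tmp('>1\nAC\n>2\nGT\n')           # one sequence line each
wrapped_path = _write_tmp('>1\nAC\nGT\n>2\nTT\n')    # record 1 wraps over two lines

good_ok = is_two_line_fasta(good_path)
wrapped_ok = is_two_line_fasta(wrapped_path)

os.remove(good_path)
os.remove(wrapped_path)
```

Running such a check before sampling makes a layout mismatch visible up front instead of producing an output file with headers and sequences mixed up.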

Example

Input file

test.fa


>1
CCGTATTGGAAAGCTC
>2
AGGATTATCGGATACT
>3
ATCCGGACGGGGGGTT
>4
GACCTCGTTATCATCC
>5
AGTCAGGTTACCCGCA

Running it in Bash

Command


python3 randsampleFasta.py test.fa out.fa 4

Output in Bash

Standard output


Checking how many sequences are in a FASTA format sequence file....(1/2)
5 sequences
random sampling from a sequence with replacement....(2/2)
done.

Output file

out.fa


>3
ATCCGGACGGGGGGTT
>2
AGGATTATCGGATACT
>4
GACCTCGTTATCATCC
>3
ATCCGGACGGGGGGTT

Using this code

You are free to use it.
