Python string processing: map and lambda

I will explain how to use map and lambda in Python, using a FASTQ file as the example.

The test file below is a FASTQ file, the output format of DNA sequencers that is familiar in bioinformatics. Each record has four lines:

- The line starting with @ is the header.
- The second line is the DNA base sequence.
- The third line is a + separator.
- The fourth line gives the quality score for each base in the second line.

Each quality score is stored as the ASCII character whose code is the score plus 33.

test.fastq


@test1
GAGCACACGTCTNNANNCNAGTCANNANNNANNNNNNNNNNANNCNNNNNNTNNNNNNNNANNNNTGTCCATTGCNNNCACATCATTGTTTACTTGCGCNT
+
;<<:?@9<?############################################################################################
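The four-line layout described above can be read with a small function. This is a minimal sketch written for Python 3. It assumes Phred+33 encoding, exactly four lines per record, and no blank lines between records. `read_fastq` is a hypothetical helper name, not a library function.

```python
import io

def read_fastq(handle):
    """Yield (name, sequence, scores) for each four-line record in handle.

    Hypothetical helper: assumes Phred+33 quality encoding and exactly
    four lines per record (no blank lines, no wrapped sequences).
    """
    lines = [line.rstrip("\r\n") for line in handle]
    for i in range(0, len(lines) - 3, 4):
        header, sequence, separator, quality = lines[i:i + 4]
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError("malformed record starting at line %d" % (i + 1))
        if len(sequence) != len(quality):
            raise ValueError("sequence/quality length mismatch in " + header)
        # Same conversion as in this article: ASCII code minus 33.
        scores = list(map(lambda c: ord(c) - 33, quality))
        yield header[1:], sequence, scores

# Illustrative input (made up, not taken from test.fastq).
sample = io.StringIO("@read1\nGATTA\n+\nII#5?\n")
for name, seq, scores in read_fastq(sample):
    print(name, seq, scores)  # read1 GATTA [40, 40, 2, 20, 30]
```

The function groups lines strictly in fours instead of searching for lines that start with @. This matters because @ is itself a valid quality character (it appears in the sample quality line). With a real file, pass the open file object, e.g. `read_fastq(open("test.fastq"))`.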

I want to convert these quality characters back to their original numeric scores. While writing this in Python, I came across a very convenient combination of map and lambda, so I am making a note of it. The environment is Python 2. Note: higher-order functions such as map behave differently between Python versions. In Python 3, map returns an iterator rather than a list, so you need to wrap it in list() to get the same output.
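To illustrate the version difference, here is a small Python 3 sketch. It uses only the first nine characters of the sample quality line. In Python 2, `print map(...)` shows the list directly. In Python 3 you get a lazy map object that can be consumed only once. `list(map(...))` gives a list in both versions.

```python
# Python 3 sketch: map() returns a lazy iterator, not a list.
quality = ";<<:?@9<?"  # first nine characters of the sample quality line

scores_iter = map(lambda c: ord(c) - 33, quality)
print(scores_iter)        # prints something like <map object at 0x...>

scores = list(scores_iter)
print(scores)             # [26, 27, 27, 25, 30, 31, 24, 27, 30]

# The iterator is exhausted after one pass, so a second list() is empty.
print(list(scores_iter))  # []
```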

For example, to convert the quality character A into a number, use Python's built-in function **ord** (the inverse of **chr**) to get its ASCII code. Then subtract 33 to recover the original value.

>  python -c 'print ord("A")-33'
32
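Because ord and chr are inverses, the same offset of 33 works in both directions. This is a minimal Python 3 sketch. `char_to_score` and `score_to_char` are hypothetical helper names used only for illustration.

```python
OFFSET = 33  # Phred+33 offset used by the quality line

def char_to_score(c):
    # ASCII code of the quality character minus the offset
    return ord(c) - OFFSET

def score_to_char(q):
    # Inverse direction: add the offset back and turn the code into a character
    return chr(q + OFFSET)

print(char_to_score("A"))  # 32
print(score_to_char(32))   # A
print(char_to_score("#"))  # 2
```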

To convert all 101 characters on the quality line of the test file, use a for statement:

convert_asci.py


asci_string = ";<<:?@9<?############################################################################################"
for baseq in asci_string:
    score = ord(baseq) - 33
    print score,

This prints the scores on one line, separated by spaces:

convert_asci.py execution result


26 27 27 25 30 31 24 27 30 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2

A higher value means better quality. So the bases whose quality character is "#" (a score of 2) are of very poor quality.

However, this conversion uses a for statement, which tends to make code vertically long as the program grows. Explicit loops are also said to be slow. So let's express it using map instead.

convert_asci.2.py


asci_string = ";<<:?@9<?############################################################################################"

def convert_func(x):
    score = ord(x) - 33
    return score

res_score = map(convert_func, asci_string)
print res_score

convert_asci.2.py execution result


[26, 27, 27, 25, 30, 31, 24, 27, 30, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]
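If you want to check the speed claim yourself, the standard timeit module can compare the for loop and map versions. This is a Python 3 sketch. The input string is constructed here (nine characters plus "#" padding to 101 characters), not copied from test.fastq. The function names are hypothetical.

```python
import timeit

# Constructed input: 9 informative characters followed by '#' padding,
# 101 characters in total like the sample quality line.
quality = ";<<:?@9<?" + "#" * 92

def with_for_loop(s):
    scores = []
    for c in s:
        scores.append(ord(c) - 33)
    return scores

def with_map(s):
    return list(map(lambda c: ord(c) - 33, s))

# Both approaches must produce identical scores before timing them.
assert with_for_loop(quality) == with_map(quality)

for func in (with_for_loop, with_map):
    # timeit calls the zero-argument lambda `number` times and returns total seconds.
    seconds = timeit.timeit(lambda: func(quality), number=2000)
    print("%s: %.4f s" % (func.__name__, seconds))
```

Timings depend on the machine and Python version. With a lambda, map still makes one Python function call per character, so any gain over a plain loop may be small. It is worth measuring rather than assuming.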

This avoids the for statement, but defining the function still takes several lines. I learned that you can instead use an anonymous function called **lambda** to write the equivalent of the "convert_func" function directly inside the map call (I didn't know that!). It looks like this:

convert_asci.3.py (revised script)


asci_string = ";<<:?@9<?############################################################################################"
res_score = map(lambda x:ord(x) - 33, asci_string)
print res_score  

**Update: A reader pointed out that both the for statement and map already iterate over a string one character at a time, so there is no need to convert it to a list first. Thank you; I have corrected the script.**
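Here is a quick Python 3 check of that point. Iterating over a str yields one-character strings, so mapping over the string and mapping over `list(string)` give the same scores. The short string below is illustrative.

```python
asci_string = ";<<:?@9<?###"

# Iterating over a str yields one-character strings, so list() adds nothing.
print(list(asci_string)[:4])  # [';', '<', '<', ':']

from_string = list(map(lambda x: ord(x) - 33, asci_string))
from_list = list(map(lambda x: ord(x) - 33, list(asci_string)))
print(from_string == from_list)  # True
```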

convert_asci.3.py (script before the revision)


asci_string = ";<<:?@9<?############################################################################################"
asci_list = list(asci_string)  # This conversion to a list was unnecessary.

res_score = map(lambda x:ord(x) - 33, asci_list)
print res_score

How about that? The whole conversion fits on one line. Since map returns a list in Python 2, the result is the same as before.

An anonymous function is a throwaway function meant to be used in one place. It is called anonymous because it is never given a name. The syntax for creating an anonymous function with a lambda expression is as follows.

lambda argument: return value    (in the example, the argument is x and the return value is ord(x) - 33)
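To see that a lambda is just a nameless version of an ordinary function, compare it with the convert_func from convert_asci.2.py. In this Python 3 sketch the lambda is bound to a hypothetical name, convert_lambda, only so it can be called and inspected. PEP 8 recommends def over assigning a lambda to a name in real code.

```python
def convert_func(x):
    return ord(x) - 33

convert_lambda = lambda x: ord(x) - 33  # same body, written as an expression

print(convert_func("?"), convert_lambda("?"))  # 30 30
print(convert_func.__name__)    # convert_func
print(convert_lambda.__name__)  # <lambda>, i.e. the function itself has no name
```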

In the example, map passes each character of asci_string to the lambda as the argument x, one at a time. The lambda computes ord(x) - 33 and returns the result. This is very convenient!
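The "one at a time" behavior is easy to observe in Python 3, where map is lazy: each next() call feeds exactly one character to the lambda. This sketch also shows a list comprehension, a common alternative to map plus lambda that produces the same list. `to_score` is a hypothetical name used only so the lambda can be reused.

```python
asci_string = ";<<:?@9<?"
to_score = lambda x: ord(x) - 33  # hypothetical name, only so we can reuse it below

# map hands the characters to the lambda one at a time, on demand.
scores = map(to_score, asci_string)
print(next(scores))  # 26, from ';'
print(next(scores))  # 27, from '<'
print(list(scores))  # [27, 25, 30, 31, 24, 27, 30], the remaining characters

# A list comprehension is a common alternative that gives the same list.
print([ord(x) - 33 for x in asci_string] == list(map(to_score, asci_string)))  # True
```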
