Python speed comparison: regex vs startswith vs str[:word_length]

Introduction

There are several ways to check whether a string starts with a given prefix in Python. This article compares the speed of three typical approaches: a regular expression, the startswith method, and a slice comparison (str[:word_length]).
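
For reference, the three approaches can be sketched side by side as follows. The helper names (starts_with_regex, starts_with_method, starts_with_slice) are made up for illustration and do not appear in the measured program.

```python
import re


def starts_with_regex(word: str, prefix: str) -> bool:
    # re.match only matches at the start of the string;
    # re.escape keeps the prefix literal.
    return re.match(re.escape(prefix), word) is not None


def starts_with_method(word: str, prefix: str) -> bool:
    return word.startswith(prefix)


def starts_with_slice(word: str, prefix: str) -> bool:
    # Slicing past the end of a short string is safe and returns a shorter string.
    return word[:len(prefix)] == prefix


checks = (starts_with_regex, starts_with_method, starts_with_slice)
for word in ('foo_1', 'bar_2'):
    print(word, [check(word, 'foo') for check in checks])
```

All three return the same answer for a literal prefix; what the article measures is how long each one takes.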

Measuring method

Environment

The measurements were run in the following environment.

| Item | Value |
| --- | --- |
| Python version | 3.8.2 |
| OS | Ubuntu 20.04 |

Program

The measurements are based on the following program. The role of each variable and function is listed below. Change the variables to suit the characteristics you want to measure.

| Variable/function | Description |
| --- | --- |
| time_logging | Decorator that measures and prints the execution time |
| compare_regex | Checks each string in the argument list against a regular expression |
| compare_startswith | Checks each string in the argument list with the startswith method |
| compare_str | Checks whether the leading slice of each string in the argument list equals target_word |
| target_word | Prefix string to compare against |
| match_word | String prefix that matches target_word |
| not_match_word | String prefix that does not match target_word |
| compare_word_num | Total number of strings to compare |
| match_rate | Percentage of the generated strings that match target_word |
| compare_func | Function to measure |
| main | Function to be called |

import re
import time


def time_logging(func):
    def deco(*args, **kwargs):
        stime = time.time()
        res = func(*args, **kwargs)
        etime = time.time()
        print(f'Finish {func.__name__}. Takes {round(etime - stime, 3)}s.', flush=True)
        return res

    return deco


@time_logging
def compare_regex(compare_words):
    # re.escape keeps target_word literal even if it contains regex metacharacters
    pattern = re.compile(f'^{re.escape(target_word)}')
    for word in compare_words:
        if pattern.match(word):
            pass


@time_logging
def compare_startswith(compare_words):
    for word in compare_words:
        if word.startswith(target_word):
            pass


@time_logging
def compare_str(compare_words):
    length = len(target_word)
    for word in compare_words:
        if word[:length] == target_word:
            pass


target_word = 'foo'
match_word = target_word
not_match_word = 'bar'
compare_word_num = 100_000_000
match_rate = 50
compare_func = compare_regex


def main():
    compare_words = []
    for index in range(compare_word_num):
        # '<' rather than '<=' so that match_rate = 50 yields exactly 50% matches
        if index % 100 < match_rate:
            compare_words.append(f'{match_word}_{index}')
        else:
            compare_words.append(f'{not_match_word}_{index}')

    compare_func(compare_words)


if __name__ == '__main__':
    main()
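
As an alternative to the time_logging decorator, the same comparison can be sketched with the standard-library timeit module. This is not the program used for the measurements in this article. The sample size of 10,000 strings and the names words, candidates, and timings are assumptions chosen so the sketch finishes quickly, so the absolute numbers will differ from the reported ones.

```python
import re
import timeit

target_word = 'foo'
# Small sample so the sketch runs quickly; the article uses 100,000,000 strings.
words = [f'foo_{i}' if i % 2 == 0 else f'bar_{i}' for i in range(10_000)]

pattern = re.compile(re.escape(target_word))
length = len(target_word)

candidates = {
    'regex': lambda: [pattern.match(w) is not None for w in words],
    'startswith': lambda: [w.startswith(target_word) for w in words],
    'slice': lambda: [w[:length] == target_word for w in words],
}

timings = {}
for name, func in candidates.items():
    # repeat() returns one total per run; the minimum is the least noisy estimate.
    timings[name] = min(timeit.repeat(func, number=10, repeat=3))
    print(f'{name}: {timings[name]:.4f}s')
```

timeit runs each candidate several times, which reduces the influence of one-off noise compared with a single wall-clock measurement.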

Parameters

Because the execution speed may depend on the length of the string being compared, the execution time of compare_regex, compare_startswith, and compare_str is measured with target_word set to 5, 10, 50, 100, and 500 characters.
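
The article does not say how the longer target_word values were built. One possible way, shown here purely as an assumption, is a hypothetical helper make_target_word that repeats the lowercase alphabet up to the requested length.

```python
import itertools
import string


def make_target_word(length: int) -> str:
    # Hypothetical helper: cycle through 'a'..'z' until the requested length is reached.
    letters = itertools.cycle(string.ascii_lowercase)
    return ''.join(itertools.islice(letters, length))


for n in (5, 10, 50, 100, 500):
    word = make_target_word(n)
    print(n, len(word), word[:10])
```

Any string of the right length would do; the point is only that target_word changes length between runs while the rest of the program stays the same.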

Measurement

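Unit: seconds

| Function \ target_word length (characters) | 5 | 10 | 50 | 100 | 500 |
| --- | --- | --- | --- | --- | --- |
| compare_regex | 11.617 | 12.044 | 16.126 | 18.837 | 66.463 |
| compare_startswith | 6.647 | 6.401 | 6.241 | 6.297 | 6.931 |
| compare_str | 5.941 | 5.993 | 4.87 | 5.449 | 8.875 |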
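
To make the relative differences easier to read, the reported times can be divided by the compare_startswith time for each length. The sketch below only reuses the numbers reported above; the names seconds and ratios are illustrative.

```python
lengths = (5, 10, 50, 100, 500)
seconds = {
    'compare_regex': (11.617, 12.044, 16.126, 18.837, 66.463),
    'compare_startswith': (6.647, 6.401, 6.241, 6.297, 6.931),
    'compare_str': (5.941, 5.993, 4.87, 5.449, 8.875),
}

baseline = seconds['compare_startswith']
ratios = {
    name: [round(value / base, 2) for value, base in zip(values, baseline)]
    for name, values in seconds.items()
}

for name, values in ratios.items():
    print(name, dict(zip(lengths, values)))
```

On these figures the regular expression takes roughly 1.7 times as long as startswith at 5 characters and roughly 9.6 times as long at 500 characters.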

chart.png

Consideration

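In terms of speed, prefix matching should be implemented with startswith or str[:word_length] regardless of the number of characters: both were faster than the regular expression at every length measured. The most recommended is startswith, whose time was the least affected by the length of the string being compared (6.2 to 6.9 seconds across all lengths), whereas the slice comparison was slightly faster up to 100 characters but slower at 500. I also like it the most in terms of readability.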
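
One more readability-related point, added as an aside and not something the measurements show: a regular expression treats characters such as '.' as metacharacters, so a prefix must be escaped before it is matched literally, while startswith always compares literally. The prefix and word below are hypothetical values chosen to show the difference.

```python
import re

prefix = 'a.c'   # hypothetical prefix containing a regex metacharacter
word = 'abc_1'   # does not literally start with 'a.c'

unescaped = re.match(prefix, word) is not None           # '.' matches any character
escaped = re.match(re.escape(prefix), word) is not None  # '.' matched literally
literal = word.startswith(prefix)

print(unescaped, escaped, literal)  # prints: True False False
```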
