I compared the speed of regular expressions in Ruby, Python, and Perl (2013 version)

I had to process a large amount of text at work, so I wrote the tool in Ruby, but it was really slow. There is room to tune the code, but tuning is beside the point if the language itself is slow at this kind of job, so I decided to measure the speed difference among the major lightweight languages (LL).

First, the measurement results

Perl wins for small texts (6 MB), and Python wins for large texts. In both cases Ruby narrowly avoids last place, but it is still slow. This contrasts with Perl and Python, whose strengths and weaknesses split clearly by input size.

[Figure: chart of the measurement results]

Measurement conditions

- Extract IP addresses from an Nginx access log with a regular expression
- Write the extraction results to a file
- The machine used is an FD Core i5 iMac
- Each measurement was run 3 times
- No abnormal values (extreme outliers) were found
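The post does not say how the three runs were timed. As a minimal sketch of one way to do it, the helper below runs a command several times and records the wall-clock time of each run. The name `time_command` is hypothetical, and the code targets Python 3 rather than the Python 2.7.2 used in the measurements.

```python
import subprocess
import sys
import time


def time_command(cmd, runs=3):
    """Run cmd `runs` times and return the wall-clock seconds of each run.

    Hypothetical helper: the original post does not describe its timing method.
    """
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True raises CalledProcessError if the script exits non-zero
        subprocess.run(cmd, check=True)
        timings.append(time.perf_counter() - start)
    return timings


# Example call (assumes ruby and regex_test.rb are available locally):
#   time_command(["ruby", "regex_test.rb"])
# Timing a trivial command that exists wherever this script runs:
for i, seconds in enumerate(time_command([sys.executable, "-c", "pass"]), start=1):
    print("run %d: %.3f s" % (i, seconds))
```

Note that this includes interpreter startup time in every run, which matters more for the small input than for the large one.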

Introducing the contestants

First, the Ruby contestant

show_ruby_version


$ ruby -v
ruby 1.9.3p392 (2013-02-22 revision 39386) [x86_64-darwin12.3.0]

The code to use is as follows

regex_test.rb


#!/usr/bin/env ruby

re_addr = /((?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))/

fh2 = open("./result_rb.txt", "w")
open("./access.log.1") { |fh|
  while line = fh.gets
    if m = re_addr.match(line)
      fh2.puts m[1]
    end
  end
}
fh2.close

Next, the Python contestant

show_python_version


$ python --version
Python 2.7.2

The code to use is as follows

regex_test.py


#!/usr/bin/env python

import re
re_addr = re.compile("((?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))")

fh2 = open('./result_py.txt', 'w')
fh = open('./access.log.1')
for line in fh.readlines():
    m = re_addr.search(line)
    if m is not None:
        fh2.write(m.group(1))
        fh2.write("\n")
fh.close()
fh2.close()

Finally, the great veteran, Perl

show_perl_version


$ perl -v

This is perl 5, version 12, subversion 4 (v5.12.4) built for darwin-thread-multi-2level
# (the rest is omitted because it is long)

The code to use is as follows

regex_test.pl


#!/usr/bin/env perl

open(FH2, ">", "./result_pl.txt");
open(FH, "<", "./access.log.1");
while($line = readline FH) {
  if ($line =~ /((?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))/) {
    print FH2 $1."\n";
  }
}
close(FH);
close(FH2);

As you can see, the logic is almost the same in each language, and none of the scripts does anything elaborate.
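To make the shared logic concrete, here is a small sketch that applies the same pattern to a single line. The sample log line is made up in the style of Nginx's "combined" format, `extract_ip` is a hypothetical helper name, and the pattern is written as a raw string for Python 3.

```python
import re

# Same pattern as in the three scripts, split across two raw strings for readability.
RE_ADDR = re.compile(
    r"((?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}"
    r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))"
)


def extract_ip(line):
    """Return the first IPv4-looking substring in line, or None if there is none."""
    m = RE_ADDR.search(line)
    return m.group(1) if m else None


# Made-up sample line; not taken from the log used in the measurements.
sample = '192.168.0.1 - - [22/Feb/2013:10:00:00 +0900] "GET / HTTP/1.1" 200 612'
print(extract_ip(sample))  # 192.168.0.1
```

Because the search stops at the first match on each line, every script writes at most one address per log line.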

Afterword

Since this measures regular expressions plus file writing, it is not a comparison of pure regular expression performance. Please bear with that, though, because the motivation for this benchmark was processing a large amount of text (extracting specific data with a regular expression and writing it to a file).
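If one wanted to look at matching alone, a rough sketch would be to time the regex over lines that are already in memory, so that no file reading or writing falls inside the timed region. This was not part of the original measurements. The input here is synthetic, `time_matching_only` is a hypothetical name, and the code targets Python 3.

```python
import re
import time

RE_ADDR = re.compile(
    r"((?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}"
    r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))"
)


def time_matching_only(lines):
    """Return (seconds, matches) for regex matching over in-memory lines.

    Only the matching loop is timed; no file I/O happens inside it.
    """
    matches = []
    start = time.perf_counter()
    for line in lines:
        m = RE_ADDR.search(line)
        if m is not None:
            matches.append(m.group(1))
    elapsed = time.perf_counter() - start
    return elapsed, matches


# Synthetic log-like lines, not the access log used in the post.
lines = ['10.0.0.%d - - "GET / HTTP/1.1" 200\n' % (i % 256) for i in range(100000)]
elapsed, matches = time_matching_only(lines)
print("%d matches in %.3f s" % (len(matches), elapsed))
```

The list appends still add a little overhead, so this is closer to, but not exactly, a pure regex measurement.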

Memory release in each language was not properly taken into account, so the results may change if that is handled properly. Please treat these numbers as a reference only.
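One memory-related difference is visible in the scripts themselves: the Python script calls `readlines()`, which builds a list of every line before matching starts, while the Ruby and Perl scripts read one line at a time. Whether this affected the measured times cannot be told from the post. As a sketch only, a streaming Python variant could look like the following. `extract_addresses` is a hypothetical name, and the code targets Python 3.

```python
import re

RE_ADDR = re.compile(
    r"((?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}"
    r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))"
)


def extract_addresses(src_path, dst_path):
    """Stream src_path line by line and write the first address of each line to dst_path.

    Returns the number of addresses written.
    """
    count = 0
    # The with statement closes both files even if an error occurs.
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:  # iterating the file object reads lazily, unlike readlines()
            m = RE_ADDR.search(line)
            if m is not None:
                dst.write(m.group(1) + "\n")
                count += 1
    return count


# Example call with the file names used in the post:
#   extract_addresses("./access.log.1", "./result_py.txt")
```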
