I compared Python's iterator and Ruby's Enumerator

Measurement environment

$ uname -a
Linux kubo39 3.2.0-51-generic-pae #77-Ubuntu SMP Wed Jul 24 20:40:32 UTC 2013 i686 i686 i386 GNU/Linux
$ cat /proc/cpuinfo | grep "model name"
model name	: Intel(R) Core(TM) i7-3517U CPU @ 1.90GHz
model name	: Intel(R) Core(TM) i7-3517U CPU @ 1.90GHz
model name	: Intel(R) Core(TM) i7-3517U CPU @ 1.90GHz
model name	: Intel(R) Core(TM) i7-3517U CPU @ 1.90GHz
$ cat /proc/meminfo | grep MemTotal
MemTotal:        4011464 kB

Version of each language

Cost to call the next element

--Python code

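def test_call_next(n=100001):
    # Python 2: range() returns a list, and iterators expose .next()
    it = range(0, n).__iter__()
    while True:
        try:
            it.next()
        except StopIteration:
            # Raised once the iterator is exhausted
            break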

Execution result

$ time python iter.py 

real	0m0.042s
user	0m0.028s
sys	0m0.012s
$ time python iter.py 

real	0m0.046s
user	0m0.044s
sys	0m0.004s
$ time python iter.py 

real	0m0.036s
user	0m0.028s
sys	0m0.004s

--Ruby code

def test_call_next n=100000
  iter = [*0..n].each
  loop do
    iter.next
  end
end

Execution result

$ time ruby iter.rb 

real	0m0.138s
user	0m0.096s
sys	0m0.040s
$ time ruby iter.rb 

real	0m0.145s
user	0m0.116s
sys	0m0.028s
$ time ruby iter.rb 

real	0m0.147s
user	0m0.124s
sys	0m0.020s

Comparison

Python is about 3.5 times faster than Ruby here, but this may not be a good benchmark because the measured time also includes the cost of creating the iterator.
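
One way to keep iterator creation out of the measurement is to time the two phases separately inside the process rather than timing the whole interpreter run. The following is a minimal sketch, written for Python 3 (the measurements above used Python 2). The function name `time_phases` is made up for illustration, and its numbers are not comparable with the results above.

```python
import time


def time_phases(n=100001):
    """Time iterator creation and the next() calls separately.

    Hypothetical helper, not part of the original benchmark.
    """
    data = list(range(n))  # build the list outside the timed regions

    start = time.perf_counter()
    it = iter(data)  # creation only
    created = time.perf_counter() - start

    count = 0
    start = time.perf_counter()
    while True:
        try:
            next(it)
        except StopIteration:
            break
        count += 1
    consumed = time.perf_counter() - start

    return created, consumed, count


created, consumed, count = time_phases()
print("create: %.6fs, next x %d: %.6fs" % (created, count, consumed))
```

Timing inside the process also leaves out interpreter startup, which `time python iter.py` includes.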

Why sys time is large in Ruby

$ strace -c ruby iter.rb 
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 98.14    0.006114           0    200005           sigprocmask
  1.86    0.000116           1        85           read
  0.00    0.000000           0         1           write
  0.00    0.000000           0       200       146 open
  0.00    0.000000           0        55           close
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         1           time
  0.00    0.000000           0         8         8 access
  0.00    0.000000           0        27           brk
  0.00    0.000000           0        25        22 ioctl
  0.00    0.000000           0         1           gettimeofday
  0.00    0.000000           0         7           munmap
  0.00    0.000000           0         1           clone
  0.00    0.000000           0         1           uname
  0.00    0.000000           0        12           mprotect
  0.00    0.000000           0         9           _llseek
  0.00    0.000000           0         1           mremap
  0.00    0.000000           0        16           rt_sigaction
  0.00    0.000000           0        23           rt_sigprocmask
  0.00    0.000000           0         1           getcwd
  0.00    0.000000           0         1           sigaltstack
  0.00    0.000000           0         6           getrlimit
  0.00    0.000000           0        37           mmap2
  0.00    0.000000           0        37        15 stat64
  0.00    0.000000           0        96           lstat64
  0.00    0.000000           0       117           fstat64
  0.00    0.000000           0        14           getuid32
  0.00    0.000000           0        14           getgid32
  0.00    0.000000           0        15           geteuid32
  0.00    0.000000           0        15           getegid32
  0.00    0.000000           0         2           getdents64
  0.00    0.000000           0        46           fcntl64
  0.00    0.000000           0         2         1 futex
  0.00    0.000000           0         5           sched_getaffinity
  0.00    0.000000           0         1           set_thread_area
  0.00    0.000000           0         1           set_tid_address
  0.00    0.000000           0         2           clock_gettime
  0.00    0.000000           0         1           openat
  0.00    0.000000           0         1           set_robust_list
  0.00    0.000000           0         2           pipe2
------ ----------- ----------- --------- --------- ----------------
100.00    0.006230                200895       192 total

The cause appears to be that sigprocmask(2) is called on every call to next: roughly 200,000 calls for 100,001 elements, or about two per element.
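
The calls-per-element figure can be checked by parsing the summary instead of reading it by eye. This is a minimal Python 3 sketch that assumes the `strace -c` column layout shown above. `parse_strace_summary` and `SAMPLE` are hypothetical names, and `SAMPLE` holds only a few rows copied from the output above.

```python
def parse_strace_summary(text):
    """Return {syscall: calls} from `strace -c` summary text.

    Assumes the columns shown above:
    % time, seconds, usecs/call, calls, [errors], syscall.
    """
    calls = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows have 5 columns, or 6 when the errors column is filled.
        if len(fields) not in (5, 6):
            continue
        try:
            float(fields[0])
            count = int(fields[3])
        except ValueError:
            continue  # header or separator line
        name = fields[-1]
        if name == "total":
            continue
        calls[name] = count
    return calls


# A few rows copied from the Ruby run above.
SAMPLE = """\
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 98.14    0.006114           0    200005           sigprocmask
  1.86    0.000116           1        85           read
  0.00    0.000000           0       200       146 open
------ ----------- ----------- --------- --------- ----------------
100.00    0.006230                200895       192 total
"""

calls = parse_strace_summary(SAMPLE)
per_element = calls["sigprocmask"] / 100001  # elements in [*0..100000]
print("sigprocmask calls per element: %.2f" % per_element)  # prints 2.00
```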

Iterator (Enumerator) generation cost

--Python code

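def test_create_iterator(n=10001):
    # Call __iter__() so that an iterator is actually created each time;
    # a new 1001-element list is also built on every pass (Python 2 range()).
    [range(0, 1001).__iter__() for _ in xrange(n)]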

Execution result

$ time python iter.py 

real	0m0.328s
user	0m0.280s
sys	0m0.044s
$ time python iter.py 

real	0m0.342s
user	0m0.276s
sys	0m0.064s
$ time python iter.py 

real	0m0.324s
user	0m0.268s
sys	0m0.052s

--Ruby code

def test_create_enum n=10000
  n.times{ [*0..1001].to_enum }
end

Execution result

$ time ruby iter.rb 

real	0m0.554s
user	0m0.548s
sys	0m0.004s
$ time ruby iter.rb 

real	0m0.558s
user	0m0.552s
sys	0m0.004s
$ time ruby iter.rb 

real	0m0.566s
user	0m0.560s
sys	0m0.000s

Comparison

Again, Python is about 1.7 times faster.

I was curious about why Python's sys time is so large here.

$ strace -c python iter.py 
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 53.85    0.000049           0       337       250 open
 46.15    0.000042           0      1292           brk
  0.00    0.000000           0       183           read
  0.00    0.000000           0        89           close
  0.00    0.000000           0         1           execve
  0.00    0.000000           0        11        11 access
  0.00    0.000000           0         5         1 ioctl
  0.00    0.000000           0         4         2 readlink
  0.00    0.000000           0        55           munmap
  0.00    0.000000           0         1           uname
  0.00    0.000000           0        11           mprotect
  0.00    0.000000           0         3           _llseek
  0.00    0.000000           0        68           rt_sigaction
  0.00    0.000000           0         1           rt_sigprocmask
  0.00    0.000000           0         1           getcwd
  0.00    0.000000           0         1           getrlimit
  0.00    0.000000           0        86           mmap2
  0.00    0.000000           0       172        96 stat64
  0.00    0.000000           0         9           lstat64
  0.00    0.000000           0       141           fstat64
  0.00    0.000000           0         1           getuid32
  0.00    0.000000           0         1           getgid32
  0.00    0.000000           0         1           geteuid32
  0.00    0.000000           0         1           getegid32
  0.00    0.000000           0         4           getdents64
  0.00    0.000000           0         1         1 futex
  0.00    0.000000           0         1           set_thread_area
  0.00    0.000000           0         1           set_tid_address
  0.00    0.000000           0         2           openat
  0.00    0.000000           0         1           set_robust_list
------ ----------- ----------- --------- --------- ----------------
100.00    0.000091                  2485       361 total

open(2) and brk(2) account for most of the time, and the number of calls to brk(2) in particular stands out.

By the way, brk(2) is the system call that changes the amount of memory allocated to a process's data segment.

It is called when malloc finds the heap too small and memory is still available, so I suspect many people use it without realizing it.

Bonus

A comparison using a simplified version of the code I actually needed

--Python code

def test_for_generate_enumerator(n=50001):
    arr = range(0, 11)  # Python 2: a list of 11 elements, built once
    for i in xrange(0, n):
        it = arr.__iter__()  # fresh iterator on every pass
        while True:
            try:
                it.next()
            except StopIteration:
                break

Execution result

$ time python iter.py 

real	0m0.134s
user	0m0.128s
sys	0m0.004s
$ time python iter.py 

real	0m0.134s
user	0m0.128s
sys	0m0.004s
$ time python iter.py 

real	0m0.142s
user	0m0.132s
sys	0m0.008s

--Ruby code

def test_for_iter_with_generate_enumerator n=50000
  arr = [*0..10]
  n.times {
    iter = arr.to_enum
    loop do
      iter.next
    end
  }
end

Execution result

$ time ruby iter.rb 

real	0m1.370s
user	0m1.080s
sys	0m0.288s
$ time ruby iter.rb 

real	0m1.377s
user	0m0.992s
sys	0m0.380s
$ time ruby iter.rb 

real	0m1.362s
user	0m1.060s
sys	0m0.296s

Comparison

Ruby is extremely slow in this case: roughly 10 times slower than Python (about 1.37 s versus about 0.14 s).

Oddly, though, Python's sys time is smaller in this code than in the test that created a lot of iterators.
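
One possible explanation, offered as a guess rather than something measured above: the creation test keeps thousands of lists alive at once, so the allocator has to keep growing the heap, while this test reuses a single 11-element list. The following minimal Python 3 sketch uses the standard `tracemalloc` module to compare the peak memory of the two patterns. `peak_bytes`, `many_lists`, and `one_list` are hypothetical names, the loop counts are reduced, and the sketch reports Python-level allocations rather than brk(2) calls.

```python
import tracemalloc


def peak_bytes(func):
    """Run func() and return the peak traced memory in bytes."""
    tracemalloc.start()
    try:
        func()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


def many_lists(n=1000):
    # Mirrors the creation test: every iterator keeps its own list alive.
    return [iter(list(range(0, 1001))) for _ in range(n)]


def one_list(n=1000):
    # Mirrors the bonus test: one small list, a fresh iterator per pass.
    arr = list(range(0, 11))
    for _ in range(n):
        for _ in iter(arr):
            pass


print("many lists peak:", peak_bytes(many_lists))
print("one list peak:  ", peak_bytes(one_list))
```

If the first figure is far larger than the second, that is consistent with the higher brk(2) count in the creation test, though it does not prove the link.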

Conclusion

In these measurements, Python's iterator was faster than Ruby's Enumerator both to create and to advance to the next element.

Next time (if there is one), I'd like to trace through the actual implementation code.
