In Python 2.6.6, I compared the execution times of the following 1,2,3 methods to append 10,000,000 integers. In conclusion, the list comprehension of 3 was as fast as expected. As I wrote in the postscript, the contents described here can also be applied to Python 2.7 series and 3.3 series.
# 1. testfunc1:Prepare an empty list and append
def testfunc1(rangelist):
templist = []
for temp in rangelist:
templist.append(temp)
# 2. testfunc2: 1+Objectify append
def testfunc2(rangelist):
templist = []
append = templist.append
for temp in rangelist:
append(temp)
# 3. testfunc3:List comprehension
def testfunc3(rangelist):
templist = [temp for temp in rangelist]
Use IPython's% timeit command to execute "Extract the fastest of 3 runs" 10 times to find the average run time.
# 10,000,Create a list of 000 integers
>>> rangelist = range(1,10000000)
>>> %timeit -n 10 -r 3 testfunc1(rangelist)
10 loops, best of 3: 1.73 s per loop
>>> %timeit -n 10 -r 3 testfunc2(rangelist)
10 loops, best of 3: 1.08 s per loop
>>> %timeit -n 10 -r 3 testfunc3(rangelist)
10 loops, best of 3: 697 ms per loop
** In a loop where execution time is a concern, it is better to use list comprehension as in 3. ** ** ** Even if you don't use list comprehension, it is better to put the append method reference in the variable in advance as in 2. ** **
Note that the second method of putting the method reference in a variable in advance can also be used to shorten the loop execution time other than list creation.
Let's examine why this result is obtained by disassembling each function using Python's dis module.
>>> import dis
>>> dis.dis(testfunc1)
3 0 BUILD_LIST 0
3 STORE_FAST 1 (templist)
4 6 SETUP_LOOP 27 (to 36)
9 LOAD_FAST 0 (rangelist)
12 GET_ITER
>> 13 FOR_ITER 19 (to 35)
16 STORE_FAST 2 (temp)
5 19 LOAD_FAST 1 (templist)
22 LOAD_ATTR 0 (append)
25 LOAD_FAST 2 (temp)
28 CALL_FUNCTION 1
31 POP_TOP
32 JUMP_ABSOLUTE 13
>> 35 POP_BLOCK
>> 36 LOAD_CONST 0 (None)
39 RETURN_VALUE
>>> dis.dis(testfunc2)
9 0 BUILD_LIST 0
3 STORE_FAST 1 (templist)
10 6 LOAD_FAST 1 (templist)
9 LOAD_ATTR 0 (append)
12 STORE_FAST 2 (append)
11 15 SETUP_LOOP 24 (to 42)
18 LOAD_FAST 0 (rangelist)
21 GET_ITER
>> 22 FOR_ITER 16 (to 41)
25 STORE_FAST 3 (temp)
12 28 LOAD_FAST 2 (append)
31 LOAD_FAST 3 (temp)
34 CALL_FUNCTION 1
37 POP_TOP
38 JUMP_ABSOLUTE 22
>> 41 POP_BLOCK
>> 42 LOAD_CONST 0 (None)
45 RETURN_VALUE
>>> dis.dis(testfunc3)
16 0 BUILD_LIST 0
3 DUP_TOP
4 STORE_FAST 1 (_[1])
7 LOAD_FAST 0 (rangelist)
10 GET_ITER
>> 11 FOR_ITER 13 (to 27)
14 STORE_FAST 2 (temp)
17 LOAD_FAST 1 (_[1])
20 LOAD_FAST 2 (temp)
23 LIST_APPEND
24 JUMP_ABSOLUTE 11
>> 27 DELETE_FAST 1 (_[1])
30 STORE_FAST 3 (templist)
33 LOAD_CONST 0 (None)
36 RETURN_VALUE
Of the above output, FOR_ITER to JUMP_ABSOLUTE is a loop called 10,000,000 times.
In testfunc1, you can see that 22 LOAD_ATTR calls the append method of the list object and then 28 CALL_FUNCTION actually appends.
testfunc1 for loop
>> 13 FOR_ITER 19 (to 35)
16 STORE_FAST 2 (temp)
5 19 LOAD_FAST 1 (templist)
22 LOAD_ATTR 0 (append)
25 LOAD_FAST 2 (temp)
28 CALL_FUNCTION 1
31 POP_TOP
32 JUMP_ABSOLUTE 13
On the other hand, in testfunc2, LOAD_ATTR is out of the loop, inside the loop we call the append method with 28 LOAD_FAST and then execute append with 34 CALL_FUNCTION.
testfunc2 for loop
>> 22 FOR_ITER 16 (to 41)
25 STORE_FAST 3 (temp)
12 28 LOAD_FAST 2 (append)
31 LOAD_FAST 3 (temp)
34 CALL_FUNCTION 1
37 POP_TOP
38 JUMP_ABSOLUTE 22
LOAD_ATTR and LOAD_FAST take longer than LOAD_ATTR. Therefore, testfunc2 with LOAD_ATTR outside the loop has a shorter execution time than testfunc1 with LOAD_ATTR inside the loop.
In testfunc2, I loaded the append method with 28 LOAD_FAST and then called it with 34 CALL_FUNCTION. On the other hand, testfunc3 calls 23 LIST_APPEND, a list-only process, and there is no CALL_FUNCTION.
testfunc3 for loop
>> 11 FOR_ITER 13 (to 27)
14 STORE_FAST 2 (temp)
17 LOAD_FAST 1 (_[1])
20 LOAD_FAST 2 (temp)
23 LIST_APPEND
24 JUMP_ABSOLUTE 11
For CALL_FUNCTION and LIST_APPEND, calling LIST_APPEND, which is a list-only process, is faster (CALL_FUNCTION has various processes because it is a general function call). Therefore, testfunc3 with CALL_FUNCTION replaced by LIST_APPEND has a shorter execution time than testfunc2 with CALL_FUNCTION in the loop.
This article was written with reference to the following articles.
I also compared Python 2.7 and 3.3 to make sure that the above isn't just about Python 2.6. Since I changed the machine, there is no point in comparing it with the above execution time.
Python 2.7.4
>>> %timeit -n 5 testfunc1()
5 loops, best of 3: 974 ms per loop
>>> %timeit -n 5 testfunc2()
5 loops, best of 3: 737 ms per loop
>>> %timeit -n 5 testfunc3()
5 loops, best of 3: 430 ms per loop
Python 3.3.3
>>> %timeit -n 5 testfunc1()
5 loops, best of 3: 1.21 s per loop
>>> %timeit -n 5 testfunc2()
5 loops, best of 3: 815 ms per loop
>>> %timeit -n 5 testfunc3()
5 loops, best of 3: 639 ms per loop
Well, Python3 is slower ... By the way, in Python3, it seems that the range function returns an iterator instead of a list, so let's create and execute a list without using the range function.
>>> %timeit -n 5 testfunc1()
5 loops, best of 3: 1.19 s per loop
>>> %timeit -n 5 testfunc2()
5 loops, best of 3: 611 ms per loop
>>> %timeit -n 5 testfunc3()
5 loops, best of 3: 429 ms per loop
I'm still suspicious about testfunc1, but now I can run testfunc2 and testfunc3 in about the same amount of time as Python 2.7.4.
The same can be said for Python 2.7 and Python 3.3 for 1, 2 and 3 as for Python 2.6.
Recommended Posts