Sort of tuple array can be accelerated by specifying key (Python)

Introduction

When I was worried about "TLE even though it seems to be in time for the calculation amount ..." while devoting myself to the competition pro, it was discovered that the sort part of the preparation was a bottleneck. Share the points you're addicted to.

tuple array sort

I will briefly explain the difference between with and without key specification.

key specified

a = [(3,3), (2,2), (3,1), (4,2)]
a.sort(key=lambda x: x[0])
print(a) # => [(2, 2), (3, 3), (3, 1), (4, 2)]

Sort is performed according to the size of the specified key. If the key elements are equal, the order is preserved.

The sort () method is guaranteed to be stable. Sorting is stable as long as it is guaranteed that the relative order of equal elements will not change. (Official documentation)

(x[0],x[1]))It is possible to specify multiple keys as tuples like


 In this article, we will consider specifying a single element as a key.



## No key specified
 If you do not specify any key
 "Compare size with the 0th element of tuple"-> "Compare size with the 1st element if they are the same"-> "..."
 Sort is done in this way.

```python
a = [(3,3), (2,2), (3,1), (4,2)]
a.sort()
print(a) # => [(2, 2), (3, 1), (3, 3), (4, 2)]

At this time, it seems that the comparison operation between tuples is performed internally.

key specifies a function that takes one argument and is used to retrieve the comparison key from each element of the list (for example, key = str.lower). The key corresponding to each item is calculated once and used for the entire sorting process. ** The default value None means that the values in the list are sorted directly without calculating another key value. ** (Official documentation)

If you just "sort by the first element of tuple", you can achieve the purpose with or without key specification. If you don't specify the key here, thinking ** "Then you don't need to specify the key" **, you may fall into an unexpected trap. ** Comparison between tuples is slow. ** **

Experiment

"Comparison between tuples" vs. "Comparison between tuple elements"

Check how much speed difference there is between the two. The execution environment is AtCoder's code test (PyPy3). There are 3 elements in each tuple.

import random
random.seed(1)

''' n:3 ways
N = 1*(10**3)
N = 3*(10**3)
N = 5*(10**3)
'''

tl = [] # list of tuples
for i in range(N):
    l = random.randint(1,N)
    r = random.randint(1,N)
    tl.append((l,r,i)) # 3 elements in tuple

''' 
# 1.Comparison between tuples
for i in range(N):
    for j in range(N):
        res = tl[i] < tl[j]
# 2.Comparison of tuple elements
for i in range(N):
    for j in range(N):
        res = tl[i][0] < tl[j][0]
'''

The measurement results are as follows.

N 1.tuples 2.Tuple elements
1*10^3 78 ms 17 ms
3*10^3 672 ms 129 ms
5*10^3 1875 ms 368 ms

Since there are three tuple elements, I thought, "Is it an error of about 3 times at most?", But there was a big difference. scared.

tuple array sort "key not specified" vs. "key specified"

import random
random.seed(1)

''' n:3 ways
N = 1*(10**5)
N = 3*(10**5)
N = 5*(10**5)
'''

tl = [] # list of tuples
for i in range(N):
    l = random.randint(1,N)
    r = random.randint(1,N)
    tl.append((l,r,i)) # 3 elements in tuple

'''
3. tl.sort()
4. tl.sort(key=lambda x:x[0])
'''

The measurement results are as follows.

N 3.No key 4.With key
1*10^5 198 ms 98 ms
3*10^5 735 ms 359 ms
5*10^5 1361 ms 708 ms

Although not as much as in the first experiment, the difference was about twice as large.

in conclusion

It may not be a big difference, but depending on the problem, this can be addictive. Some people may have seen the code of the experiment and wondered, "Is that the problem?" By the way, it seems that `` `itemgetteris faster than lambda``` for key specification.

Recommended Posts

Sort of tuple array can be accelerated by specifying key (Python)
Sort the elements of the array by specifying the conditions
Sort tuple list in Python by specifying the ascending / descending order of multiple keys
Investigation of DC power supplies that can be controlled by Python
How to sort by specifying a column in the Python Numpy array.
[python] How to sort by the Nth Mth element of a multidimensional array
Value sort of dictionary with complicated structure (sort by key structure by deep value)
[Python3] Call by dynamically specifying the keyword argument of the function
Sort by date in python
[Python] Tuple version of prefecture pull-down
[Python] Sort iterable by multiple conditions
Expansion by argument of python dictionary
"Obake" can be sung by "you"
Behavior of python3 by Sakura's server
Sort by specifying conditions in CASTable
Story of power approximation by Python
[Python] A program that finds a pair that can be divided by a specified value
[Python memo] Be careful when creating a two-dimensional array (list of lists)