I had to implement a certain process, which goes like this:
"For each object in the list, remove it from the list if it is inferior to the new object."
What does "inferior" mean here? Assuming each object is represented by a tuple, with the new object n and an existing object x, the object x is inferior if
[n[i] > x[i] for i in range(len(n))]
evaluates to [True, True, ..., True] (all True).
However, whether a larger or a smaller value is better depends on the problem, axis by axis (and the program has to handle that). For example, if the 1st axis is to be maximized and the 2nd axis minimized, the condition for an object to be removed is:
n[0] > x[0] and n[1] < x[1]
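To make the condition concrete, here is a tiny sketch (the values are made up purely for illustration):

n = (70, 30)             # new object
x = (50, 40)             # existing object; 1st axis: bigger is better, 2nd axis: smaller is better
print(n[0] > x[0] and n[1] < x[1])  # True -> x is inferior and gets removed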
In what follows, I will first write this process in Python as-is, and then improve it step by step.
So, here is a program implementing the process just described. Whether each axis is maximized or minimized is passed in the maximize argument (not a great variable name, I admit).
filters.py
def filter00(l, n, maximize):
    nl = []
    for x in l:
        remove = True
        for i in range(len(x)):
            if maximize[i]:
                if n[i] > x[i]:
                    pass
                else:
                    remove = False
            else:
                if n[i] < x[i]:
                    pass
                else:
                    remove = False
        if not remove:
            nl.append(x)
    return nl
It starts from the assumption that x will be removed, and cancels the removal if there is any axis on which x is at least as good. The if ... pass / else: remove = False structure is admittedly redundant, but I felt that flipping the condition or adding a not would hurt readability, so I left it as is (it gets improved later anyway).
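For reference, this is roughly what the rejected alternative mentioned above would look like, shown for the maximize branch only (a sketch, not part of the final code):

# Adding "not" to the condition:
if not (n[i] > x[i]):
    remove = False
# Or flipping the operator (note it becomes <=, not <):
if n[i] <= x[i]:
    remove = False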
Consider the following code from the filter00 function:
for i in range(len(x)):
    if maximize[i]:
        if n[i] > x[i]:
            pass
        else:
            remove = False
    else:
        if n[i] < x[i]:
            pass
        else:
            remove = False
maximize is passed as an argument and does not change inside the loop, so branching on it every iteration feels like wasted work. Let's see whether we can hoist the if maximize[i] out of the loop.
The trick is to turn the comparisons n[i] > x[i] and n[i] < x[i] into functions. Python's operator module provides functions corresponding to the built-in operators, so let's rewrite the code using it.
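Before rewriting, here is plain operator-module usage (nothing beyond the standard library, shown just to make the substitution obvious):

import operator

print(operator.gt(3, 2))  # True,  the same as 3 > 2
print(operator.lt(3, 2))  # False, the same as 3 < 2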
filters.py
import operator

def filter01(l, n, maximize):
    cond = []
    for m in maximize:
        cond.append(operator.gt if m else operator.lt)
    nl = []
    for x in l:
        remove = True
        for i in range(len(x)):
            if cond[i](n[i], x[i]):
                pass
            else:
                remove = False
        if not remove:
            nl.append(x)
    return nl
By the way, I removed a conditional branch that never changes within the loop, but is it actually faster? Let's measure properly instead of trusting intuition. I used the following measurement script.
perf.py
import filters
import time
import random
import numpy as np

random.seed(123)

def perf(ll, n, m, f):
    elapsed = []
    for l in ll:
        start = time.time()
        f(l, n, m)
        end = time.time()
        elapsed.append(end - start)
    print(f.__name__, np.average(elapsed))

LOOP = 100
SIZE = 1000
n = (70, 70)
m = (True, True)
ll = [[(random.randint(0, 99), random.randint(0, 99)) for _ in range(SIZE)] for _ in range(LOOP)]

perf(ll, n, m, filters.filter00)
perf(ll, n, m, filters.filter01)
Comparing only average values is asking for flak, but a preliminary check showed the averages are representative here, so please hold your hatchets (laugh).
Well, here are the execution results:
filter00 0.00383123636246
filter01 0.00437139987946
Unexpectedly, it came out about 0.5 milliseconds slower. It may be that the cost of turning each comparison from an operator into a function call outweighed the savings from removing the loop-invariant branch.
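That call-overhead hypothesis is easy to probe in isolation. Here is a minimal timeit sketch one could use to check it (absolute numbers depend on the machine):

import timeit

setup = "import operator; a, b = 70, 50"
# Inline operator vs. the equivalent function call from the operator module.
print(timeit.timeit("a > b", setup=setup, number=1_000_000))
print(timeit.timeit("operator.gt(a, b)", setup=setup, number=1_000_000))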
I wanted this to be a happy story of shorter and faster code, but I stumbled at the very first step. (In truth, this slowdown was only discovered by measuring after the code had been shortened, but knowingly leaving the code slow is not an option.) I had heard that when iterating over multiple lists in a loop it is better to use the zip function, so I tried that next.
filters.py
def filter02(l, n, maximize):
    cond = []
    for m in maximize:
        cond.append(operator.gt if m else operator.lt)
    nl = []
    for x in l:
        remove = True
        for c, a, b in zip(cond, n, x):
            if c(a, b):
                pass
            else:
                remove = False
        if not remove:
            nl.append(x)
    return nl
Measurement results. The filter00 and filter01 values are exactly the same as before because all timings in this article were taken in a single run after the final version was written; I am just revealing them a little at a time. Not bad.
filter00 0.00383123636246
filter01 0.00437139987946
filter02 0.0029260468483
Fast! It's even faster than the initial code.
The difference between filter01 and filter02 is whether cond, n, and x are accessed by subscript or received directly from what the zip function returns. Subscript access goes through the __getitem__ method, so eliminating it apparently removed that function-call cost. Of course, zip itself still has to pull elements out of the sequences, but that happens at the C level, which is interesting in its own right; for now, though, let's keep improving the code.
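To see the subscript-versus-zip difference on its own, a minimal timeit sketch along these lines can be used (the list contents are arbitrary; only the access pattern matters):

import timeit

setup = "xs = list(range(100)); ys = list(range(100))"
# Index-based pairing: each element access calls __getitem__.
print(timeit.timeit("[xs[i] + ys[i] for i in range(len(xs))]", setup=setup, number=100_000))
# zip-based pairing: the pairs are produced at the C level.
print(timeit.timeit("[a + b for a, b in zip(xs, ys)]", setup=setup, number=100_000))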
Next up: the list comprehension everyone loves. Writing plodding for loops is unsophisticated; let's rewrite this neatly with comprehensions.
filters.py
def filter03(l, n, maximize):
    cond = [operator.gt if m else operator.lt for m in maximize]
    return [x for x in l if not all([c(a, b) for c, a, b in zip(cond, n, x)])]
That jumped too far at once! Before anyone calls me out on it, let me explain properly. Oh, but first: this code is very short in line count, but it is definitely slow. If you just want the fast code, skip ahead to improvement #5.
Now, about filter03. This function uses a nested comprehension. The inner one first:
[c(a, b) for c, a, b in zip(cond, n, x)]
This part is almost the same as the inner loop in filter02. The difference is that filter02 branched and accumulated the result in the single variable remove, whereas the comprehension above returns a list such as [True, True].
This is where an annoying problem arises: to decide whether to keep an element, you need a scalar, not a list. That is what the all function is for. all returns True if every element of the list (or any iterable) is True. With it, you can reproduce the filter02 behavior of branching and setting remove to False in the else case. (If x does not lose on at least one axis, at least one entry in the list is False, so the whole thing becomes False.)
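A quick illustration of how all collapses the per-axis results into a single scalar (values made up):

print(all([True, True]))   # True : x loses on every axis -> remove
print(all([True, False]))  # False: x holds its own somewhere -> keep
print(all([]))             # True : all() is vacuously True for an empty iterable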
Now that the inside is clear, on to the outside:
[x for x in l if not all([c(a, b) for c, a, b in zip(cond, n, x)])]
This works as follows:

- take each element of l and assign it to x
- evaluate the if ... part using x
- if the condition holds, put x in the resulting list

So the outer comprehension corresponds to the filter02 code that decided whether to append based on the value of remove (as computed by the inner loop).
Now, the long-awaited execution speed:
filter00 0.00383123636246
filter01 0.00437139987946
filter02 0.0029260468483
filter03 0.00512305736542
Slow! Much slower than anything so far. Apparently the comprehension executes as a function call, and the appearance of the all function adds overhead on top of that.
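If you want to probe that overhead directly, a sketch like the following isolates it: the same two booleans are folded once with an inline loop and once by building a list and calling all (illustrative only):

import timeit

flags = [True, True]

def inline_loop():
    remove = True
    for f in flags:
        if not f:
            remove = False
    return remove

def list_plus_all():
    return all([f for f in flags])

print(timeit.timeit(inline_loop, number=1_000_000))
print(timeit.timeit(list_plus_all, number=1_000_000))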
Well, the speed advantage is gone at this point, but the reduction in code size is still worth something, so let's polish improvement #3 a little more. What makes filter03 hard to read is the nested comprehension, so let's use an inner function to make that part a bit cleaner.
filters.py
def filter04(l, n, maximize):
    cond = [operator.gt if m else operator.lt for m in maximize]
    def dominated(x, n):
        return all([c(a, b) for c, a, b in zip(cond, n, x)])
    return [x for x in l if not dominated(x, n)]
By extracting the comprehension (plus the all call) after if not into an inner function named dominated, what the code does can now be stated in a single word, which improves readability.
As you can probably guess, the execution speed is the slowest yet:
filter00 0.00383123636246
filter01 0.00437139987946
filter02 0.0029260468483
filter03 0.00512305736542
filter04 0.00578190803528
Now, let's take stock of the results so far: zip helped (filter02 is the fastest so far), the operator-module functions added call overhead, and the comprehensions plus all made things slower still. The following code is built on those observations. The name filter021 indicates that it was derived from filter02.
filters.py
def filter021(l, n, maximize):
    nl = []
    for x in l:
        remove = True
        for a, b, m in zip(n, x, maximize):
            remove = remove and (a > b if m else a < b)
        if not remove:
            nl.append(x)
    return nl
Since the operator-module functions proved too costly to use in the inner loop, I made the loop body compact with the and operator and a conditional expression instead.
Now, the execution times:
filter00 0.00383123636246
filter01 0.00437139987946
filter02 0.0029260468483
filter03 0.00512305736542
filter04 0.00578190803528
filter021 0.00255712985992
The fastest yet :v:
But the story doesn't end here. Can we squeeze out more speed? One idea is to split the code by how frequently each case actually occurs versus the general case. In other words, even if the requirement says "N variables, maximization and minimization may be mixed," if most real uses are "two variables, both maximized," you can write the following.
filters.py
def filter02X(l, n, maximize):
    if len(n) == 2 and maximize == (True, True):
        return [x for x in l if not (n[0] > x[0] and n[1] > x[1])]
    else:
        return filter021(l, n, maximize)
Measurement results:
filter00 0.00383123636246
filter01 0.00437139987946
filter02 0.0029260468483
filter03 0.00512305736542
filter04 0.00578190803528
filter021 0.00255712985992
filter02X 0.000690214633942
Extraordinary speed :v::v: And in the implementation this problem came from, the two- and three-variable cases are the bulk of the work, so the effect there is all the greater.
In this article, we gradually improved a straightforwardly written program. My initial plan was to show that turning un-Pythonic code (the kind people coming from other languages tend to write in Python) into Pythonic code makes it both more readable and faster, a happy story all around. Actual measurement showed otherwise, so the story changed course a little. In summary: Pythonic, shorter code is not automatically faster; don't guess about performance, measure it; and specializing for the frequent case can yield a dramatic speedup.
It has been three years since this article was posted. Back then my numpy skills were still weak, and although I kept thinking "the speed improvements in that article could be pushed even further with numpy," I never touched it. Then, maintaining the program that originally inspired this article for the first time in a while, I got fed up with how slow it was and finally sped it up with numpy.
There is something in filter021 that no numpy fanatic can forgive.
Yes, it's the for statement.
So I got rid of it.
filters.py
import numpy as np

def filter05(l, n, maximize):
    cond = [np.greater if m else np.less for m in maximize]
    npa = np.array(l)
    check = np.empty(npa.shape, dtype=bool)
    for i, c in enumerate(cond):
        check[:, i] = c(n[i], npa[:, i])
    # Elements that lose to n on every axis (all comparisons True) are removed; the rest are kept.
    not_remove = (check.sum(axis=1) != len(n))
    return [x for x, nr in zip(l, not_remove) if nr]
Except I couldn't get rid of it completely! There are reasons (read: excuses) for the code above:

- Whether each axis is maximized or minimized differs per axis, so (with my numpy skills) it cannot be written as a single vectorized comparison.
- The return value must be a list. Worse, in the original program the list elements are objects of a class that inherits from list, so not_remove cannot simply be applied to npa as a boolean index [^1].

[^1]: It violates the spec, but I did try the boolean index anyway; it didn't speed things up.
Let's measure the time anyway.
filter00 0.0037975716590881348
filter01 0.004087417125701904
filter02 0.003038032054901123
filter03 0.004926862716674804
filter04 0.005476489067077637
filter021 0.002838168144226074
filter02X 0.0006895852088928222
filter05 0.002588319778442383
The difference is within measurement error. The reason it doesn't get faster appears to be the overhead of converting the list to a numpy array; since len(l) >> len(n), the per-axis for loop is negligible by comparison.
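The conversion-overhead claim can be checked in isolation with a sketch like this (the sizes mirror the measurement script; numbers will vary by machine):

import random
import timeit
import numpy as np

random.seed(123)
l = [(random.randint(0, 99), random.randint(0, 99)) for _ in range(1000)]

# Cost of just the list -> ndarray conversion, with all comparisons left out.
print(timeit.timeit(lambda: np.array(l), number=100) / 100)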
As noted above, the original program cannot assume a numpy array as input, so filter05 cannot get any faster, but I tried to see how fast it could be if the input and output were allowed to be numpy arrays.
filters.py
def filter06(a, n, maximize):
    cond = [np.greater if m else np.less for m in maximize]
    check = np.empty(a.shape, dtype=bool)
    for i, c in enumerate(cond):
        check[:, i] = c(n[i], a[:, i])
    not_remove = (check.sum(axis=1) != len(n))
    return a[not_remove]
For the measurement, convert the input to a numpy array beforehand.
perf.py
a = np.array(ll)
perf(a, n, m, filters.filter06)
The result: as expected, it's fast when everything can be done entirely in numpy.
filter00 0.0037975716590881348
filter01 0.004087417125701904
filter02 0.003038032054901123
filter03 0.004926862716674804
filter04 0.005476489067077637
filter021 0.002838168144226074
filter02X 0.0006895852088928222
filter05 0.002588319778442383
filter06 0.00030982017517089844
Or rather, it makes you wonder why the specialized version (filter02X) is still that fast. Incidentally, I also made specialized versions of filter05 and filter06 and measured them, but the results were nearly unchanged (if anything, slightly slower).
I wondered whether filter06's per-axis comparisons could somehow be done all at once. I never managed to eliminate the per-axis comparison, but I did find a way to eliminate the sum after the for loop.
filters.py
def filter061(a, n, maximize):
    cond = [np.greater if m else np.less for m in maximize]
    check = np.ones(a.shape[0], dtype=bool)
    for i, c in enumerate(cond):
        check = check & c(n[i], a[:, i])
    not_remove = ~check
    return a[not_remove]
Measured, it is roughly twice as fast. I applied the same improvement to filter05 as well, but there the conversion overhead still dominated.
filter06 0.0003197813034057617
filter061 0.00017987966537475587
If you invert the comparison operators themselves and drop the final ~, it gets another 10 to 20% faster, at some cost in readability.
filters.py
def filter062(a, n, maximize):
    # The exact negation of > is <= (and of < is >=), so ties still mean "keep".
    cond = [np.less_equal if m else np.greater_equal for m in maximize]
    not_remove = np.zeros(a.shape[0], dtype=bool)
    for i, c in enumerate(cond):
        not_remove = not_remove | c(n[i], a[:, i])
    return a[not_remove]