[Python] Let's reduce the number of elements in the result in set operations

Overview

In set operations, execution can be sped up by keeping the result small. Each operation returns its result as a new set object, so creating that object takes time when it has many elements.

Background

In ABC157 D Friend Suggestions, counting the elements of a set with len(X-Y-Z) resulted in TLE. Rewriting it as len(X)-len(X&(Y|Z)) got AC. I tried to find out why the speeds differ.
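The two expressions give the same count because X-Y-Z is X with every element of Y|Z removed, so its size is |X| minus the number of elements of X that also lie in Y|Z. Below is a minimal sketch that checks this on random sets. The helper names and the random test data are illustrative assumptions, not part of the original solution.

```python
import random

def count_by_difference(X, Y, Z):
    # Builds X - Y, then (X - Y) - Z, and counts what is left.
    return len(X - Y - Z)

def count_by_intersection(X, Y, Z):
    # Counts the elements of X that are in Y or Z and subtracts them.
    return len(X) - len(X & (Y | Z))

random.seed(0)  # fixed seed so the check is reproducible
for _ in range(100):
    X = set(random.sample(range(50), random.randint(0, 50)))
    Y = set(random.sample(range(50), random.randint(0, 50)))
    Z = set(random.sample(range(50), random.randint(0, 50)))
    assert count_by_difference(X, Y, Z) == count_by_intersection(X, Y, Z)
```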

When |X| is large

The problem guarantees $|X|, |Y|, |Z| \leq 10^5$, but here let's measure the condition I actually ran into, $|X| \gg |Y|, |Z|$.

from timeit import timeit

xyz = 'X=set(range(10**5)); Y=set(range(10)); Z=set(range(5,15))' 

timeit('len(X-Y-Z)', setup=xyz, number=100)
# 0.3884289239649661

timeit('len(X)-len(X&(Y|Z))', setup=xyz, number=100)
# 0.0001103340182453394

Although the results are the same, there is a 3520-fold difference in execution time.

When the contents of X, Y, Z are the same

Next, let X, Y, and Z all contain the same $10^5$ elements.

from timeit import timeit

xyz = 'X=set(range(10**5)); Y=set(range(10**5)); Z=set(range(10**5))'

timeit('len(X-Y-Z)', setup=xyz, number=100)
# 0.28364974400028586

timeit('len(X)-len(X&(Y|Z))', setup=xyz, number=100)
# 1.1718004010035656

This time len(X)-len(X&(Y|Z)) was slower. X&(Y|Z) is identical to the original set X, so the result now has far more elements. On the other hand, len(X-Y-Z) dropped from about 0.39 s to about 0.28 s, probably because X-Y already yields an empty set.
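To make the two measurements easier to compare, the sketch below lists the size of every set object that each expression has to build. The function name intermediate_sizes and the dictionary keys are assumptions introduced for illustration only.

```python
def intermediate_sizes(X, Y, Z):
    # Sizes of the set objects created while evaluating each expression.
    return {
        "X-Y": len(X - Y),
        "X-Y-Z": len(X - Y - Z),
        "Y|Z": len(Y | Z),
        "X&(Y|Z)": len(X & (Y | Z)),
    }

# Case 1: X is much larger than Y and Z.
case1 = intermediate_sizes(set(range(10**5)), set(range(10)), set(range(5, 15)))

# Case 2: X, Y, and Z hold the same 10**5 elements.
case2 = intermediate_sizes(set(range(10**5)), set(range(10**5)), set(range(10**5)))

print(case1)  # {'X-Y': 99990, 'X-Y-Z': 99985, 'Y|Z': 15, 'X&(Y|Z)': 15}
print(case2)  # {'X-Y': 0, 'X-Y-Z': 0, 'Y|Z': 100000, 'X&(Y|Z)': 100000}
```

In case 1 the difference form builds two sets of roughly $10^5$ elements while the intersection form builds two sets of 15. In case 2 the situation is reversed.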

Set difference vs. intersection

To simplify the problem, compare only the difference and intersection operations. One operand is an empty set, and each operation is measured with the operands in both orders.

from timeit import timeit

xy = 'X=set(range(10**5)); Y=set()'

timeit('X-Y', setup=xy, number=100)
# 0.16930873499950394
timeit('Y-X', setup=xy, number=100)
# 1.7047044821083546e-05

timeit('X&Y', setup=xy, number=100)
# 1.0746996849775314e-05
timeit('Y&X', setup=xy, number=100)
# 1.502997474744916e-05

Even set difference is fast when the result has few elements. Apparently the speed gap does not come from the kind of operation being performed.
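The same four measurements can be run with the result size printed next to each time, which makes the pattern easier to see. This is only a sketch: the measure helper is a hypothetical name, and the timings depend on the machine, so no expected times are shown.

```python
from timeit import timeit

def measure(expr, env, number=100):
    # Returns (size of the result, total seconds for `number` runs).
    size = len(eval(expr, env))
    seconds = timeit(expr, globals=env, number=number)
    return size, seconds

env = {"X": set(range(10**5)), "Y": set()}
for expr in ("X-Y", "Y-X", "X&Y", "Y&X"):
    size, seconds = measure(expr, env)
    print(f"{expr}: result size {size}, {seconds:.6f} s")
```

Only X-Y produces a result with $10^5$ elements. The other three expressions all produce an empty set.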

Generation of set

The Python documentation for set.difference says:

Return a new set with elements in the set that are not in the others.

So a new set is created every time. Let's therefore measure how long it takes to create a set with many elements.

from timeit import timeit

timeit('set(X)', setup='X=set(range(10**5))', number=100)
# 0.16229172004386783

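The time to copy the set (about 0.162 s) is nearly the same as the time for X-Y with an empty Y (about 0.169 s). In the end, when the result has many elements, the time is simply spent creating the set that is returned.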
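As a small check of this conclusion, the sketch below confirms that X-Y returns a separate set object even when Y is empty and nothing is removed. The variable names are chosen for illustration.

```python
X = set(range(10**5))
Y = set()

result = X - Y

# The result has the same contents as X ...
same_contents = (result == X)
# ... but it is a separate object, so all 10**5 elements were copied into it.
same_object = (result is X)

print(same_contents, same_object)  # True False
```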
