Overview

Play by creating a priority queue that can update the element. To be precise, pushing the same element with different priorities creates a priority queue that overrides the priorities.

Synopsis

Due to the algorithm quick reference, I needed to use the priority queue, but I searched for "priority queue Python" and it appears at the top. http://docs.python.jp/2/library/heapq.html When I looked at it, it said something terrible.

Priority queues are a common use of heaps and have some difficulties in implementation:

--Sort stability: How can two tasks of equal priority be returned in the order they were originally added? --In future Python 3, tuple comparisons will be split into (priority, task) pairs if the priorities are equal and the task does not have a default comparison order. --If a task's priority changes, how do you move it to a new location on the heap? --When an unresolved task needs to be deleted, how do you find it in the queue and delete it?

The solution to the first two difficulties is to save the item as a three-element list that contains the priority, item number, and task. This item number acts as a tie-breaker to ensure that two tasks of the same priority are returned in the order in which they were added. And since the two item numbers are never equal, the tuple comparison cannot try to compare the two tasks directly. The remaining difficulty is primarily to find open tasks, change their priority, or remove them altogether. Finding a task is done by a dictionary that points to items in the queue. Deleting items or changing their priorities is more difficult as it breaks the immutable relationships of the heap structure. So a possible solution is to mark the item as invalid and add the changed priority item if necessary:

Well, I know what you're saying. I wondered if Python doesn't provide priority queues by default because heapq says something like this. However, that is not the case, and there is a proper Queue.PriorityQueue (). However, although there is, it seems that it is not possible to immediately perform the update process using this PriorityQueue (). Also, considering what is written on this page, I was curious about how much the calculation speed would be better or worse ~~ I don't care ~~.

design

A hand-made priority queue (hereinafter referred to as "hand queue") consists of a list of tuples (sortkey, (value, generation)) and a dictionary that holds one generation for each value. Deletion is an image that is not actually performed as described above, but is flagged, but it is controlled by saving the latest generation number in the dictionary and not using the latest data. I will. In addition, since the search performance deteriorates when elements are accumulated, data deletion + heap reconstruction shall be performed in a garbage collection manner. Regarding the timing of "garbage collection" here, I decided to keep it as an appropriate constant m, and reconstruct it when the number of elements (the number of tuples with the outermost parentheses counted as 1) exceeds m. And so on.

In this "hand cue" design,

--push → heapq.heappush () to update the dictionary, and if the size exceeds m, heap again with only the latest generation

Since the generation of the dictionary is updated, it also serves as a change as a result. --pop → keep taking heapq.heappop () until the latest generation is taken out (extract generation from dictionary based on value and repeat heappop () if they do not match) -* Delete → Update only the dictionary

I will do it like that. Random access to the elements of the queue is something we don't care about for now. If you give the dictionary a value, it will use more memory, but in theory you can access it randomly.

Implementation

`TeQueue.py`


import heapq
def tepush(l, d, m, sortkey, value):
    if not value in d:
        generation = 1
    else:
        generation = d[value] + 1
    d[value] = generation
    heapq.heappush(l, (sortkey, value, generation))
    length = len(l)
    if length > m: #Heap reconstruction
        lt = []
        for n in range(0,length):
            if l[n][2] == d[l[n][1]]:
                heapq.heappush(lt, (l[n][0], l[n][1], 1))
                d[l[n][1]] = 1
        return lt
    else:
        return l

def tepop(l, d):
    while len(l) > 0:
        (sortkey, value, generation) = heapq.heappop(l)
        if d[value] == generation:
            return (sortkey, value, generation)

This makes it possible to "overwrite" the value corresponding to the same value as described in the design.

Comparison with other queues

Compare with Queue.PriorityQueue ().

Initialization

For the time being, perform the following initialization.

`initialize.py`


#Generate a random number sequence of N
import random
N = 100000
M = 100000
target = range(0,N)
temp = range(0,N)
for n in range(N-1,-1,-1):
  target[n] = temp.pop(random.randint(0,n))

push

`pushtest.py`


import Queue
import heapq
import datetime

d = datetime.datetime(1,1,1)
time1 = d.now()
t = Queue.PriorityQueue(N)
for n in range(0,N):
    t.put((target[n], n))
time2 = d.now()
u = []
for n in range(0,N):
    heapq.heappush(u, (target[n], n))
time3 = d.now()
v = []
dd = {}
for n in range(0,N):
    v = tepush(v, dd, 2*M, target[n], n)
time4 = d.now()
print('push:Queue',(time2 - time1).__str__())
print('push:Heap',(time3 - time2).__str__())
print('push:TeQ',(time4 - time3).__str__())

`push result`


('push:Queue', '0:00:00.350000')
('push:Heap', '0:00:00.086000')
('push:TeQ', '0:00:00.137000')

It looks like this. TeQ> Heap because there are many comparisons, but it is much faster than Queue.

pop

`poptest.py`


import Queue
import heapq
import datetime

time1 = d.now()
for n in range(0,M):
    t.get()
time2 = d.now()
for n in range(0,M):
    heapq.heappop(u)
time3 = d.now()
for n in range(0,M):
    tepop(v, dd)
time4 = d.now()
print('pop:Queue',(time2 - time1).__str__())
print('pop:Heap',(time3 - time2).__str__())
print('pop:TeQ',(time4 - time3).__str__())

`pop result`


('pop:Queue', '0:00:00.484000')
('pop:Heap', '0:00:00.214000')
('pop:TeQ', '0:00:00.299000')

It looks like this. It's less different than push, but it's still the same trend. Actually, I wanted to play with M by changing the degree of duplication, but I realized that it doesn't make much sense for Queue and heapq, so I decided to make a comparison in the hand cue. When comparing with other than hand queue, http://stackoverflow.com/questions/9969236/how-to-implement-priority-queues-in-python It seemed good to compare it with something, but it was subordinated.

Comparison of M coefficient and speed when the multiplicity is changed

Experiment A

Initialization

`initializeA.py`


#Generate a random number sequence of N
import random
N = 1000000
M = 40000
target = range(0,N)
for n in range(N-1,-1,-1):
  target[n] = random.randint(0,M-1)

Implementation results

`testA.py`


d = datetime.datetime(1,1,1)
for m in [2,5,10,20]:
    time1 = d.now()
    v = []
    dd = {}
    for n in range(0,N):
        v = tepush(v, dd, m*M, n, target[n])
    time2 = d.now()
    for n in range(0,M):
        tepop(v, dd)
    time3 = d.now()
    print('push:TeQ_' + str(m),(time2 - time1).__str__())
    print('pop:TeQ_' + str(m),(time3 - time2).__str__())

`Result A`


('push:TeQ_2', '0:00:02.731000')
('pop:TeQ_2', '0:00:00.193000')
('push:TeQ_5', '0:00:01.794000')
('pop:TeQ_5', '0:00:00.527000')
('push:TeQ_10', '0:00:01.667000')
('pop:TeQ_10', '0:00:00.711000')
('push:TeQ_20', '0:00:01.605000')
('pop:TeQ_20', '0:00:00.581000')

The reason why 20 pops faster is that 20 * 40000 and 10 * 40000 overlap, so it ends up in the same state. .. .. HM.

Experiment B

Initialization

`initializeB.py`


#Generate a random number sequence of N
import random
N = 1000000
M = 4000
target = range(0,N)
for n in range(N-1,-1,-1):
  target[n] = random.randint(0,M-1)

Implementation results

`testB.py`


d = datetime.datetime(1,1,1)
for m in [2,5,10,20]:
    time1 = d.now()
    v = []
    dd = {}
    for n in range(0,N):
        v = tepush(v, dd, m*M, n, target[n])
    time2 = d.now()
    for n in range(0,M):
        tepop(v, dd)
    time3 = d.now()
    print('push:TeQ_' + str(m),(time2 - time1).__str__())
    print('pop:TeQ_' + str(m),(time3 - time2).__str__())

`Result B`


('push:TeQ_2', '0:00:02.927000')
('pop:TeQ_2', '0:00:00.013000')
('push:TeQ_5', '0:00:02.035000')
('pop:TeQ_5', '0:00:00.018000')
('push:TeQ_10', '0:00:01.929000')
('pop:TeQ_10', '0:00:00.093000')
('push:TeQ_20', '0:00:01.846000')
('pop:TeQ_20', '0:00:00.026000')

Even after trying several times, this tendency does not change much. Why. Even if the order is changed, 20 remains overwhelmingly faster than 10. HM. .. ..

Summary

I feel like I don't have to do the equivalent of garbage collection or reindexing.

[GO] I want to create a priority queue that can be updated in Python (2.7)

Overview

Synopsis

design

Implementation

TeQueue.py

Comparison with other queues

Initialization

initialize.py

pushtest.py

push result

poptest.py

pop result

Comparison of M coefficient and speed when the multiplicity is changed

Experiment A

Initialization

initializeA.py

Implementation results

testA.py

Result A

Experiment B

Initialization

initializeB.py

Implementation results

testB.py

Result B

Summary

`TeQueue.py`

`initialize.py`

`pushtest.py`

`push result`

`poptest.py`

`pop result`

`initializeA.py`

`testA.py`

`Result A`

`initializeB.py`

`testB.py`

`Result B`