PriorityQueue implementation is quadratic for our purposes - __contains__ does linear search
Opened this issue · 0 comments
GoogleCodeExporter commented
What steps will reproduce the problem?
1. Run best_first_graph_search() which implements frontier as a PriorityQueue
as defined in utils.py
uniform_cost_search() also uses best_first_graph_search() so it demonstrates
the same problem.
What is the expected output? What do you see instead?
I expect the "in" operator of PriorityQueue to operate in O(n) time, as
explained on AIMA page 84: "The data structure for frontier needs to support
efficient membership testing, so it should combine the capabilities of a
priority queue and a hash table."
It is used e.g. in best_first_graph_search: "child not in frontier".
Instead, when the frontier gets big (thousands of Nodes) it runs like a dog.
ipython's %prun is good for testing this, pointing out over ten million
executions of three functions.
ncalls tottime percall cumtime percall filename:lineno(function)
14715773 19.519 0.000 38.611 0.000 utils.py:747(<lambda>)
17112459 18.401 0.000 21.727 0.000 search.py:105(__eq__)
13780 9.214 0.001 47.825 0.003 utils.py:338(some)
17121581 3.334 0.000 3.334 0.000 {isinstance}
4660 2.314 0.000 4.942 0.001 utils.py:748(__getitem__)
The reason is clear when we look at the code and note this member function of
class PriorityQueue:
def __contains__(self, item):
return some(lambda (_, x): x == item, self.A)
What version of the product are you using? On what operating system?
svn revision r201
Please provide any additional information below.
The Python Queue.PriorityQueue class might be a good thing to build on.
Original issue reported on code.google.com by neal...@gmail.com
on 19 Jul 2012 at 5:07