xulunfan/aima-python

PriorityQueue implementation is quadratic for our purposes - __contains__ does linear search

Opened this issue · 0 comments

What steps will reproduce the problem?

1. Run best_first_graph_search() which implements frontier as a  PriorityQueue 
as defined in utils.py

uniform_cost_search() also uses best_first_graph_search() so it demonstrates 
the same problem.

What is the expected output? What do you see instead?

I expect the "in" operator of PriorityQueue to operate in O(n) time, as 
explained on AIMA page 84: "The data structure for frontier needs to support 
efficient membership testing, so it should combine the capabilities of a 
priority queue and a hash table."

It is used e.g. in best_first_graph_search: "child not in frontier".

Instead, when the frontier gets big (thousands of Nodes) it runs like a dog.  
ipython's %prun is good for testing this, pointing out over ten million 
executions of three functions.

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
 14715773   19.519    0.000   38.611    0.000 utils.py:747(<lambda>)
 17112459   18.401    0.000   21.727    0.000 search.py:105(__eq__)
    13780    9.214    0.001   47.825    0.003 utils.py:338(some)
 17121581    3.334    0.000    3.334    0.000 {isinstance}
     4660    2.314    0.000    4.942    0.001 utils.py:748(__getitem__)

The reason is clear when we look at the code and note this member function of 
class PriorityQueue:

    def __contains__(self, item):
        return some(lambda (_, x): x == item, self.A)

What version of the product are you using? On what operating system?
svn revision r201

Please provide any additional information below.

The Python Queue.PriorityQueue class might be a good thing to build on.

Original issue reported on code.google.com by neal...@gmail.com on 19 Jul 2012 at 5:07