sagemath/sage

standardize the interface to TransitiveIdeal and friends

nthiery opened this issue · 80 comments

  1. Implement a single entry point for recursively enumerated sets:
     RecursivelyEnumeratedSet(seeds, successors, structure=..., enumeration=...)

where structure takes values in the set [None, 'forest', 'graded', 'symmetric'] and enumeration takes values in the set [None, 'depth', 'breadth', 'naive'].

  1. Deprecate TransitiveIdeal, TransitiveIdealGraded and SearchForest as entry point.

  2. TransitiveIdeal(succ, seeds) keeps the same behavior as before and is now the same as RecursivelyEnumeratedSet(seeds, succ, structure=None, enumeration='naive').

  3. TransitiveIdealGraded(succ, seeds, max_depth) keeps the same behavior as before and is now the same as RecursivelyEnumeratedSet(seeds, succ, structure=None, enumeration='breadth', max_depth=max_depth).

Remarks:

A. For now the code of SearchForest is still in sage/combinat/backtrack.py. It should be moved in sage/sets/recursively_enumerated_set.pyx. See #16351.

B. TransitiveIdeal and TransitiveIealGraded are used in the code of sage/combinat, sage/categories and sage/groups at least. These should be updated to use RecursivelyEnumeratedSet for speed improvements and also to avoid issues explained in C below. See #16352.

C. Note that there were some issues with TransitiveIdeal and TransitiveIdealGraded, namely:

  • Enumeration of TransitiveIdeal is claimed to be depth first search in the top level file backtrack.py, but in fact, it is neither breadth first neither depth first. It is what I call a naive search.
  • Enumeration of TransitiveIdealGraded is indeed breadth first as claimed but it does not make use of the graded hypothesis at all because it remembers every generated elements.

See my status report at SageDays57 for more info.

Depends on #14052

CC: @sagetrac-sage-combinat @hivert

Component: combinatorics

Keywords: backtrack, enumerated set, transitive closure, days57

Author: Sébastien Labbé

Branch: 3191690

Reviewer: Travis Scrimshaw

Issue created by migration from https://trac.sagemath.org/ticket/6637

comment:1

Here are some discussions related to this ticket:

Description changed:

--- 
+++ 
@@ -4,6 +4,6 @@
      TransitiveClosure(roots, operators = ..., children = , acyclic = True, algo = "DFS", "BFS", internal_nodes = False)
 ```
 
-for all the functions in sage.combinat.backtrack.py SearchForest, TransitiveIdeal, TransitiveIdealGraded
+for all the functions in sage.combinat.backtrack.py `SearchForest`, `TransitiveIdeal`, `TransitiveIdealGraded`
 
 TODO: discuss the names above
comment:4

In a tentative to find good choices for name and so on, here is how I see this.

  • Let X be a set.
  • Let R be a binary relation on X, that is R is a subset of the cartesian product of X times X.
  • Denote by xRy if x is R-related to y.

Now

  • Let seeds be a subset of X.
  • Let succ be a callable python object X -> 2^X such that xRy if and only if y is in succ(x).

We are interested in the subset S of X that can be generated from the seeds using the succ recursively. More precisely in the set

S = {y : x = x1 R x2 R x3 R ... R xn = y and x in seeds}.

Moreover, we are interested in the enumeration of S itself and we consider depth-first and breadth-first as different and both usefull.

Such a relation G = (X,R) can be seen as a directed graph. I think this remark is useful as it may provide some vocabulary. Indeed the set S is the connected components of the generators in the digraph G.

I see some different cases :

1. We do not know anything more about the relation. We need to save in memory all the known objects to avoid duplicates. This is what is currently done in TransitiveIdeal (depth-first search) and (curiously) in TransitiveIdealGraded (breadth-first search) also.

2. The directed graph S is a forest with given seeds. Equivalently, one may say that S do not contain cycle (oriented or not). This is what is currently done in SearchForest.

3. The relation is graded. By graded here I mean what I thought TransitiveIdealGraded was doing until I look more carefully at the doc and the code. More seriously, by graded I mean the following : for all (x1 in seeds) and (y1 in seeds),

if (x1 R x2 R ... R xn) and (y1 R y2 R ... R ym) and (xn=ym), then (n=m).

The relation is graded if all path from the origin to an element have the same length. In this case, we only need to save in memory the current level.

4. The relation is symmetric. If the relation is symmetric, we only need to keep in memory the last two level of depth. This is what I needed and coded this week. And this is why I started to look more carefully at the code in sage/combinat/backtrack.py...

That is it for now!

comment:5

I totally agree with the analysis!

I don't know yet what would be the best name for the argument provided by the user to describe the relation. Behind the scene we are definitely modelling relation. But what the user provide is not the relation but a function that computes the (out) neighbors for this relation. If at the end of the day we choose "TransitiveClosure" as name for the main entry point, then "neighbors" would be consistent. If we go for "RecursiveSet" (or RecursiveEnumeratedSet or variant thereof) then "operators" would be consistent.

Cheers,
Nicolas

comment:6

I think relation would not be too bad for the name of the successor keyword and should be considered as well. In some sense, a binary relation is a function f: X -> 2^X and a function is a relation such that |f(x)|=1 for all x.

Possibilities for successor keyword :

  • succ : suitable for non symmetric relation, currently used in TransitiveIdeal and TransitiveIdealGraded
  • successors
  • operators
  • neighbors : suitable vocabulary for symmetric relation
  • children : suitable vocabulary for non symmetric relation, currently used in SearchForest.
  • relatives : suitable vocabulary for symmetric relation

Possibilities for generators keyword :

  • generators
  • gens
  • roots
  • seeds
comment:7

If we go for RecursiveSet (or RecursiveEnumeratedSet or variant thereof)

I like RecursiveSet. Maybe RecursiveEnumeratedSet is more related to what we do but is also longer.

Some links:

comment:8

I don't know yet what would be the best name for the argument provided by the user to describe the relation.

For the above 4 cases, I would suggest arguments like the following :

RecursiveSet(seeds, succ)
RecursiveSet(seeds, succ, structure="forest")
RecursiveSet(seeds, succ, structure="graded")
RecursiveSet(seeds, succ, structure="symmetric")
comment:9

I just added a patch. It implements the RecursiveSet_symmetric class and creates factory called RecursiveSet. For now, RecursiveSet returns either an instance of TransitiveIdeal, SearchForest or RecursiveSet_symmetric. I started an empty class RecursiveSet_graded. See examples inside the docstring of the class RecursiveSet.

It is not ready for review, but comments are welcome to help me continue this work.

Actually, my questions are :

  • How should I merge RecursiveSet with TransitiveIdeal and SearchForest?
  • Do we like this interface?

Dependencies: #14052

Commit: c2861f0

New commits:

c2861f0Some fixes to TransitiveIdeal and TransitiveIdealGraded, added RecursiveSet factory

Changed commit from c2861f0 to 5924b46

Branch pushed to git repo; I updated commit sha1. New commits:

5924b46Cleaning RecursiveSet and subclasses

Changed commit from 5924b46 to 29aebab

Branch pushed to git repo; I updated commit sha1. New commits:

29aebabnow 100% coverage, seeds instead of gens

Changed commit from 29aebab to b363c54

Branch pushed to git repo; I updated commit sha1. New commits:

b363c54Cleaning documentation and reverting some changes to TransitiveIdeals

Changed commit from b363c54 to b8108ae

Branch pushed to git repo; I updated commit sha1. New commits:

b8108aeimplemented depth search with stack and breadth search with queue

Changed commit from b8108ae to 9de2cf9

Branch pushed to git repo; I updated commit sha1. New commits:

9de2cf9moving RecursivelyEnumerableSet to sage/sets

Changed commit from 9de2cf9 to 5d63453

Branch pushed to git repo; I updated commit sha1. New commits:

5d63453doc fixes + adding Parent and category

Description changed:

--- 
+++ 
@@ -7,3 +7,9 @@
 for all the functions in sage.combinat.backtrack.py `SearchForest`, `TransitiveIdeal`, `TransitiveIdealGraded`
 
 TODO: discuss the names above
+
+The actual proposition is:
+
+```
+     RecursivelyEnumeratedSet(seeds, succ, structure=..., algorithm=...)
+```

Changed commit from 5d63453 to 4934fa2

Branch pushed to git repo; I updated commit sha1. New commits:

4934fa2deprecate import of TransitiveIdeal + level method for RecursivelyEnumeratedSet

Changed commit from 4934fa2 to 94a4d7a

Branch pushed to git repo; I updated commit sha1. New commits:

94a4d7alevel -> graded component, algorithm -> enumeration, more docs, deprecated SearchForest also

Changed commit from 94a4d7a to 52ce4a3

Branch pushed to git repo; I updated commit sha1. New commits:

52ce4a3succ -> successors, fixed doctests in backtrack.py
comment:25

I think I will stop now. The next thing to do would be to move code from sage/combinat/backtrack.py to sage/sets/recursively_enumerated_set.py more precisely, the SearchForest code. But since it is mostly about moving code (does not change any functionality), I suggest to do it in a another ticket and review/merge this ticket now.

Needs review!

Sébastien

Changed keywords from backtrack, enumerated set, transitive closure to backtrack, enumerated set, transitive closure, days57

comment:28

your commits remove completely combinat/all.py ....

comment:29

Replying to @fchapoton:

your commits remove completely combinat/all.py ....

I see. It is strange because I can't see which commit did that... I will investigate.

comment:30

I believe that's an error with the trac plugin (I've seen that before).

Can I make a feature request for this ticket, could we also cythonize this for speed (or at least make the new file a .pyx file)?

comment:31

Indeed, there is some gain. I did one example:

Python:

sage: f = lambda a: [a-1,a+1]
sage: C = RecursivelyEnumeratedSet([10, 15], f, structure='symmetric')
sage: it = iter(C)
sage: %time L = [next(it) for _ in xrange(10^6)]
CPU times: user 5.82 s, sys: 239 ms, total: 6.06 s
Wall time: 6.07 s

Cython:

sage: f = lambda a: [a-1,a+1]
sage: C = RecursivelyEnumeratedSet([10, 15], f, structure='symmetric')
sage: it = iter(C)
sage: %time L = [next(it) for _ in xrange(10^6)]
CPU times: user 4.47 s, sys: 408 ms, total: 4.88 s
Wall time: 4.89 s

Changed commit from 52ce4a3 to 4226245

Branch pushed to git repo; I updated commit sha1. New commits:

b5d53aeMerge branch 'master' into t/6637
4226245#6637: recursively_enumerated_set ffrom .py to .pyx

Description changed:

--- 
+++ 
@@ -1,15 +1,18 @@
-Implement a single entry point:
+1. Implement a single entry point for recursively enumerated sets:
 
 ```
-     TransitiveClosure(roots, operators = ..., children = , acyclic = True, algo = "DFS", "BFS", internal_nodes = False)
+     RecursivelyEnumeratedSet(seeds, successors, structure=..., enumeration=...)
 ```
 
-for all the functions in sage.combinat.backtrack.py `SearchForest`, `TransitiveIdeal`, `TransitiveIdealGraded`
+2. Deprecate `TransitiveIdeal`, `TransitiveIdealGraded` and `SearchForest` as entry point.
 
-TODO: discuss the names above
+Remarks:
 
-The actual proposition is:
+A. For now the code of `SearchForest` is still in `sage/combinat/backtrack.py`. It should be moved in `sage/sets/recursively_enumerated_set.pyx` in a later ticket.
 
-```
-     RecursivelyEnumeratedSet(seeds, succ, structure=..., algorithm=...)
-```
+B. Note that there are some issues with `TransitiveIdeal` and `TransitiveIdealGraded`, namely:
+
+   - Enumeration of `TransitiveIdeal` is claimed to be depth first search in the top level file `backtrack.py`, but in fact, it is neither breadth first neither depth first.
+   - Enumeration of `TransitiveIdealGraded` is indeed breadth first as claimed but it does not make use of the graded hypothesis at all because it remembers every generated elements.
+
+See [my status report at SageDays57](http://www.liafa.univ-paris-diderot.fr/~labbe/blogue/2014/04/my-status-report-at-sage-days-57-recursivelyenumeratedset/) for more info.

Description changed:

--- 
+++ 
@@ -3,6 +3,8 @@
 ```
      RecursivelyEnumeratedSet(seeds, successors, structure=..., enumeration=...)
 ```
+
+where `structure` takes values in the set `[None, 'forest', 'graded', 'symmetric']` and `enumeration` takes values in the set `[None, 'depth', 'breadth', 'naive']`.
 
 2. Deprecate `TransitiveIdeal`, `TransitiveIdealGraded` and `SearchForest` as entry point.
 
comment:34

I keep the status of the ticket to needs work because I realized that some doctest were broken in the sage library. I am working on it.

Branch pushed to git repo; I updated commit sha1. New commits:

e174ae0Trac #6637: Adding a `__contains__` method to RecursivelyenumeratedSet
766a1b0Trac #6637: fix doctest in root_system/plot.py

Changed commit from 4226245 to 766a1b0

Description changed:

--- 
+++ 
@@ -8,13 +8,17 @@
 
 2. Deprecate `TransitiveIdeal`, `TransitiveIdealGraded` and `SearchForest` as entry point.
 
+3. `TransitiveIdeal(succ, seeds)` keeps the same behavior as before and is now the same as `RecursivelyEnumeratedSet(seeds, succ, structure=None, enumeration='naive')`.
+
+4. `TransitiveIdealGraded(succ, seeds, max_depth)` keeps the same behavior as before and is now the same as `RecursivelyEnumeratedSet(seeds, succ, structure=None, enumeration='breadth', max_depth=max_depth)`.
+
 Remarks:
 
 A. For now the code of `SearchForest` is still in `sage/combinat/backtrack.py`. It should be moved in `sage/sets/recursively_enumerated_set.pyx` in a later ticket.
 
-B. Note that there are some issues with `TransitiveIdeal` and `TransitiveIdealGraded`, namely:
+B. Note that there were some issues with `TransitiveIdeal` and `TransitiveIdealGraded`, namely:
 
-   - Enumeration of `TransitiveIdeal` is claimed to be depth first search in the top level file `backtrack.py`, but in fact, it is neither breadth first neither depth first.
+   - Enumeration of `TransitiveIdeal` is claimed to be depth first search in the top level file `backtrack.py`, but in fact, it is neither breadth first neither depth first. It is what I call a naive search.
    - Enumeration of `TransitiveIdealGraded` is indeed breadth first as claimed but it does not make use of the graded hypothesis at all because it remembers every generated elements.
 
 See [my status report at SageDays57](http://www.liafa.univ-paris-diderot.fr/~labbe/blogue/2014/04/my-status-report-at-sage-days-57-recursivelyenumeratedset/) for more info.

Author: Sébastien Labbé

Changed commit from 766a1b0 to d8b942b

Reviewer: Travis Scrimshaw

Changed branch from u/slabbe/6637 to public/ticket/6637

comment:37

Some more speedup by doing some more cythonization. Before:

sage: f = lambda a: [a-1,a+1]
sage: C = RecursivelyEnumeratedSet([10, 15], f, structure='symmetric')
sage: it = iter(C)
sage: %time L = [next(it) for _ in xrange(10^6)]
CPU times: user 9.68 s, sys: 147 ms, total: 9.83 s
Wall time: 9.81 s

With my commit:

sage: f = lambda a: [a-1,a+1]
sage: C = RecursivelyEnumeratedSet([10, 15], f, structure='symmetric')
sage: it = iter(C)
sage: %time L = [next(it) for _ in xrange(10^6)]
CPU times: user 8.02 s, sys: 103 ms, total: 8.13 s
Wall time: 8.15 s

I'm sure I haven't done the best cythonization job on this, but it works and all tests pass. If you're happy with my changes, then positive review.


New commits:

0d592ceMerge branch 'u/slabbe/6637' of trac.sagemath.org:sage into public/ticket/6637
f200525More cythonization.
72f5bf6Merge branch 'u/slabbe/6637' of trac.sagemath.org:sage into public/ticket/6637
df40815Some tweaks in the cythonization.
e68a15fFixed pickling issue.
d8b942bSome other potential cython speedups.

Description changed:

--- 
+++ 
@@ -16,7 +16,9 @@
 
 A. For now the code of `SearchForest` is still in `sage/combinat/backtrack.py`. It should be moved in `sage/sets/recursively_enumerated_set.pyx` in a later ticket.
 
-B. Note that there were some issues with `TransitiveIdeal` and `TransitiveIdealGraded`, namely:
+B. `TransitiveIdeal` and `TransitiveIealGraded` are used in the code of `sage/combinat`, `sage/categories` and `sage/groups` at least. These should be updated to use `RecursivelyEnumeratedSet in a later ticket for speed improvements and also to avoid issues explained in C below.
+
+C. Note that there were some issues with `TransitiveIdeal` and `TransitiveIdealGraded`, namely:
 
    - Enumeration of `TransitiveIdeal` is claimed to be depth first search in the top level file `backtrack.py`, but in fact, it is neither breadth first neither depth first. It is what I call a naive search.
    - Enumeration of `TransitiveIdealGraded` is indeed breadth first as claimed but it does not make use of the graded hypothesis at all because it remembers every generated elements.

Description changed:

--- 
+++ 
@@ -16,7 +16,7 @@
 
 A. For now the code of `SearchForest` is still in `sage/combinat/backtrack.py`. It should be moved in `sage/sets/recursively_enumerated_set.pyx` in a later ticket.
 
-B. `TransitiveIdeal` and `TransitiveIealGraded` are used in the code of `sage/combinat`, `sage/categories` and `sage/groups` at least. These should be updated to use `RecursivelyEnumeratedSet in a later ticket for speed improvements and also to avoid issues explained in C below.
+B. `TransitiveIdeal` and `TransitiveIealGraded` are used in the code of `sage/combinat`, `sage/categories` and `sage/groups` at least. These should be updated to use `RecursivelyEnumeratedSet` in a later ticket for speed improvements and also to avoid issues explained in C below.
 
 C. Note that there were some issues with `TransitiveIdeal` and `TransitiveIdealGraded`, namely:
 
comment:40

I do gain one more second with your improvements. Great!

sage: sage: f = lambda a: [a-1,a+1]                                         
sage: sage: C = RecursivelyEnumeratedSet([10, 15], f, structure='symmetric')
sage: sage: it = iter(C)                                                    
sage: sage: %time L = [next(it) for _ in xrange(10^6)]                      
CPU times: user 3.49 s, sys: 246 ms, total: 3.74 s                          
Wall time: 3.79 s   

I do not like factory function in general and was happy to use for the first time the metaclass stuff. But apparently, it is not as efficient? Did you check if only removing the metaclass stuff was giving a speedup?

Changed commit from d8b942b to 3191690

Branch pushed to git repo; I updated commit sha1. New commits:

3191690Trac #6637: More deprecation doc in backtrack.py
comment:42

Travis, did you check that the doc was building fine? I am not able to, I get :

$ sage -docbuild reference/structure html
   [structure] WARNING: intersphinx inventory '/Users/slabbe/Applications/sage-git/src/doc/output/html/en/reference/quivers/objects.inv' not fetchable due to <type 'exceptions.IOError'>: [Errno 2] No such file or directory: '/Users/slabbe/Applications/sage-git/src/doc/output/html/en/reference/quivers/objects.inv'
   Error building the documentation.

   Note: incremental documentation builds sometimes cause spurious
   error messages. To be certain that these are real errors, run
   "make doc-clean" first and try again.
   Traceback (most recent call last):
     File "/Users/slabbe/Applications/sage-git/src/doc/common/builder.py", line 1477, in <module>
         getattr(get_builder(name), type)()
           File "/Users/slabbe/Applications/sage-git/src/doc/common/builder.py", line 699, in _wrapper
               getattr(DocBuilder, build_type)(self, *args, **kwds)
                 File "/Users/slabbe/Applications/sage-git/src/doc/common/builder.py", line 94, in f
                     execfile(sys.argv[0])
                       File "/Users/slabbe/Applications/sage-git/src/doc/common/custom-sphinx-build.py", line 210, in <module>
                           raise OSError(ERROR_MESSAGE)
                           OSError: [structure] WARNING: intersphinx inventory '/Users/slabbe/Applications/sage-git/src/doc/output/html/en/reference/quivers/objects.inv' not fetchable due to <type 'exceptions.IOError'>: [Errno 2] No such file or directory: '/Users/slabbe/Applications/sage-git/src/doc/output/html/en/reference/quivers/objects.inv'

Once, I may confirm the docs does build fine. I will set this to positive review.

comment:43

make doc-clean fixed the error. Doc builds fine.

comment:44

Sebastien wants to double check the Metaclass thingy.

comment:45

As I recall, metaclasses are not supported in extension classes by Cython. The metaclass should not change the speed since it's only called/used upon object creation.

comment:46

Also FTR, I liked your usage of the metaclass (and I can't check the doc until I get my docbuilder to actually work again... :/ ).

comment:47

If you agree Travis, I will try to put the metaclass use back in. Also, maybe Florent can say a word about it. He coached me on how to implement it.

comment:48

Replying to @tscrim:

As I recall, metaclasses are not supported in extension classes by Cython. The metaclass should not change the speed since it's only called/used upon object creation.

Ok, now I see what you mean. When the class is cdef, then the __classcall_private__ do not get called instead of the __init__:

sage: f = lambda a: [a-1,a+1]
sage: RecursivelyEnumeratedSet([0], f)
A recursively enumerated set (depth first search)
sage: RecursivelyEnumeratedSet([0], f, structure='symmetric')
Traceback (most recent call last):
...
TypeError: __init__() got an unexpected keyword argument 'structure'

I also saw in the doc that: "For a Cython class (not cdef since they doesn’t allows metaclasses)"

comment:49

I pushed on my branch a commit using metaclass. In the end, I had to create a factory def outside of the class...

Changed commit from 3191690 to dd72bfc

Branch pushed to git repo; I updated commit sha1. New commits:

34c9db2Removed _generic from base class and added _factory to function.
dd72bfcMerge branch 'public/ticket/6637' of trac.sagemath.org:sage into public/ticket/6637
comment:51

Looking at your commit, I don't see any benefit in using the metaclass. However I'm not opposed to the renaming, so I've just pushed that.

comment:52

Replying to @tscrim:

Looking at your commit, I don't see any benefit in using the metaclass.

Yes exactly. I needed to play with it to understand what you meant.

However I'm not opposed to the renaming, so I've just pushed that.

Well, the renaming was just an easy way for me to check that sage -t recursively_enumerated_set.pyx was ok after I realised that __classcall_private__ was ignored by cdef class. It was not a suggestion, but it would not be the first renamed function:

sage: import_statements(sum)
from sage.misc.functional import symbolic_sum

So, we go with commit ​dd72bfc instead of ​3191690 ?

comment:53

If that's okay with you.

comment:54

Replying to @tscrim:

If that's okay with you.

The more I think about it, the less I like it. I think ​dd72bfc can be confusing for someone looking at the file for the first time. Until that person looks at the file sage/sets/all.py he will not understand how the __init__ handles the structure argument. And the key will always be hidden in some other file sage/sets/all.py. I suggest we go with your initial factory function. More precisely with commit ​3191690. Do you agree?

If so, I do not know what should we do then (git question). Should we update the commit field? Should we reset the HEAD of the branch? Should we create a new branch pointing to the commit?

comment:55

Good point. What we'll do is create a new branch which just points to the old commit (afterwards we can delete our old branches). Do you want to do this or should I?

comment:56

Let me try.

Changed commit from dd72bfc to 3191690

comment:57

The following did the trick:

git checkout 3191690 -b t/6637a
comment:58

Then let's get this in. Thanks Sébastien.

comment:59

Thanks for the review and cython improvements!

Changed branch from public/ticket/6637a to 3191690

Changed commit from 3191690 to none

Description changed:

--- 
+++ 
@@ -14,9 +14,9 @@
 
 Remarks:
 
-A. For now the code of `SearchForest` is still in `sage/combinat/backtrack.py`. It should be moved in `sage/sets/recursively_enumerated_set.pyx` in a later ticket.
+A. For now the code of `SearchForest` is still in `sage/combinat/backtrack.py`. It should be moved in `sage/sets/recursively_enumerated_set.pyx`. See #16351.
 
-B. `TransitiveIdeal` and `TransitiveIealGraded` are used in the code of `sage/combinat`, `sage/categories` and `sage/groups` at least. These should be updated to use `RecursivelyEnumeratedSet` in a later ticket for speed improvements and also to avoid issues explained in C below.
+B. `TransitiveIdeal` and `TransitiveIealGraded` are used in the code of `sage/combinat`, `sage/categories` and `sage/groups` at least. These should be updated to use `RecursivelyEnumeratedSet` for speed improvements and also to avoid issues explained in C below. See #16352.
 
 C. Note that there were some issues with `TransitiveIdeal` and `TransitiveIdealGraded`, namely: