standardize the interface to TransitiveIdeal and friends
nthiery opened this issue · 80 comments
- Implement a single entry point for recursively enumerated sets:
RecursivelyEnumeratedSet(seeds, successors, structure=..., enumeration=...)
where structure takes values in the set [None, 'forest', 'graded', 'symmetric'] and enumeration takes values in the set [None, 'depth', 'breadth', 'naive'].
-
Deprecate
TransitiveIdeal,TransitiveIdealGradedandSearchForestas entry point. -
TransitiveIdeal(succ, seeds)keeps the same behavior as before and is now the same asRecursivelyEnumeratedSet(seeds, succ, structure=None, enumeration='naive'). -
TransitiveIdealGraded(succ, seeds, max_depth)keeps the same behavior as before and is now the same asRecursivelyEnumeratedSet(seeds, succ, structure=None, enumeration='breadth', max_depth=max_depth).
Remarks:
A. For now the code of SearchForest is still in sage/combinat/backtrack.py. It should be moved in sage/sets/recursively_enumerated_set.pyx. See #16351.
B. TransitiveIdeal and TransitiveIealGraded are used in the code of sage/combinat, sage/categories and sage/groups at least. These should be updated to use RecursivelyEnumeratedSet for speed improvements and also to avoid issues explained in C below. See #16352.
C. Note that there were some issues with TransitiveIdeal and TransitiveIdealGraded, namely:
- Enumeration of
TransitiveIdealis claimed to be depth first search in the top level filebacktrack.py, but in fact, it is neither breadth first neither depth first. It is what I call a naive search. - Enumeration of
TransitiveIdealGradedis indeed breadth first as claimed but it does not make use of the graded hypothesis at all because it remembers every generated elements.
See my status report at SageDays57 for more info.
Depends on #14052
CC: @sagetrac-sage-combinat @hivert
Component: combinatorics
Keywords: backtrack, enumerated set, transitive closure, days57
Author: Sébastien Labbé
Branch: 3191690
Reviewer: Travis Scrimshaw
Issue created by migration from https://trac.sagemath.org/ticket/6637
Here are some discussions related to this ticket:
-
TransitiveIdeal vs TransitiveIdealGraded on sage-combinat, Feb 2013.
-
Comment of Nicolas Thiéry at #14052, Feb 2013
-
SearchForest and post_process..., on sage-devel, Oct. 2012.
-
Sage modules and forking on sage-devel, a comment by Florent Hivert, Oct. 2012.
-
#13580, patch available on sage-combinat : trac_13580-map_reduce-fh.patch
Description changed:
---
+++
@@ -4,6 +4,6 @@
TransitiveClosure(roots, operators = ..., children = , acyclic = True, algo = "DFS", "BFS", internal_nodes = False)
```
-for all the functions in sage.combinat.backtrack.py SearchForest, TransitiveIdeal, TransitiveIdealGraded
+for all the functions in sage.combinat.backtrack.py `SearchForest`, `TransitiveIdeal`, `TransitiveIdealGraded`
TODO: discuss the names aboveIn a tentative to find good choices for name and so on, here is how I see this.
- Let X be a set.
- Let R be a binary relation on X, that is R is a subset of the cartesian product of X times X.
- Denote by xRy if x is R-related to y.
Now
- Let
seedsbe a subset of X. - Let
succbe a callable python objectX -> 2^Xsuch that xRy if and only if y is insucc(x).
We are interested in the subset S of X that can be generated from the seeds using the succ recursively. More precisely in the set
S = {y : x = x1 R x2 R x3 R ... R xn = y and x in seeds}.
Moreover, we are interested in the enumeration of S itself and we consider depth-first and breadth-first as different and both usefull.
Such a relation G = (X,R) can be seen as a directed graph. I think this remark is useful as it may provide some vocabulary. Indeed the set S is the connected components of the generators in the digraph G.
I see some different cases :
1. We do not know anything more about the relation. We need to save in memory all the known objects to avoid duplicates. This is what is currently done in TransitiveIdeal (depth-first search) and (curiously) in TransitiveIdealGraded (breadth-first search) also.
2. The directed graph S is a forest with given seeds. Equivalently, one may say that S do not contain cycle (oriented or not). This is what is currently done in SearchForest.
3. The relation is graded. By graded here I mean what I thought TransitiveIdealGraded was doing until I look more carefully at the doc and the code. More seriously, by graded I mean the following : for all (x1 in seeds) and (y1 in seeds),
if (x1 R x2 R ... R xn) and (y1 R y2 R ... R ym) and (xn=ym), then (n=m).
The relation is graded if all path from the origin to an element have the same length. In this case, we only need to save in memory the current level.
4. The relation is symmetric. If the relation is symmetric, we only need to keep in memory the last two level of depth. This is what I needed and coded this week. And this is why I started to look more carefully at the code in sage/combinat/backtrack.py...
That is it for now!
I totally agree with the analysis!
I don't know yet what would be the best name for the argument provided by the user to describe the relation. Behind the scene we are definitely modelling relation. But what the user provide is not the relation but a function that computes the (out) neighbors for this relation. If at the end of the day we choose "TransitiveClosure" as name for the main entry point, then "neighbors" would be consistent. If we go for "RecursiveSet" (or RecursiveEnumeratedSet or variant thereof) then "operators" would be consistent.
Cheers,
Nicolas
I think relation would not be too bad for the name of the successor keyword and should be considered as well. In some sense, a binary relation is a function f: X -> 2^X and a function is a relation such that |f(x)|=1 for all x.
Possibilities for successor keyword :
succ: suitable for non symmetric relation, currently used inTransitiveIdealandTransitiveIdealGradedsuccessorsoperatorsneighbors: suitable vocabulary for symmetric relationchildren: suitable vocabulary for non symmetric relation, currently used inSearchForest.relatives: suitable vocabulary for symmetric relation
Possibilities for generators keyword :
generatorsgensrootsseeds
If we go for RecursiveSet (or RecursiveEnumeratedSet or variant thereof)
I like RecursiveSet. Maybe RecursiveEnumeratedSet is more related to what we do but is also longer.
Some links:
I don't know yet what would be the best name for the argument provided by the user to describe the relation.
For the above 4 cases, I would suggest arguments like the following :
RecursiveSet(seeds, succ)
RecursiveSet(seeds, succ, structure="forest")
RecursiveSet(seeds, succ, structure="graded")
RecursiveSet(seeds, succ, structure="symmetric")
Attachment: trac_6637_recursive_set-sl.patch.gz
I just added a patch. It implements the RecursiveSet_symmetric class and creates factory called RecursiveSet. For now, RecursiveSet returns either an instance of TransitiveIdeal, SearchForest or RecursiveSet_symmetric. I started an empty class RecursiveSet_graded. See examples inside the docstring of the class RecursiveSet.
It is not ready for review, but comments are welcome to help me continue this work.
Actually, my questions are :
- How should I merge
RecursiveSetwithTransitiveIdealandSearchForest? - Do we like this interface?
New commits:
c2861f0 | Some fixes to TransitiveIdeal and TransitiveIdealGraded, added RecursiveSet factory |
Branch: u/slabbe/6637
Branch pushed to git repo; I updated commit sha1. New commits:
5924b46 | Cleaning RecursiveSet and subclasses |
Branch pushed to git repo; I updated commit sha1. New commits:
29aebab | now 100% coverage, seeds instead of gens |
Branch pushed to git repo; I updated commit sha1. New commits:
b363c54 | Cleaning documentation and reverting some changes to TransitiveIdeals |
Branch pushed to git repo; I updated commit sha1. New commits:
b8108ae | implemented depth search with stack and breadth search with queue |
Branch pushed to git repo; I updated commit sha1. New commits:
9de2cf9 | moving RecursivelyEnumerableSet to sage/sets |
Branch pushed to git repo; I updated commit sha1. New commits:
5d63453 | doc fixes + adding Parent and category |
Description changed:
---
+++
@@ -7,3 +7,9 @@
for all the functions in sage.combinat.backtrack.py `SearchForest`, `TransitiveIdeal`, `TransitiveIdealGraded`
TODO: discuss the names above
+
+The actual proposition is:
+
+```
+ RecursivelyEnumeratedSet(seeds, succ, structure=..., algorithm=...)
+```Branch pushed to git repo; I updated commit sha1. New commits:
4934fa2 | deprecate import of TransitiveIdeal + level method for RecursivelyEnumeratedSet |
Branch pushed to git repo; I updated commit sha1. New commits:
94a4d7a | level -> graded component, algorithm -> enumeration, more docs, deprecated SearchForest also |
Branch pushed to git repo; I updated commit sha1. New commits:
52ce4a3 | succ -> successors, fixed doctests in backtrack.py |
I think I will stop now. The next thing to do would be to move code from sage/combinat/backtrack.py to sage/sets/recursively_enumerated_set.py more precisely, the SearchForest code. But since it is mostly about moving code (does not change any functionality), I suggest to do it in a another ticket and review/merge this ticket now.
Needs review!
Sébastien
Changed keywords from backtrack, enumerated set, transitive closure to backtrack, enumerated set, transitive closure, days57
your commits remove completely combinat/all.py ....
Replying to @fchapoton:
your commits remove completely combinat/all.py ....
I see. It is strange because I can't see which commit did that... I will investigate.
I believe that's an error with the trac plugin (I've seen that before).
Can I make a feature request for this ticket, could we also cythonize this for speed (or at least make the new file a .pyx file)?
Indeed, there is some gain. I did one example:
Python:
sage: f = lambda a: [a-1,a+1]
sage: C = RecursivelyEnumeratedSet([10, 15], f, structure='symmetric')
sage: it = iter(C)
sage: %time L = [next(it) for _ in xrange(10^6)]
CPU times: user 5.82 s, sys: 239 ms, total: 6.06 s
Wall time: 6.07 s
Cython:
sage: f = lambda a: [a-1,a+1]
sage: C = RecursivelyEnumeratedSet([10, 15], f, structure='symmetric')
sage: it = iter(C)
sage: %time L = [next(it) for _ in xrange(10^6)]
CPU times: user 4.47 s, sys: 408 ms, total: 4.88 s
Wall time: 4.89 s
Description changed:
---
+++
@@ -1,15 +1,18 @@
-Implement a single entry point:
+1. Implement a single entry point for recursively enumerated sets:
```
- TransitiveClosure(roots, operators = ..., children = , acyclic = True, algo = "DFS", "BFS", internal_nodes = False)
+ RecursivelyEnumeratedSet(seeds, successors, structure=..., enumeration=...)
```
-for all the functions in sage.combinat.backtrack.py `SearchForest`, `TransitiveIdeal`, `TransitiveIdealGraded`
+2. Deprecate `TransitiveIdeal`, `TransitiveIdealGraded` and `SearchForest` as entry point.
-TODO: discuss the names above
+Remarks:
-The actual proposition is:
+A. For now the code of `SearchForest` is still in `sage/combinat/backtrack.py`. It should be moved in `sage/sets/recursively_enumerated_set.pyx` in a later ticket.
-```
- RecursivelyEnumeratedSet(seeds, succ, structure=..., algorithm=...)
-```
+B. Note that there are some issues with `TransitiveIdeal` and `TransitiveIdealGraded`, namely:
+
+ - Enumeration of `TransitiveIdeal` is claimed to be depth first search in the top level file `backtrack.py`, but in fact, it is neither breadth first neither depth first.
+ - Enumeration of `TransitiveIdealGraded` is indeed breadth first as claimed but it does not make use of the graded hypothesis at all because it remembers every generated elements.
+
+See [my status report at SageDays57](http://www.liafa.univ-paris-diderot.fr/~labbe/blogue/2014/04/my-status-report-at-sage-days-57-recursivelyenumeratedset/) for more info.Description changed:
---
+++
@@ -3,6 +3,8 @@
```
RecursivelyEnumeratedSet(seeds, successors, structure=..., enumeration=...)
```
+
+where `structure` takes values in the set `[None, 'forest', 'graded', 'symmetric']` and `enumeration` takes values in the set `[None, 'depth', 'breadth', 'naive']`.
2. Deprecate `TransitiveIdeal`, `TransitiveIdealGraded` and `SearchForest` as entry point.
I keep the status of the ticket to needs work because I realized that some doctest were broken in the sage library. I am working on it.
Description changed:
---
+++
@@ -8,13 +8,17 @@
2. Deprecate `TransitiveIdeal`, `TransitiveIdealGraded` and `SearchForest` as entry point.
+3. `TransitiveIdeal(succ, seeds)` keeps the same behavior as before and is now the same as `RecursivelyEnumeratedSet(seeds, succ, structure=None, enumeration='naive')`.
+
+4. `TransitiveIdealGraded(succ, seeds, max_depth)` keeps the same behavior as before and is now the same as `RecursivelyEnumeratedSet(seeds, succ, structure=None, enumeration='breadth', max_depth=max_depth)`.
+
Remarks:
A. For now the code of `SearchForest` is still in `sage/combinat/backtrack.py`. It should be moved in `sage/sets/recursively_enumerated_set.pyx` in a later ticket.
-B. Note that there are some issues with `TransitiveIdeal` and `TransitiveIdealGraded`, namely:
+B. Note that there were some issues with `TransitiveIdeal` and `TransitiveIdealGraded`, namely:
- - Enumeration of `TransitiveIdeal` is claimed to be depth first search in the top level file `backtrack.py`, but in fact, it is neither breadth first neither depth first.
+ - Enumeration of `TransitiveIdeal` is claimed to be depth first search in the top level file `backtrack.py`, but in fact, it is neither breadth first neither depth first. It is what I call a naive search.
- Enumeration of `TransitiveIdealGraded` is indeed breadth first as claimed but it does not make use of the graded hypothesis at all because it remembers every generated elements.
See [my status report at SageDays57](http://www.liafa.univ-paris-diderot.fr/~labbe/blogue/2014/04/my-status-report-at-sage-days-57-recursivelyenumeratedset/) for more info.Author: Sébastien Labbé
Reviewer: Travis Scrimshaw
Changed branch from u/slabbe/6637 to public/ticket/6637
Some more speedup by doing some more cythonization. Before:
sage: f = lambda a: [a-1,a+1]
sage: C = RecursivelyEnumeratedSet([10, 15], f, structure='symmetric')
sage: it = iter(C)
sage: %time L = [next(it) for _ in xrange(10^6)]
CPU times: user 9.68 s, sys: 147 ms, total: 9.83 s
Wall time: 9.81 s
With my commit:
sage: f = lambda a: [a-1,a+1]
sage: C = RecursivelyEnumeratedSet([10, 15], f, structure='symmetric')
sage: it = iter(C)
sage: %time L = [next(it) for _ in xrange(10^6)]
CPU times: user 8.02 s, sys: 103 ms, total: 8.13 s
Wall time: 8.15 s
I'm sure I haven't done the best cythonization job on this, but it works and all tests pass. If you're happy with my changes, then positive review.
New commits:
0d592ce | Merge branch 'u/slabbe/6637' of trac.sagemath.org:sage into public/ticket/6637 |
f200525 | More cythonization. |
72f5bf6 | Merge branch 'u/slabbe/6637' of trac.sagemath.org:sage into public/ticket/6637 |
df40815 | Some tweaks in the cythonization. |
e68a15f | Fixed pickling issue. |
d8b942b | Some other potential cython speedups. |
Description changed:
---
+++
@@ -16,7 +16,9 @@
A. For now the code of `SearchForest` is still in `sage/combinat/backtrack.py`. It should be moved in `sage/sets/recursively_enumerated_set.pyx` in a later ticket.
-B. Note that there were some issues with `TransitiveIdeal` and `TransitiveIdealGraded`, namely:
+B. `TransitiveIdeal` and `TransitiveIealGraded` are used in the code of `sage/combinat`, `sage/categories` and `sage/groups` at least. These should be updated to use `RecursivelyEnumeratedSet in a later ticket for speed improvements and also to avoid issues explained in C below.
+
+C. Note that there were some issues with `TransitiveIdeal` and `TransitiveIdealGraded`, namely:
- Enumeration of `TransitiveIdeal` is claimed to be depth first search in the top level file `backtrack.py`, but in fact, it is neither breadth first neither depth first. It is what I call a naive search.
- Enumeration of `TransitiveIdealGraded` is indeed breadth first as claimed but it does not make use of the graded hypothesis at all because it remembers every generated elements.Description changed:
---
+++
@@ -16,7 +16,7 @@
A. For now the code of `SearchForest` is still in `sage/combinat/backtrack.py`. It should be moved in `sage/sets/recursively_enumerated_set.pyx` in a later ticket.
-B. `TransitiveIdeal` and `TransitiveIealGraded` are used in the code of `sage/combinat`, `sage/categories` and `sage/groups` at least. These should be updated to use `RecursivelyEnumeratedSet in a later ticket for speed improvements and also to avoid issues explained in C below.
+B. `TransitiveIdeal` and `TransitiveIealGraded` are used in the code of `sage/combinat`, `sage/categories` and `sage/groups` at least. These should be updated to use `RecursivelyEnumeratedSet` in a later ticket for speed improvements and also to avoid issues explained in C below.
C. Note that there were some issues with `TransitiveIdeal` and `TransitiveIdealGraded`, namely:
I do gain one more second with your improvements. Great!
sage: sage: f = lambda a: [a-1,a+1]
sage: sage: C = RecursivelyEnumeratedSet([10, 15], f, structure='symmetric')
sage: sage: it = iter(C)
sage: sage: %time L = [next(it) for _ in xrange(10^6)]
CPU times: user 3.49 s, sys: 246 ms, total: 3.74 s
Wall time: 3.79 s
I do not like factory function in general and was happy to use for the first time the metaclass stuff. But apparently, it is not as efficient? Did you check if only removing the metaclass stuff was giving a speedup?
Branch pushed to git repo; I updated commit sha1. New commits:
3191690 | Trac #6637: More deprecation doc in backtrack.py |
Travis, did you check that the doc was building fine? I am not able to, I get :
$ sage -docbuild reference/structure html
[structure] WARNING: intersphinx inventory '/Users/slabbe/Applications/sage-git/src/doc/output/html/en/reference/quivers/objects.inv' not fetchable due to <type 'exceptions.IOError'>: [Errno 2] No such file or directory: '/Users/slabbe/Applications/sage-git/src/doc/output/html/en/reference/quivers/objects.inv'
Error building the documentation.
Note: incremental documentation builds sometimes cause spurious
error messages. To be certain that these are real errors, run
"make doc-clean" first and try again.
Traceback (most recent call last):
File "/Users/slabbe/Applications/sage-git/src/doc/common/builder.py", line 1477, in <module>
getattr(get_builder(name), type)()
File "/Users/slabbe/Applications/sage-git/src/doc/common/builder.py", line 699, in _wrapper
getattr(DocBuilder, build_type)(self, *args, **kwds)
File "/Users/slabbe/Applications/sage-git/src/doc/common/builder.py", line 94, in f
execfile(sys.argv[0])
File "/Users/slabbe/Applications/sage-git/src/doc/common/custom-sphinx-build.py", line 210, in <module>
raise OSError(ERROR_MESSAGE)
OSError: [structure] WARNING: intersphinx inventory '/Users/slabbe/Applications/sage-git/src/doc/output/html/en/reference/quivers/objects.inv' not fetchable due to <type 'exceptions.IOError'>: [Errno 2] No such file or directory: '/Users/slabbe/Applications/sage-git/src/doc/output/html/en/reference/quivers/objects.inv'
Once, I may confirm the docs does build fine. I will set this to positive review.
make doc-clean fixed the error. Doc builds fine.
Sebastien wants to double check the Metaclass thingy.
As I recall, metaclasses are not supported in extension classes by Cython. The metaclass should not change the speed since it's only called/used upon object creation.
Also FTR, I liked your usage of the metaclass (and I can't check the doc until I get my docbuilder to actually work again... :/ ).
If you agree Travis, I will try to put the metaclass use back in. Also, maybe Florent can say a word about it. He coached me on how to implement it.
Replying to @tscrim:
As I recall, metaclasses are not supported in extension classes by Cython. The metaclass should not change the speed since it's only called/used upon object creation.
Ok, now I see what you mean. When the class is cdef, then the __classcall_private__ do not get called instead of the __init__:
sage: f = lambda a: [a-1,a+1]
sage: RecursivelyEnumeratedSet([0], f)
A recursively enumerated set (depth first search)
sage: RecursivelyEnumeratedSet([0], f, structure='symmetric')
Traceback (most recent call last):
...
TypeError: __init__() got an unexpected keyword argument 'structure'
I also saw in the doc that: "For a Cython class (not cdef since they doesn’t allows metaclasses)"
Looking at your commit, I don't see any benefit in using the metaclass. However I'm not opposed to the renaming, so I've just pushed that.
Replying to @tscrim:
Looking at your commit, I don't see any benefit in using the metaclass.
Yes exactly. I needed to play with it to understand what you meant.
However I'm not opposed to the renaming, so I've just pushed that.
Well, the renaming was just an easy way for me to check that sage -t recursively_enumerated_set.pyx was ok after I realised that __classcall_private__ was ignored by cdef class. It was not a suggestion, but it would not be the first renamed function:
sage: import_statements(sum)
from sage.misc.functional import symbolic_sum
So, we go with commit dd72bfc instead of 3191690 ?
If that's okay with you.
Replying to @tscrim:
If that's okay with you.
The more I think about it, the less I like it. I think dd72bfc can be confusing for someone looking at the file for the first time. Until that person looks at the file sage/sets/all.py he will not understand how the __init__ handles the structure argument. And the key will always be hidden in some other file sage/sets/all.py. I suggest we go with your initial factory function. More precisely with commit 3191690. Do you agree?
If so, I do not know what should we do then (git question). Should we update the commit field? Should we reset the HEAD of the branch? Should we create a new branch pointing to the commit?
Good point. What we'll do is create a new branch which just points to the old commit (afterwards we can delete our old branches). Do you want to do this or should I?
Let me try.
Changed branch from public/ticket/6637 to public/ticket/6637a
The following did the trick:
git checkout 3191690 -b t/6637a
Then let's get this in. Thanks Sébastien.
Thanks for the review and cython improvements!
Changed branch from public/ticket/6637a to 3191690
Description changed:
---
+++
@@ -14,9 +14,9 @@
Remarks:
-A. For now the code of `SearchForest` is still in `sage/combinat/backtrack.py`. It should be moved in `sage/sets/recursively_enumerated_set.pyx` in a later ticket.
+A. For now the code of `SearchForest` is still in `sage/combinat/backtrack.py`. It should be moved in `sage/sets/recursively_enumerated_set.pyx`. See #16351.
-B. `TransitiveIdeal` and `TransitiveIealGraded` are used in the code of `sage/combinat`, `sage/categories` and `sage/groups` at least. These should be updated to use `RecursivelyEnumeratedSet` in a later ticket for speed improvements and also to avoid issues explained in C below.
+B. `TransitiveIdeal` and `TransitiveIealGraded` are used in the code of `sage/combinat`, `sage/categories` and `sage/groups` at least. These should be updated to use `RecursivelyEnumeratedSet` for speed improvements and also to avoid issues explained in C below. See #16352.
C. Note that there were some issues with `TransitiveIdeal` and `TransitiveIdealGraded`, namely: