Parallel map reduce on SearchForest
hivert opened this issue · 145 comments
Implement a map reduce algorithm in parallel on large sets described by a SearchForest. We use a work-stealing algorithm (see https://en.wikipedia.org/wiki/Work_stealing) based on Python's multiprocessing.
CC: @sagetrac-sage-combinat @nathanncohen @jdemeyer @seblabbe
Component: combinatorics
Keywords: map-reduce, days57, days77
Author: Florent Hivert, Jean-Baptiste Priez, Nathann Cohen
Branch: 134c1fa
Reviewer: Sébastien Labbé, Jean-Baptiste Priez
Issue created by migration from https://trac.sagemath.org/ticket/13580
Salut Florent et Nathann,
I am starting to think/work on #6637... I have some questions. Mainly, I would like to know what is this ticket, because the above one liner in the description does not say much...
- Will SearchForest survive or not?
- Does it replace SearchForest?
- Does it improve SearchForest?
- Does it use SearchForest?
- Or is it used by SearchForest?
Also, Florent wrote on sage-devel in October 2012 that
I'm also in the process of finalizing a patch which do parallel and even
distributed map-reduce on recursively enumerated sets (currently badly named
SearchForest, I'll change the name in my patch, ticket #13580, patch on
Sage-Combinat queue [1]).
- I do not see in the cited patch that the name of SearchForest is changed.
- What would be a better name for SearchForest ?
- What patch is the more recent? the "old" one or the "experimental" one?
trac_13580-map_reduce-fh.patch #+experimental
map_reduce_improved_loop-fh.patch #+experimental
map_reduce_condition-fh.patch #+experimental
trac_13580-map_reduce-old-fh.patch
Branch: u/hivert/13580/map_reduce
New commits:
693c672 | Imported code from trac_13580-map_reduce-fh.patch + fixed multiline doctests. |
Changed keywords from map-reduce to map-reduce, days57
Changed branch from u/hivert/13580/map_reduce to u/nthiery/13580/map_reduce
Branch pushed to git repo; I updated commit sha1. New commits:
addb17a | 13580: Trivial rest fix |
Changed branch from u/nthiery/13580/map_reduce to u/hivert/13580/map_reduce
Branch pushed to git repo; I updated commit sha1. New commits:
92e6e68 | Merge branch 't/13580/map_reduce' into 13580 |
Branch pushed to git repo; I updated commit sha1. New commits:
734c748 | #13580 Fixed test after merging #6637 |
I saw the following typo in map reduce file while looking at the previous commit: "As an example, ee compute"
Branch pushed to git repo; I updated commit sha1. New commits:
f234f60 | #13580 : improved documentation |
Branch pushed to git repo; I updated commit sha1. New commits:
168ceae | Added architecture picture for map_reduce |
Branch pushed to git repo; I updated commit sha1. New commits:
493bb52 | Done the doc of Map/Reduce |
Branch pushed to git repo; I updated commit sha1. New commits:
2534f12 | Moved map/reduce to sage/parallel |
Description changed:
---
+++
@@ -1,2 +1,2 @@
-Implement a map reduce algorithm in parallel on large sets described by a `SearchForest`.
+Implement a map reduce algorithm in parallel on large sets described by a `SearchForest`. We use a work-stealing algorithm (see https://en.wikipedia.org/wiki/Work_stealing) based on Python's multiprocessing.
The branch currently have conflicts with development version of Sage (because the link is red in the description of the ticket above). The branch seems to be based on 6.2.beta6 which is old and may explain the presence of a conflict.
Replying to @seblabbe:
The branch currently have conflicts with development version of Sage (because the link is red in the description of the ticket above). The branch seems to be based on 6.2.beta6 which is old and may explain the presence of a conflict.
Yes ! there is a trivial conflict. Thank you for pointing it. I'm fixing it, testing and re-uploading.
Branch pushed to git repo; I updated commit sha1. New commits:
e5b7477 | Merge 6.10.rc1 + fixed conflict |
Branch pushed to git repo; I updated commit sha1. New commits:
82fd1e4 | Fixed links according to deprecation/rebase |
Replying to @hivert:
Replying to @seblabbe:
The branch currently have conflicts with development version of Sage (because the link is red in the description of the ticket above). The branch seems to be based on 6.2.beta6 which is old and may explain the presence of a conflict.
Yes ! there is a trivial conflict. Thank you for pointing it. I'm fixing it, testing and re-uploading.
Done ! I was actually based on 6.9.
Branch pushed to git repo; I updated commit sha1. New commits:
68b6530 | Removed TODO in doc |
Branch pushed to git repo; I updated commit sha1. New commits:
b93ba7d | Fixed a link + doctest |
Branch pushed to git repo; I updated commit sha1. New commits:
5c7720d | Fixed doctest continuations and raise statements |
Replying to @sagetrac-git:
| 5c7720d | Fixed doctest continuations and raise statements |
Patchbot was complaining about old style multiline doctests and exception raises. Fixed and reuped. Should be Ok now.
Branch pushed to git repo; I updated commit sha1. New commits:
cb9c011 | Fixed another multiline doctest |
I just looked at the code in the browser. Looks good. I will do real code review later this week hopefully. Quickly, I saw two typos:
First line of /src/sage/parallel/map_reduce.py : Paralell
Later in the same file I think: parameters tu the
Branch pushed to git repo; I updated commit sha1. New commits:
14b086b | Fixed Sebastien's Typos |
Replying to @seblabbe:
I just looked at the code in the browser. Looks good. I will do real code review later this week hopefully. Quickly, I saw two typos:
Fixed ! I'm afraid you'll find more ! By the way, removed distributed from the title since I don't do distributed computation anymore. I had a prototype which allowed to launch those computation on a cluster of machines, but it induces a huge performance loss do I dropped It. maybe we will work on this with Jeroen...
Dear Florent, I know that you like multiline doctests, but for two reasons I prefer to avoid them as much as possible in doctests:
- It is not fun for the user who wants to adapt an example after a copy paste in the console. The up arrow brings the multiline all at once and it is not fun to change the value 10 to a value 25 let say.
- Using variables to store the input before calling the method gives lot of information easily to the user: what are the argument names, in what order should they be used.
I know I am asking you to change your style by asking you this, but don't you agree with me or can you provide some counter-arguments to mine? I give an example below:
diff --git a/src/sage/combinat/backtrack.py b/src/sage/combinat/backtrack.py
index 9dee057..ac1243a 100644
--- a/src/sage/combinat/backtrack.py
+++ b/src/sage/combinat/backtrack.py
@@ -695,17 +695,19 @@ class SearchForest(Parent):
reduce_function = None,
reduce_init = None):
r"""
- Apply en Map Reduce algorithm on ``self``
+ Apply a Map Reduce algorithm on ``self``
EXAMPLES::
- sage: F = RecursivelyEnumeratedSet( [([i],i, i) for i in range(1,10)],
- ....: lambda (list, sum, last):
- ....: [(list + [i], sum + i, i) for i in range(1,last)],
- ....: structure='forest', enumeration='depth')
+ sage: seeds = [([i],i, i) for i in range(1,10)]
+ sage: succ = lambda (list, sum, last):
+ ....: [(list + [i], sum + i, i) for i in range(1,last)]
+ sage: F = RecursivelyEnumeratedSet(seeds, succ,
+ ....: structure='forest', enumeration='depth')
sage: y = var('y')
- sage: F.map_reduce(
- ....: lambda (li, sum, _): y**sum, lambda x,y: x + y, 0 )
+ sage: map_function = lambda (li, sum, _): y**sum
+ sage: reduce_function = lambda x,y: x + y
+ sage: F.map_reduce(map_function, reduce_function, 0)
y^45 + y^44 + y^43 + 2*y^42 + 2*y^41 + 3*y^40 + 4*y^39 + 5*y^38 + 6*y^37 + 8*y^36 + 9*y^35
.. SEEALSO:: :mod:`sage.parallel.map_reduce`I noticed another typo on the same line as the one parameters tu the : On -> One
Replying to @seblabbe:
Dear Florent, I know that you like multiline doctests, but for two reasons I prefer to avoid them as much as possible in doctests:
- It is not fun for the user who wants to adapt an example after a copy paste in the console. The up arrow brings the multiline all at once and it is not fun to change the value 10 to a value 25 let say.
I'm tempted to answer that I don't have this problem with emacs ;-). Of course, for the people using a two letters editor...
- Using variables to store the input before calling the method gives lot of information easily to the user: what are the argument names, in what order should they be used.
I'm much more convinced by this argument. I'm watching an exam tomorrow evening. I'll change my code accordingly.
Replying to @hivert:
I'm much more convinced by this argument. I'm watching an exam tomorrow evening. I'll change my code accordingly.
Actually, I'm not that much convinced. It's not clear for me that
sage: map_function = lambda x: 1
sage: reduce_function = lambda x,y: x+y
sage: reduce_init = 0
sage: S.map_reduce(map_function, reduce_function, reduce_init)
131071
is much better than
sage: S.map_reduce(
....: map_function = lambda x: 1,
....: reduce_function = lambda x,y: x+y,
....: reduce_init = 0 )
In this second version which is the style I adopted in map_reduce.py, we don't repeat the name twice, which is better in my opinion.
A third possibility is
sage: S.map_reduce(lambda x: 1, lambda x,y: x+y, 0)
It is the one used through the whole backtrack.py files. Unless for very short example, I don't find it very readable. But since it's not adding new code but changing old one, which needs even more cleanup and doc improvement, I'd rather keep it for another ticket. More precisly, I'd rather do that during #16351.
What do you think ?
I'm not sure to understand the patchbot complaint ? Is it because the map_reduce is not imported as Sage's startup ?
I just uploaded a doc improvement: the map_reduce function was missing the description of the input.
Branch pushed to git repo; I updated commit sha1. New commits:
e3c4c71 | Doc improvements |
Branch pushed to git repo; I updated commit sha1. New commits:
31c4735 | Removed unneded import in pxd |
Replying to @hivert:
What do you think ?
I think that some doctest are more important than other, especially the first doctests that people might be expected to see when using the map reduce code. For example the map_reduce method in the recursively enumerated set file is one possible entry point where clarity of doctests matters more. Therefore, I like the way you change the doctest. Definitively, the top of the module is an important entry point. For other sub methods of the class map reduce, I care less...
Replying to @hivert:
I'm not sure to understand the patchbot complaint ? Is it because the
map_reduceis not imported as Sage's startup ?
I don't see why neither...
Event, Condition, time and os are imported in map_reduce file but are not used.
Is there a reason why you use res=[""] and res[0] in print_communication_statistics instead of res="" and res?
Replying to @seblabbe:
Is there a reason why you use
res=[""]andres[0]inprint_communication_statisticsinstead ofres=""andres?
Yes ! This is the classical trick to have a local variable shared with the local function (see e.g: http://stackoverflow.com/questions/2609518/python-nested-function-scopes).
Replying to @seblabbe:
Event,Condition,timeandosare imported in map_reduce file but are not used.
Fixed in my last push.
Branch pushed to git repo; I updated commit sha1. New commits:
cca7881 | Removed unneeded import |
Replying to @seblabbe:
I don't see why neither...
According to Vincent on Sage devel it is probably a bug of the patchbot (see https://groups.google.com/forum/#!topic/sage-devel/xlpLwktA5Hk).
Replying to @hivert:
Replying to @seblabbe:
Is there a reason why you use
res=[""]andres[0]inprint_communication_statisticsinstead ofres=""andres?Yes ! This is the classical trick to have a local variable shared with the local function (see e.g: http://stackoverflow.com/questions/2609518/python-nested-function-scopes).
Could you add this information inside the code of that method?
map_reduce method of SearchForest should document what happen when this method is called with no arguments, i.e. returns the cardinality. More precisely, it should say that the value None for arguments map_function and reduce_function is replaced by what?
On can use the three following parameters: -> One can use the three following parameters:
I get the following doctests error on top on sage-6.10 on my machine:
Git branch: 13580
Using --optional=gcc,mpir,python2,sage
Doctesting 1 file.
sage -t --warn-long 85.4 map_reduce.py
**********************************************************************
File "map_reduce.py", line 960, in sage.parallel.map_reduce.RESetMapReduce._signal_task_start
Failed example:
S._active_tasks
Expected:
<Semaphore(value=2)>
Got:
<Semaphore(value=unknown)>
**********************************************************************
File "map_reduce.py", line 964, in sage.parallel.map_reduce.RESetMapReduce._signal_task_start
Failed example:
S._active_tasks
Expected:
<Semaphore(value=3)>
Got:
<Semaphore(value=unknown)>
**********************************************************************
File "map_reduce.py", line 985, in sage.parallel.map_reduce.RESetMapReduce._signal_task_done
Failed example:
S._active_tasks
Expected:
<Semaphore(value=2)>
Got:
<Semaphore(value=unknown)>
**********************************************************************
File "map_reduce.py", line 989, in sage.parallel.map_reduce.RESetMapReduce._signal_task_done
Failed example:
S._active_tasks
Expected:
<Semaphore(value=1)>
Got:
<Semaphore(value=unknown)>
**********************************************************************
2 items had failures:
2 of 9 in sage.parallel.map_reduce.RESetMapReduce._signal_task_done
2 of 7 in sage.parallel.map_reduce.RESetMapReduce._signal_task_start
[249 tests, 4 failures, 43.04 s]
----------------------------------------------------------------------
sage -t --warn-long 85.4 map_reduce.py # 4 doctests failed
----------------------------------------------------------------------
It seems ok on patchbot reporting seemingly unrelated errors :
----------------------------------------------------------------------
sage -t --long src/sage/interfaces/expect.py # Timed out
sage -t --long src/sage/interfaces/gap.py # 10 doctests failed
----------------------------------------------------------------------
I divided the review in a stack of tasks but since I am unable to parallelize my working time, I did the review serially with many interruptions during the previous days:)
I finished to look at what I wanted to look at. The documentation compiles good and helps to understand what is happening. I was able to test the parallel computation on at least one extensive example. To me it will be a positive review after my previous four comments are answered.
Replying to @seblabbe:
I get the following doctests error on top on sage-6.10 on my machine:
Expected: <Semaphore(value=1)> Got: <Semaphore(value=unknown)>
Is this happening on MacOS ? I know that Semaphore are implemented differently on this system. See in particular the note after class multiprocessing.BoundedSemaphore, where it says:
on Unix platforms like Mac OS X where sem_getvalue() is not implemented.
If so and if the rests behave correctly, I'm tempted to replace the doctests with
<Semaphore(value=...)>
What do you think ?
Replying to @seblabbe:
map_reducemethod ofSearchForestshould document what happen when this method is called with no arguments, i.e. returns the cardinality. More precisely, it should say that the valueNonefor argumentsmap_functionandreduce_functionis replaced by what?
I didn't document this one because I'm was not sure It was a sensible default. If you think it is then let's go for it.
By the way thanks for your careful review !!!
Florent
Branch pushed to git repo; I updated commit sha1. New commits:
1a8b78e | - MacOSX compatible doctests for semaphore. |
I uploaded a patch which should address all your concerns:
- tests on MacOSX
- typos
- default values documentation
Introduced typo in the previous commit: the function costant to 1. -> constant
Replying to @hivert:
Replying to @seblabbe:
I get the following doctests error on top on sage-6.10 on my machine:
Expected: <Semaphore(value=1)> Got: <Semaphore(value=unknown)>Is this happening on MacOS ?
Yes
If so and if the rests behave correctly, I'm tempted to replace the doctests with
<Semaphore(value=...)>What do you think ?
I agree. All tests passed now...
Replying to @hivert:
I didn't document this one because I'm was not sure It was a sensible default. If you think it is then let's go for it.
I think it is a good default. Also, sometimes, you may want to set only some of the arguments and this information allows it. Finally, it's good to see that cardinality is a special case of this method.
Reviewer: Sébastien Labbé
Just to be clear: I am waiting for the costant -> constant fix to change the status to positive review.
Changed branch from u/hivert/13580/map_reduce to u/nthiery/13580/map_reduce
Typo fixed, hence changing the status to positive review on Sebastien's behalf.
New commits:
98fd9e1 | 13580: typo fix |
Hello and merry christmas every one,
I have some question on this ticket.
Firstly, I don't understand the documentation from the line 571:
Decription of the map/reduce operation:
- ``map_function=f`` -- (default to ``None``)
- ``reduce_function=red`` -- (default to ``None``)
- ``reduce_init=init`` -- (default to ``None``)
What means f, red and init? My opinion is that f, redand init are irrelevant but an example should be interesting at this point:
(copy-paste of one in the module documentation)
sage: from sage.parallel.map_reduce import RESetMapReduce
sage: S = RESetMapReduce(
....: roots = [[]],
....: children = lambda l: [l+[0], l+[1]] if len(l) <= 15 else [],
....: map_function = lambda x : 1,
....: reduce_function = lambda x,y: x+y,
....: reduce_init = 0 )
sage: S.run()
131071
Ok, there is the seealso which says go to see the documentation of the module for examples, but one example there is interesting, no?
Then it seems there is some useless import (like line 1345 from multiprocessing import current_process) (I try to compile sage soon and I can remove all useless import if you want).
Finally, I'm not sure to understand the use of AbortError. It is used to abort the computation and to say to the workers and the thiefs every think is done?
Jean-Baptiste
Replying to @vbraun:
I can not reproduce this problem on my Mac OS X Yosemite 10.10.2:
$ sage -tp --long src/sage/parallel/map_reduce.py
Running doctests with ID 2015-12-28-10-44-31-65e477d9.
Git branch: 13580
Using --optional=gcc,mpir,python2,sage
Doctesting 1 file using 2 threads.
sage -t --long --warn-long 85.0 src/sage/parallel/map_reduce.py
[252 tests, 40.89 s]
----------------------------------------------------------------------
All tests passed!
----------------------------------------------------------------------
Hello,
I can reproduce test failures on El Capitan (os x):
I run 3 tests and I obtained 3 distincts failures. My problem is that if I try to execute those lines which fails in sage terminal (several times) then it does not fail!
Is it possible that the test environment is not robust specially on mac os?
I read the code and I don't know how to understand this:
res = post_process(node)
if res is not None:
self._res = reduc(self._res, mapp(res))
in the worker code of walk_branch_locally method.
I think this is useless, no? Or if not then the documentation of post_process function should be improve to specify that if post_process returns None then there is a specific behavior.
Replying to @sagetrac-elixyre:
res = post_process(node) if res is not None: self._res = reduc(self._res, mapp(res))I think this is useless, no? Or if not then the documentation of
post_processfunction should be improve to specify that ifpost_processreturnsNonethen there is a specific behavior.
Ok, this behavior is specify in the documentation of RESetMapReduce but this is not specified in paragraph (line 135) associated of this module.
Furthermore the lines:
for child in newnodes:
self._todo.append(child)
of walk_branch_locally should be replaced by
self._todo_.extend(newnodes)
(I just notice that I read and if you are agree with my remark then I can replace that)
Go back to None activities, if I assume some users want to use nodes which could be None then my first idea is to use the post_process function to avoid problem but... if the thief doesn't receive then he send None and he says there is no job to do.
I propose two solutions:
- first one, to specify that
Noneis not a node never! (this is assumed but this must be explicited in the documentation), - second one (the object way), to define an object
NoTask.
Opinion?
Replying to @sagetrac-elixyre:
Go back to
Noneactivities, if I assume some users want to use nodes which could beNonethen my first idea is to use thepost_processfunction to avoid problem but... if the thief doesn't receive then he sendNoneand he says there is no job to do.I propose two solutions:
- first one, to specify that
Noneis not a node never! (this is assumed but this must be explicited in the documentation),
I'm clearly in Favor of this solution. We assume that None doesn't blong to the set described by the underlying RESet. Note that this is something which concerns more the underlying RESet than the map-reduce computation.
Changed branch from u/nthiery/13580/map_reduce to u/hivert/13580/map_reduce
Changed author from Florent Hivert, Nathann Cohen to Florent Hivert, Jean-Baptiste Priez, Nathann Cohen
Replying to @vbraun:
We finally got the problem with Jean-Baptiste. It occurs that semaphore are broken on MacOSX (or at least are not fully POSIX compliant). In particular, on standard unixes, when two processes are trying to acquire a semaphore whose value is more than two, they always both succeeded. On MacOS, one of them may fail. As a consequence, I'm writing a different code form MacOS relying on a Lock and a shared integer. It may be slower on system where semaphore are implemented in a lockless way.
Is there a standard Sage/Python way to check if we are on MacOS ?
New commits:
44566e6 | ticket search forest map reduce with no problem on mac os x |