DEAP/deap

Remove SCOOP from DEAP

fmder opened this issue Β· 18 comments

fmder commented
Remove SCOOP from DEAP

I'm wondering why SCOOP is going to be removed?

fmder commented

Scoop is no longer in development, and to my knowledge, no longer supported.

Still works, though! :)

I think the parallelism ecosystem, especially in Python, has changed tremendously since SCOOP's inception. There are alternatives with more features and development resources than a one-man side-project. The only niche aspect that has not been replicated elsewhere is SCOOP's parsers for typical academic clusters management systems, such as SLURM and PBS-derivatives, but I think they can be reused easily.

scoop is great but it is no longer in development, please suggest another alternative which is as great.

In the past, SCOOP has often been compared to Pathos, which is still in relatively active development and is still a viable alternative ( https://github.com/uqfoundation/pathos ).
I believe the futures interface of Dask (https://dask.org/) is one of the most interesting alternatives currently. It even has native support for HPC clusters ( https://docs.dask.org/en/latest/setup/hpc.html ).
Other solutions include Celery (also broker-based, http://www.celeryproject.org/), Joblib (https://joblib.readthedocs.io/en/latest/), Pyina (https://github.com/uqfoundation/pyina), Dispy (https://github.com/pgiri/dispy) and Ray (https://github.com/ray-project/ray) are some of the examples to come to mind, most have much more developed memory sharing semantics, provide diagnostic tools and are using simpler or more suited communication alternatives than ZeroMQ under the hood. There are also much better serialization alternatives now than the one used by SCOOP (e.g. Dill).

As I said, SCOOP still works for many people, if you want to use it. However, I won't have the time anytime soon to update it, and I don't feel comfortable accepting most pull requests are they provide either features that are untested or for specific cluster systems that I have no access to, so I would not be able to maintain despite my best wishes.

is it possible that anyone can provide an example on Dask + DEAP? SCOOP doesn't really work for me smoothly right.

There are more or less sophisticated ways to interact with Dask (or just its cluster library: distributed), but if you just want to get started with the default cluster (close to SCOOP), you can do:

import dask.bag as db

def dask_map(func, iterable):
  bag = db.from_sequence(iterable).map(func)
  return bag.compute()

#...
# In your deap setup
toolbox.register('map', dask_map)

@cyrilpic it works beautifully, thanks

Dadle commented

+1 for Celery support.

Any example implemntation using Docker orchestration would also be much appreciated!

I was reading about Ray(thank you @soravux for that suggestion above) and its solutions for shared memory objects to prevent the need to duplicate global scope by moving data to the node level(so cool) without having to reinvent any wheels. I wanted to give it a quick try, however if using a version 0.9 or higher things look even more familiar:
If on newer version of Ray I should have used the ActorPool + Map as mentioned in ray docs to use a fixed size pool similar to mp/scoop to batch out the work vs firing the workers off at once.

Quick example of Ray 0.7.6 + DEAP to use shared-memory data and all processes on a machine for eval(had to move the GP' compile outside the remote functions due to lack of global scope in eval):
https://gist.github.com/DMTSource/b80f1afb854f688dcccc4d60b18a721f

About the share mem objects and ray in general:
https://ray.readthedocs.io/en/latest/walkthrough.html#objects-in-ray

fmder commented

this works for me:

from dask.distributed import Client
client = Client(processes=False)

def dask_map(*args, **kwargs):
    return client.gather(client.map(*args, **kwargs))

This example should behave exactly like scoop.futures.map:

  • additional parameters are passed to the map
  • dynamic classes from deap.creator exist on the worker.

I would love to do processes=True, but then I would have to initialize the classes in deap.creator somehow when setting up the worker. I assume this would be easy enough, but I don't know enough about dask to do it. (Also the above example is good enough for me for now)

fmder commented

Thanks I'll look into that for multiprocessing looks promising.

Just tested a couple of options listed above. Ray's one is elegant and powerful.. the plugin itself also has some nice-to-have tools to be integrated with deap such beside parallelization itself.

It has a live dashboard and logging which is fantastic!

+1 to Ray.

hey - I'm one of the Ray maintainers. Glad to see Ray mentioned here - I'm more than happy to help out for the Ray integration!

Today I was able to start an integration for Ray, and made some headway in trying to replace the way we use Scoop in Deap. It comes with a GA and GP example based on the existing Onemax and Symbreg examples.

Here is the repo, ideally to add to Deap when appropriate, for the new drop in use of Ray for registering a 'map' function in order to replace Scoop in as close a way as possible as what we are already used to. Some changes had to be made elsewhere vs the original examples to preserve scope for the remote workers.

Please check it out! Now DeltaPenalty and Ephemeral Constants work just fine at scale!!!

https://github.com/DMTSource/deap_ray

That's awesome @DMTSource ! I wonder if we can get in touch with the DEAP maintainers to see what they think..

Looking forwards to the integration with Ray

Was going to open an issue asking about it, so happy to see this already in progress ❀️

🎊 πŸŽ‰