AlgTUDelft/ExpensiveOptimBenchmark

Problem creating cfdbench docker image

dietmarwo opened this issue · 4 comments

There is a problem executing

self.evalCommand = ["docker", "run", "--rm", "cfdbench", "./dockerCall.sh", self.name]

in DockerCFDBenchmarkProblem.

Tried to execute

sudo docker build -t cfdbench . -f ./CFD.Dockerfile

to create the cfdbench image, but the step

add-apt-repository -y 'ppa:deadsnakes/ppa' &&\

failed. It would be better to upload an image to https://hub.docker.com/ instead of relying on an unstable build process.

self.evalCommand = ["docker", "run", "--rm", "frehbach/cfd-test-problem-suite", "./dockerCall.sh", self.name]

worked, but I couldn't execute PitzDaily. Is this supposed to work with the unpatched image?
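
For reference, this is roughly how I tested it outside the benchmark code (my assumption of how evalCommand is invoked; dockerCall.sh may need further arguments for the design parameters):

import subprocess

image = "frehbach/cfd-test-problem-suite"  # or "cfdbench" once the build succeeds
problem = "PitzDaily"

result = subprocess.run(
    ["docker", "run", "--rm", image, "./dockerCall.sh", problem],
    capture_output=True, text=True,
)
print(result.returncode, result.stdout, result.stderr)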

Are the benchmark results maintained anywhere? I am especially interested in CFD, Hospital, hpo, TSP and
windwake results for different methods depending on the invested evaluation/time budget. Maybe there are publications
from after the competitions last year; if so, it would be nice if they could be linked in the README.

The Hospital benchmark seems to have a huge random variance.
Is this intentional? How can we compare methods this way?

Hi,
I will leave the docker issues to someone else, as I am less familiar with the docker build process. As for the benchmark results, we have them available at the following links:
- EXPObench paper
- EXPObench raw data
- Hospital problem
- TSP

The Hospital benchmark can have quite a large variance, like around 10. But in our experience, the variance is much lower in good parts of the search space, around 3-4. This is one of its challenges. We still need to add documentation for this benchmark.
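
As a rough way to see this for yourself, you can evaluate a fixed point several times and look at the spread. A minimal sketch, with evaluate_hospital as a hypothetical stand-in for the actual benchmark call:

import statistics

def estimate_noise(evaluate_hospital, x, repetitions=20):
    # Repeatedly evaluate the same point and report mean and standard deviation.
    values = [evaluate_hospital(x) for _ in range(repetitions)]
    return statistics.mean(values), statistics.stdev(values)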

8uurg commented

Support for deadsnakes seems to have ended last month for Ubuntu Xenial: deadsnakes/issues#195
I presume the error is because of that.
Since the container is based on another container, the version of Ubuntu cannot be upgraded, so that is a bit of a pickle.

Update: I am building Python from source instead, which seems to work fine. Once I've tested this I'll push the changes to the repo.
Update 2: Pushed the change; the Docker image now builds Python from source instead.
Update 3: If need be, here is the image I built on Docker Hub: https://hub.docker.com/r/8uurg/cfdbench
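
If you want to use that prebuilt image, you should only need to point the evaluation command at it instead of a locally built one, roughly like this (a sketch; the rest of DockerCFDBenchmarkProblem stays unchanged):

# Sketch: swap the locally built "cfdbench" image for the prebuilt one on Docker Hub.
self.evalCommand = ["docker", "run", "--rm", "8uurg/cfdbench", "./dockerCall.sh", self.name]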

Thanks a lot, for me the issue regarding the Docker image is solved.

This collection of real-world optimization problems - and optimizers -
is very valuable; the research community seems to be moving more and more
in this direction.

What do you think about creating a https://gitter.im/ for this repo?
As a discussion forum this works better than creating GitHub issues.
Or is there a discussion forum elsewhere related to real-world
optimization? See for instance https://gitter.im/pagmo2/Lobby,
which is about ESA's pagmo optimization library (no surrogate models used).

Regarding the hospital simulation I found two more references:

https://dl.acm.org/doi/abs/10.1145/3449726.3463287

https://www.researchgate.net/publication/353114806_Surrogate-based_optimisation_for_a_hospital_simulation_scenario_using_pairwise_classifiers

The latter one is quite interesting, since it describes
how a standard algorithm like DE can be improved using machine
learning classifiers as a surrogate model. Unfortunately, neither the
details nor the concrete implementation of the decision tree used are given.
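
The general idea, as I understand it (my own rough sketch, not the paper's implementation), is to train a classifier on past evaluations to predict whether a DE trial vector will improve on its parent, and to spend real evaluations only on promising trials:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

class PairwiseSurrogate:
    # Predicts whether a trial point improves on its parent, trained on past comparisons.
    def __init__(self):
        self.model = DecisionTreeClassifier()
        self.X, self.y = [], []
        self.fitted = False

    def record(self, parent, trial, improved):
        self.X.append(np.concatenate([parent, trial]))
        self.y.append(int(improved))
        if len(set(self.y)) > 1:  # need both classes before fitting
            self.model.fit(np.array(self.X), np.array(self.y))
            self.fitted = True

    def promising(self, parent, trial):
        if not self.fitted:
            return True  # evaluate everything until the classifier is usable
        features = np.concatenate([parent, trial]).reshape(1, -1)
        return bool(self.model.predict(features)[0])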

For me, parallel function evaluation
(as described here: https://facebookresearch.github.io/nevergrad/optimization.html)
is essential for expensive real-world problems.
Dockerization means that you could even use a cloud cluster to speed things up.
This is why a surrogate model that can be used in combination with parallel differential
evolution is so promising.
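
To illustrate, a minimal sketch following the nevergrad documentation linked above (the toy objective stands in for an expensive dockerized evaluation; nevergrad is not part of this benchmark library):

import nevergrad as ng
from concurrent import futures

def expensive_objective(x):
    # Stand-in for an expensive evaluation, e.g. a dockerized CFD run.
    return float(sum((xi - 0.5) ** 2 for xi in x))

if __name__ == "__main__":
    # Four candidates are evaluated concurrently; batch_mode=False requests a new
    # candidate as soon as any worker finishes.
    optimizer = ng.optimizers.NGOpt(parametrization=2, budget=100, num_workers=4)
    with futures.ProcessPoolExecutor(max_workers=optimizer.num_workers) as executor:
        recommendation = optimizer.minimize(expensive_objective, executor=executor, batch_mode=False)
    print(recommendation.value)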

How do you assess the value of parallel function evaluation for real-world problems?

A noise level of 3-4 in the good parts of the search space is still a lot for the Hospital benchmark.
In TSP this problem is mitigated by using the maximum
robust_total_route_length over several runs; shouldn't something similar be done here?
The difference in results between "good" and "not so good" approaches is around 2,
which is less than the noise even in the good parts of the search space.
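
Something like the following wrapper is what I have in mind (a sketch, not how the benchmark currently handles it; the aggregation could be max, as in robust_total_route_length, or a mean/median):

def robust_objective(evaluate, x, repetitions=5, aggregate=max):
    # Evaluate the noisy objective several times at the same point and aggregate the results.
    return aggregate(evaluate(x) for _ in range(repetitions))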

Good to hear that the docker issue is resolved, thanks @8uurg!

And good to hear that you find this repository valuable. It would be good to have a space to discuss these challenges, but I don't think this particular repository is big enough yet to warrant something like that. Still, if you want to set up something about real-life optimization (especially expensive optimization) I'd be happy to join efforts; you can contact me via l.bliek@tue.nl.

As for parallelization, we actually went in the other direction and made sure only one core is used for all the algorithms. This was done to make a fair comparison possible. I agree that for really solving expensive real-world problems, parallelization is necessary, but we don't have anything like that implemented in this benchmark library yet. The same holds for comparisons with DE or other population-based methods.

How to deal with the large variance of the noise is also still an open problem. Right now we handle that by repeating all experiments multiple times, to know which method works better on average. One could indeed also do that at each function evaluation separately, as is done in TSP. I believe there will be another challenge related to the Hospital benchmark at GECCO this year; maybe better approaches or a change in problem setting will emerge there.