RobotLocomotion/drake

Provide ability to call SolveInParallel with program generators.

Opened this issue · 8 comments

Since the introduction of SolveInParallel in #21957 , there have been several PRs which try to match this pattern (e.g. #21705, #21351, #22222). In these cases, it is memory intensive to generate all the programs at once in memory, however it is easy to say decide what the ith program is. When solving MathematicalProgram in parallel there are a couple of subtleties that are easy to forget/miss when implementing.
1.) Setting CommonSolverOptions.kMaxThread = 1
2.) Skipping over programs which are not thread safe and solving these in a second pass in serial.
3.) Managing the number of solvers which are created by the threads.

Managing these complexities was a major motivator for implementing SolveInParallel and why it is generally a bad idea to implement the parallel for loop manually each time.

To avoid this potential for implementation errors, I propose adding a new overload

std::vector<MathematicalProgramResult> SolveInParallel(
    const std::function<std::unique_ptr<MathematicalProgram>(int64_t, int64_t)>&
        prog_generator,
    const std::function<std::optional<Eigen::VectorXd>(int64_t, int64_t)>&
        initial_guesses_generator,
    const std::function<std::optional<SolverOptions>(int64_t, int64_t)>&
        solver_options_generator,
    const std::function<std::optional<SolverId>(int64_t, int64_t)>&
        solver_ids_generator,
    const int64_t range_start, const int64_t range_end, Parallelism parallelism,
    bool dynamic_schedule)

which all parallel for loops of MathematicalProgram should route through.

I am willing to make this function internal, but I think it could be beneficial to have this in the public API.

See draft PR #22225

+1 to implementing this feature -- thanks for starting to work on it!

I think a concrete example in the code would be helpful. Let's take IsBoundedParallel implemented in #22084 and located here.

What the implementation in #22225 achieves is the ability to generate the programs via a lambda and run the par-for loop using the new method. However, what it fails to achieve is the efficient reuse of the MathematicalProgram since there is no concept of "tear down" in the current mathematical program.

Would love some feedback on how we could achieve this in practice.

I think I have narrowed down the proposal to the following signature:

/**
 * Provides the same functionality as SolveInParallel, but the programs, initial
 * guesses, solver options and solver ids are created using a generator.
 *
 * The input to the generator is the current thread number and an index i and
 * the output is the ith program, guess, option and solver id respectively. The
 * index i is iterated from range_start to range_end.
 *
 * The output of prog_generator cannot be a nullptr. The user is responsible for
 * ensuring that the pointer is not dangling.
 *
 * The output of the initial_guesses_generator, solver_options_generator and
 * solver_ids can be std::nullopt
 *
 * After the ith program is solved, the prog_teardown function is called with
 * the ith program and its result, the current thread number, and the index i to
 * potentially clean up after the solve or perform some other callback
 *
 * @return A vector of size range_end with range_start to range_end populated
 * with the results of solving prog_generator(*, range_start-range_end)
 */
template <typename T, typename = std::enable_if_t<
    std::is_same_v<T, std::unique_ptr<MathematicalProgram>> ||
    std::is_same_v<T, MathematicalProgram*> ||
    std::is_same_v<T, const MathematicalProgram*>>>
std::vector<MathematicalProgramResult> SolveInParallel(
    const std::function<T(int64_t, int64_t)>& prog_generator,
    const int64_t range_start, const int64_t range_end,
    const std::function<std::optional<Eigen::VectorXd>(int64_t, int64_t)>&
        initial_guesses_generator =
            [](int64_t, int64_t) {
              return std::nullopt;
            },
    const std::function<std::optional<SolverOptions>(int64_t, int64_t)>&
        solver_options_generator =
            [](int64_t, int64_t) {
              return std::nullopt;
            },
    const std::function<std::optional<SolverId>(int64_t, int64_t)>&
        solver_ids_generator =
            [](int64_t, int64_t) {
              return std::nullopt;
            },
    Parallelism parallelism = Parallelism::Max(), bool dynamic_schedule = false,
    const std::function<void(MathematicalProgram*, MathematicalProgramResult,
                             int64_t, int64_t)>* prog_teardown = nullptr);

The reasoning is that some people may need to generate a unique program for the ith program, so they need their generator to be able to return T = std::unique_ptr<MathematicalProgram>. On the other hand, if we are e.g. implementing DoIsBoundedParallel using this, it is helpful for performance reasons to pre-allocate a vector of programs which the ith thread modifies before solving and then modifies again after solving. This motivates the ability for T = MathematicalProgram*. Finally, if the programs are pre-allcoated as std::vector<const MathematicalProgram*> as in the current implementation of SolveInParallel, we want users to be able to take advantage of having a vector of const pointers.

I like that this allows us to modify the programs in the generator, instead of creating it each time. The other thing that DoIsBoundedParallel would require is a way to exit early -- I don't quite see how we would do that for this method. But I worry that we might be asking too much of just one single function.

I see two options for exiting early.

  1. If prog_generator returns nullptr then we don't solve the program. This would keep the value of result[i].solution_result() == kSolutionResultNotSet
  2. prog_generator could return an empty program, and then the solve is trivial and the return of result[i].solution_result() == kSolutionFound

I prefer option 1.

I like option 1 as well, but prog_generator would need a way to know that we want to exit early. Perhaps a downstream user could implement this with a variable that prog_generator and prog_teardown both have access to via pointers, so prog_teardown could communicate that result to prog_generator? It seems like this would work, but it feels a little iffy style-wise.

I don't see any issue with prog_generator and prog_teardown sharing an atomic state std::atomic<bool> certified_unbounded = false; in the DoIsBoundedParallel

cc @jwnimmer-tri if you have thoughts before I get too far into an implementation.