The predict-then-optimize framework arises in a wide variety of applications where the unknown cost coefficients of an optimization problem are first predicted based on contextual features and then used to solve the problem. In this work, we extend the predict-then-optimize framework to a multi-task setting: contextual features must be used to predict cost coefficients of multiple optimization problems simultaneously. We propose a set of methods for this setting that enable information sharing between tasks for improved learning, particularly in the small-data regime. Our experiments demonstrate that multi-task predict-then-optimize methods provide good tradeoffs in performance among different tasks, particularly with less training data and more tasks.
- Graph Routing: Traveling salesperson problem dataset generated with PyEPO (see the data-generation sketch below). (Docs)
- Warcraft Terrain Images: The Warcraft map shortest-path dataset, for which we modify the cost coefficients to represent different species. (Download)
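As a rough sketch of how the graph-routing data can be generated, the snippet below uses PyEPO's `genData` helpers for shortest path and traveling salesperson instances (matching the `--n_sp`/`--n_tsp` tasks below). The exact signatures and defaults may differ across PyEPO versions, and the sizes shown are illustrative rather than the settings used in our experiments.

```python
# Illustrative data generation with PyEPO (verify argument names against the
# installed PyEPO version; the sizes below are placeholders).
from pyepo.data import shortestpath, tsp

num_data, num_feat = 100, 5

# Shortest path on a 5x5 grid: feature matrix x and true cost coefficients c
x_sp, c_sp = shortestpath.genData(num_data, num_feat, (5, 5),
                                  deg=4, noise_width=0.5, seed=135)

# Traveling salesperson instances on 20 nodes
x_tsp, c_tsp = tsp.genData(num_data, num_feat, 20,
                           deg=4, noise_width=0.5, seed=135)
```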
Strategy | Description |
---|---|
mse | Two-stage method trained to minimize the mean squared error (MSE) of the predicted costs |
separated | Separate single-task learning for each task |
separated+mse | Separate single-task learning for each task, with the cost MSE added |
comb | A uniform combination of the decision losses (sketched below) |
comb+mse | A uniform combination of the decision losses plus the cost MSE |
gradnorm | GradNorm weighting of the decision losses |
gradnorm+mse | GradNorm weighting of the decision losses plus the cost MSE |
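As a minimal sketch of the `comb` and `comb+mse` strategies, the training step below sums the per-task decision losses uniformly and optionally adds the cost MSE. Here `pred_model` and `decision_loss_fns` are hypothetical stand-ins: a shared prediction network with one cost head per task, and per-task wrappers that return a differentiable decision loss (e.g. SPO+ or PFYL from PyEPO, with any extra arguments such as true solutions closed over by the wrapper).

```python
import torch

def multitask_step(pred_model, decision_loss_fns, optimizer, feats,
                   true_costs, use_mse=False):
    """One training step with a uniform combination of decision losses.

    pred_model        : shared network returning a list of predicted cost
                        vectors, one per task (hypothetical interface)
    decision_loss_fns : per-task callables (pred_cost, true_cost) -> loss;
                        for SPO+ the true solutions/objectives would be
                        closed over inside the wrapper
    true_costs        : list of true cost tensors, one per task
    """
    optimizer.zero_grad()
    pred_costs = pred_model(feats)
    total_loss = 0.0
    for t, loss_fn in enumerate(decision_loss_fns):
        total_loss = total_loss + loss_fn(pred_costs[t], true_costs[t]).mean()
        if use_mse:  # the "+mse" variants add the cost regression error
            total_loss = total_loss + torch.nn.functional.mse_loss(
                pred_costs[t], true_costs[t])
    total_loss.backward()
    optimizer.step()
    return float(total_loss)
```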
A sample command to run experiments and save the results and figures is shown below; the available options are listed after it, followed by a fuller example.
python pipeline.py --data 100 --n_sp 3 --n_tsp 1 --algo spo
- node: Number of nodes in graph.
- data: Training data size.
- feat: Input feature size.
- noise: Noise half-width.
- n_sp: Number of shortest path tasks.
- n_tsp: Number of traveling salesperson tasks.
- algo: Training algorithm: spo for SPO+ and pfyl for PFYL.
- iter: Max training iterations.
- batch: Batch size.
- lr: Learning rate.
- lr2: Learning rate of loss weights (only for GradNorm).
- alpha: Hyperparameter of restoring force (only for GradNorm).
- proc: Number of processes used for solving the optimization problems.
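For instance, a run that sets all of the documented options explicitly could look like the following; the values are illustrative, not recommended defaults.

```bash
python pipeline.py --node 20 --data 100 --feat 5 --noise 0.5 \
                   --n_sp 3 --n_tsp 1 --algo pfyl --iter 100 --batch 32 \
                   --lr 0.01 --lr2 0.001 --alpha 0.1 --proc 4
```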
python pipeline_wsp.py --data 100 --algo spo
- k: Size of grid network.
- data: Training data size.
- algo: Training algorithm: spo for SPO+ and pfyl for PFYL.
- iter: Max training iterations.
- batch: Batch size.
- lr: Learning rate.
- lr2: Learning rate of loss weights (only for GradNorm).
- alpha: Hyperparameter of restoring force (only for GradNorm).
- proc: Number of processes used for solving the optimization problems.
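Similarly, an example invocation for the Warcraft pipeline with all documented options set (values are illustrative):

```bash
python pipeline_wsp.py --k 12 --data 100 --algo pfyl --iter 50 --batch 32 \
                       --lr 0.001 --lr2 0.001 --alpha 0.1 --proc 4
```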
The following figures can be plotted using the Jupyter notebooks.
Multi-task predict-then-optimize has a performance advantage over single-task learning, especially with GradNorm.
Dynamically adjusting the loss weights with GradNorm enables more efficient model training and faster convergence.
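For reference, a minimal sketch of the GradNorm weight update is below, assuming per-task scalar losses, a learnable weight vector, and access to the last shared layer of the predictor; `alpha` is the restoring-force hyperparameter and `lr2` the loss-weight learning rate from the options above. This is an illustrative implementation of the standard GradNorm rule, not necessarily identical to the code in this repository.

```python
import torch

def gradnorm_update(task_losses, init_losses, weights, shared_params,
                    alpha=0.1, lr2=0.001):
    """One GradNorm update of the loss weights; the weighted total loss for
    the model parameters is backpropagated separately by the caller.

    task_losses  : list of scalar decision losses L_i for the current batch
    init_losses  : list of the initial L_i values recorded at the first step
    weights      : 1-D tensor of loss weights w_i with requires_grad=True
    shared_params: parameter tensor of the last shared layer of the predictor
    """
    # Gradient norm of each weighted task loss w.r.t. the shared layer.
    grad_norms = torch.stack([
        torch.autograd.grad(w_i * L_i, shared_params,
                            retain_graph=True, create_graph=True)[0].norm()
        for w_i, L_i in zip(weights, task_losses)
    ])

    # Relative inverse training rates r_i and the (constant) target norms.
    loss_ratios = torch.stack([L_i.detach() / L_0
                               for L_i, L_0 in zip(task_losses, init_losses)])
    r = loss_ratios / loss_ratios.mean()
    target = (grad_norms.mean() * r ** alpha).detach()

    # Restoring force: pull each task's gradient norm toward its target.
    gradnorm_loss = (grad_norms - target).abs().sum()
    grad_w = torch.autograd.grad(gradnorm_loss, weights, retain_graph=True)[0]

    with torch.no_grad():
        weights -= lr2 * grad_w
        # Renormalize so the weights keep summing to the number of tasks.
        weights *= len(task_losses) / weights.sum()
```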
The multi-task predict-then-optimize framework is particularly effective in the small-data regime; as the amount of training data increases, the separated single-task models gradually improve and may eventually surpass multi-task learning.
Increasing the number of related tasks improves model performance, especially for complicated tasks such as the TSP.
We extend the end-to-end predict-then-optimize framework to multi-task learning, which jointly minimizes decision error for related optimization tasks. Our results demonstrate the benefits of this approach, including improved performance with less training data and the ability to handle multiple tasks simultaneously. Future work in this area could include the application of this method to real-world problems, as well as further exploration of techniques for multi-task learning, such as current and novel multi-task neural network architectures and gradient calibration methods.