


Presto (Preprocessing Strategy Optimizer) is a TensorFlow library that automates the profiling of generic preprocessing pipelines.

Published at SIGMOD '22: Where Is My Training Bottleneck? Hidden Trade-Offs in Deep Learning Preprocessing Pipelines

author = {Isenko, Alexander and Mayer, Ruben and Jedele, Jeffrey and Jacobsen, Hans-Arno},
title = {Where Is My Training Bottleneck? Hidden Trade-Offs in Deep Learning Preprocessing Pipelines},
year = {2022},
isbn = {9781450392495},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3514221.3517848},
booktitle = {Proceedings of the 2022 International Conference on Management of Data},
numpages = {15}


How to use Presto for your own experiments

Using Presto involves five steps:

  • Define your pipeline (see imagenet_pipeline.py for an example)
  • Generate the different strategies based on the pipeline (see imagenet_demo.py)
  • Run the experiments with .profile_strategy(...); see presto/strategy.py for the available configuration options
  • Load the logs, which are saved as pd.DataFrame objects, into a StrategyAnalysis and analyze the data with, e.g., a weighted_summary(...) call
  • The scores are presorted, so you can choose a strategy yourself or automatically pick the highest-scoring one. The corresponding strategy is a valid tf.data.Dataset pipeline, so you can reuse it directly in your own pipeline.
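
To make the last two steps more concrete, here is a minimal, self-contained sketch of how a weighted ranking over profiled strategies could work. It does not use Presto and is not its implementation: the `StrategyMetrics` class, the metric names, the min-max normalization, and all numbers are assumptions made up for illustration. For the real behavior, see `weighted_summary(...)` in the repository.

```python
from dataclasses import dataclass


@dataclass
class StrategyMetrics:
    """Toy stand-in for one profiled strategy (all fields are hypothetical)."""
    name: str
    throughput_sps: float    # samples per second, higher is better
    storage_mb: float        # size of the materialized dataset, lower is better
    preprocessing_s: float   # offline preprocessing time, lower is better


def _min_max(values, higher_is_better):
    """Scale values to [0, 1] so that 1.0 always marks the best value."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0 for _ in values]
    scaled = [(v - lo) / (hi - lo) for v in values]
    return scaled if higher_is_better else [1.0 - s for s in scaled]


def weighted_scores(strategies, weights):
    """Return (name, score) pairs sorted from best to worst score."""
    total = sum(weights.values())
    columns = {
        "throughput": _min_max([s.throughput_sps for s in strategies], True),
        "storage": _min_max([s.storage_mb for s in strategies], False),
        "preprocessing": _min_max([s.preprocessing_s for s in strategies], False),
    }
    scored = []
    for i, s in enumerate(strategies):
        score = sum(weights[k] * columns[k][i] for k in columns) / total
        scored.append((s.name, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up numbers for illustration only; these are not measured results.
strategies = [
    StrategyMetrics("unprocessed", 100.0, 1000.0, 0.0),
    StrategyMetrics("decoded", 400.0, 5000.0, 600.0),
    StrategyMetrics("resized", 500.0, 800.0, 900.0),
]
weights = {"throughput": 2.0, "storage": 1.0, "preprocessing": 1.0}

ranking = weighted_scores(strategies, weights)
best_name, best_score = ranking[0]  # highest-scoring strategy comes first
```

With these toy weights, throughput counts twice as much as storage or preprocessing time, so a strategy with expensive offline preprocessing can still rank first if it trains fastest. Changing the weights changes the ranking, which is the trade-off the analysis step is meant to expose.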