Question about profile-based cost model
jasperzhong opened this issue · 3 comments
Thanks for open-sourcing such great work. I have a question about the cost model mentioned in your paper:
> Costs are determined prior to MILP construction by profiling network layers on target hardware with random inputs across a range of batch sizes and input shapes, and exclude static graph construction and input generation time.
So I wonder: is there a mathematical formula to estimate the cost, or is the cost model a neural network?
Thanks!
Hi @vycezhong,
By default, we estimate the number of FLOPs (floating-point operations) per layer. This correlates with runtime but is not a perfect measure. In our paper's experiments, we therefore profile graphs on a GPU to measure their forward-pass time, and we estimate the backward-pass runtime as twice the forward-pass time. We also compute profiles for models on AWS machines (p2.xlarge, p3.2xlarge). To profile the scripts, we use the following command: https://gist.github.com/parasj/09854d18421652c8fa24374436b423b5
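To make the two estimates above concrete, here is a minimal sketch (not Checkmate's actual code) of a FLOPs-based cost for a dense layer and the backward ≈ 2× forward heuristic; the function names are hypothetical:

```python
def dense_flops(batch: int, in_dim: int, out_dim: int) -> int:
    """FLOPs for a fully-connected layer's forward pass.

    One multiply and one add per weight per example: 2 * B * in * out.
    """
    return 2 * batch * in_dim * out_dim


def layer_costs(forward_cost: float) -> dict:
    """Pair a measured/estimated forward cost with the heuristic
    backward cost of twice the forward pass."""
    return {"forward": forward_cost, "backward": 2 * forward_cost}


fwd = dense_flops(batch=32, in_dim=1024, out_dim=512)
costs = layer_costs(fwd)
```

In practice, measured forward times (from the profiling gist) would replace the FLOPs estimate as the `forward_cost` input.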
In our paper replication branch (mlsys_artifact), we automatically download the correct profiles for each benchmarked neural network from an S3 bucket we serve.
Thanks,
Paras
Hi, I also have a question about the cost model. It seems the cost is currently set to 1 for all the layers?
(see `checkmate/tf2/extraction.py`, line 41 at commit d09d442)
Hi @LiuXiaoxuanPKU,
Please see this code for the correct way to load profiles. https://github.com/parasj/checkmate/blob/mlsys20_artifact/experiments/common/profile/cost_model.py
This is located in the `mlsys20_artifact` branch.
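For illustration, loading a profile and falling back to a uniform cost of 1 for unprofiled layers could be sketched like this (the profile format and function name here are hypothetical, not the actual `cost_model.py` API):

```python
import json


def load_costs(profile_json: str, layer_names: list, default: float = 1.0) -> list:
    """Map each layer name to its profiled forward time (ms).

    Layers missing from the profile get the uniform default cost of 1,
    which matches the placeholder behavior seen in extraction.py.
    """
    profile = json.loads(profile_json)
    return [profile.get(name, default) for name in layer_names]


# Example: two profiled layers, one unprofiled layer.
sample = '{"conv1": 3.2, "fc": 1.5}'
costs = load_costs(sample, ["conv1", "fc", "relu"])
```

The real loader in the `mlsys20_artifact` branch reads the per-machine profiles downloaded from S3 rather than an inline JSON string.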