parasj/checkmate

Question about profile-based cost model

jasperzhong opened this issue · 3 comments

Thanks for open-sourcing such great work. I have a question about the cost model mentioned in your paper.

Costs are determined prior to MILP construction by profiling network layers on target hardware with random inputs across a range of batch sizes and input shapes, and exclude static graph construction and input generation time.

So I wonder: is there a mathematical formula to estimate the cost, or is the cost model a neural network?

Thanks!

Hi @vycezhong,

By default, we estimate the number of FLOPs (floating point operations) per layer. This correlates with runtime but is not a perfect measure. In our paper's experiments, therefore, we actually profile graphs on a GPU to measure their forward pass time. We then estimate the backward pass runtime to be 2 times the forward pass time. We also compute profiles for models on AWS machines (p2.xlarge, p3.2xlarge). To run the profiling, we use the following script: https://gist.github.com/parasj/09854d18421652c8fa24374436b423b5
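For illustration only, here is a minimal sketch (not the checkmate profiler or the gist above) of what per-layer profiling like this could look like, assuming a TensorFlow/Keras layer: time the forward pass on random inputs across a few batch sizes and take the backward cost as 2x the measured forward time. The function name and batch sizes are made up for the example.

```python
# Illustrative sketch only -- not the checkmate profiler.
# Times a Keras layer's forward pass on random inputs and estimates
# the backward pass as 2x the forward time, per the heuristic above.
import time
import tensorflow as tf

def profile_layer(layer, input_shape, batch_sizes=(1, 8, 32), trials=10):
    costs = {}
    for b in batch_sizes:
        x = tf.random.normal([b, *input_shape])   # random input; generation is excluded from timing
        layer(x)                                  # warm-up: builds weights, compiles kernels
        start = time.perf_counter()
        for _ in range(trials):
            y = layer(x)
        _ = y.numpy()                             # force execution to finish before stopping the clock
        forward = (time.perf_counter() - start) / trials
        costs[b] = {"forward": forward, "backward": 2.0 * forward}
    return costs

# Example: profile a convolution on 32x32x3 inputs.
print(profile_layer(tf.keras.layers.Conv2D(64, 3, padding="same"), (32, 32, 3)))
```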

In our paper replication branch (mlsys_artifact), we automatically download the correct profiles for each benchmarked neural network from an S3 bucket we serve.

Thanks,
Paras

Hi, I also have a question about the cost model. It seems the cost is currently set to 1 for all the layers?

gb.add_node(op.name, cpu_cost=1, ram_cost=op_ram_cost, backward=op.name in grad_nodes)

Hi @LiuXiaoxuanPKU,

Please see this code for the correct way to load profiles. https://github.com/parasj/checkmate/blob/mlsys20_artifact/experiments/common/profile/cost_model.py

This is located in the mlsys20_artifact branch.
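For a rough idea of what that looks like, here is a simplified sketch (not the actual cost_model.py) of passing profiled per-op costs into the graph builder instead of the hard-coded cpu_cost=1; the `profile` and `ops` names are hypothetical.

```python
# Simplified sketch, not the code in cost_model.py: look up a profiled
# runtime for each op and fall back to a uniform cost of 1 when no
# profile entry exists. `profile` maps op name -> measured forward time.
def build_graph_with_profiled_costs(gb, ops, grad_nodes, profile):
    for op in ops:
        cpu_cost = profile.get(op.name, 1)  # profiled cost if available, else 1
        gb.add_node(
            op.name,
            cpu_cost=cpu_cost,
            ram_cost=op.ram_cost,           # hypothetical attribute for this sketch
            backward=op.name in grad_nodes,
        )
```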