parasj/checkmate

Transformer benchmark

eric-haibin-lin opened this issue · 1 comment

I noticed that most results in the paper are for computer vision models. Are there any results for transformer-based models (e.g. Transformer, GPT-2, BERT)? Thanks!

Hi @eric-haibin-lin, thanks for checking out the project! Apologies about the delay.

We did not include BERT results in this repository, but we saw a 2.3x memory reduction when training a BERT model with Checkmate optimizations (at the cost of roughly 1x extra compute for rematerialization). The PR used to evaluate this is located at #62.
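For intuition, here is a toy back-of-the-envelope sketch of the trade-off rematerialization makes: instead of storing every layer activation for the backward pass, you keep only segment-boundary checkpoints and recompute the rest, cutting peak activation memory at the price of extra forward compute. This is purely illustrative arithmetic under simplified uniform-layer assumptions; Checkmate itself solves an optimization problem to pick which tensors to keep, so the names and formulas below are not its actual API.

```python
def rematerialization_costs(n_layers, seg):
    """Toy cost model for segment-based rematerialization.

    Assumes every layer's activation is the same size and every
    layer costs the same to recompute (a simplification).
    """
    # Baseline: store one activation per layer for the backward pass.
    baseline_mem = n_layers
    # Checkpointing: store only segment boundaries, plus the activations
    # of the one segment being recomputed during backward.
    ckpt_mem = n_layers // seg + seg
    # Every non-boundary activation is recomputed once in backward.
    extra_forward = n_layers - n_layers // seg
    return baseline_mem, ckpt_mem, extra_forward

# e.g. a 24-layer chain with segments of 4 layers:
baseline, ckpt, extra = rematerialization_costs(24, 4)
print(baseline, ckpt, extra)  # 24 activations -> 10, at 18 recomputed layers
```

The same qualitative trade-off (a few-x memory reduction for roughly one extra forward pass) is what the BERT numbers above reflect, though Checkmate chooses the schedule far more carefully than a fixed segment size.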

Please let me know if there was a specific model you were considering.

Thanks!
Paras