/GradientCheckpointingHorovod

We use gradient checkpointing technique and Horovod to jointly accelerate training large neural networks such as Bert, Roberta, and Xlnet.

Primary LanguagePython

No issues in this repository yet.