jhsa26/GradientCheckpointingHorovod
We use gradient checkpointing together with Horovod to accelerate the training of large neural networks such as BERT, RoBERTa, and XLNet.
Python
No issues in this repository yet.