Gradient Checkpointing
xrsrke opened this issue · 1 comment
xrsrke commented
- Selectively recompute the forward pass of some operations during the backward pass to save memory (a minimal sketch of the idea follows below).
- Replace `transformers`' gradient checkpointing with pipegoose's gradient checkpointing.
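For reference, this is the basic trade-off with stock PyTorch activation checkpointing: the wrapped block's intermediate activations are not stored during the forward pass and are recomputed when backward needs them. The module and shapes below are illustrative only.

```python
import torch
from torch.utils.checkpoint import checkpoint

# An example block whose intermediate activations we don't want to keep in memory.
layer = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096),
    torch.nn.GELU(),
    torch.nn.Linear(4096, 1024),
)

x = torch.randn(8, 1024, requires_grad=True)
out = checkpoint(layer, x, use_reentrant=False)  # forward runs without saving intermediates
out.sum().backward()  # the block's forward is recomputed here, then gradients flow as usual
```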
APIs
```python
from pipegoose.utils.checkpointing import Checkpointing

mlp = model.transformer.blocks[0].mlp
mlp = Checkpointing(mlp, parallel_context)
outputs = mlp(inputs)
```
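A minimal sketch of what such a wrapper could look like, assuming it simply defers to `torch.utils.checkpoint` under the hood (this is not the actual pipegoose implementation; `parallel_context` handling is left as a placeholder):

```python
import torch
from torch.utils.checkpoint import checkpoint

class Checkpointing(torch.nn.Module):
    """Hypothetical wrapper: recompute the wrapped module's forward pass in backward."""

    def __init__(self, module: torch.nn.Module, parallel_context=None):
        super().__init__()
        self.module = module
        # Kept for parity with the proposed API; a real implementation could use it
        # to coordinate RNG state or activation partitioning across ranks.
        self.parallel_context = parallel_context

    def forward(self, *args, **kwargs):
        # Don't store intermediate activations; recompute them during backward.
        return checkpoint(self.module, *args, use_reentrant=False, **kwargs)
```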
Reading
- https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html
- Reducing Activation Recomputation in Large Transformer Models [[link]](https://arxiv.org/abs/2205.05198)
Etelis commented
I will do it.
!assign