combine FSDP with selective activation checkpointing
nemoramo opened this issue · 0 comments
nemoramo commented
Consider integrating selective activation checkpointing, as featured in PyTorch's blog "Maximizing Training Throughput", into LitGPT. Adding a selective_activation_checkpointing kwarg would enable users to leverage this strategy alongside FSDP, facilitating training of larger models.