Lightning-AI/litgpt

Combine FSDP with selective activation checkpointing

nemoramo opened this issue · 0 comments

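Please consider adding selective activation checkpointing to LitGPT, as described in PyTorch's blog post "Maximizing Training Throughput". Rather than checkpointing every transformer block, selective checkpointing wraps only a subset of them, so users can trade activation memory against recomputation cost. A `selective_activation_checkpointing` kwarg would let users combine this with FSDP and train larger models than they can today.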
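To make the request concrete, here is a rough sketch of what this could look like. It is not LitGPT code and not the blog's implementation: `select_block_indices`, `apply_selective_checkpointing`, and the `every` parameter are hypothetical names, and the "every k-th block" rule is just one simple selection policy. The sketch uses `apply_activation_checkpointing` and `checkpoint_wrapper` from PyTorch's `torch.distributed.algorithms._checkpoint.checkpoint_wrapper`, which is a private module and may change between releases.

```python
import functools


def select_block_indices(num_blocks: int, every: int) -> list[int]:
    """Return the indices of the blocks to checkpoint: every `every`-th block.

    every=1 checkpoints all blocks; every=2 checkpoints half of them.
    (Hypothetical helper; the selection rule is an assumption.)
    """
    if every < 1:
        raise ValueError("every must be >= 1")
    return [i for i in range(num_blocks) if i % every == 0]


def apply_selective_checkpointing(model, block_cls, every: int = 2) -> None:
    """Wrap a subset of `block_cls` submodules of `model` with activation checkpointing.

    torch is imported here so the selection helper above stays usable on its own.
    """
    from torch.distributed.algorithms._checkpoint.checkpoint_wrapper import (
        CheckpointImpl,
        apply_activation_checkpointing,
        checkpoint_wrapper,
    )

    # Collect the transformer blocks in module-traversal order.
    blocks = [m for m in model.modules() if isinstance(m, block_cls)]
    chosen = {id(blocks[i]) for i in select_block_indices(len(blocks), every)}

    # Non-reentrant checkpointing is the variant generally recommended by PyTorch.
    wrapper = functools.partial(
        checkpoint_wrapper, checkpoint_impl=CheckpointImpl.NO_REENTRANT
    )

    # check_fn decides, per submodule, whether it gets wrapped.
    apply_activation_checkpointing(
        model,
        checkpoint_wrapper_fn=wrapper,
        check_fn=lambda module: id(module) in chosen,
    )
```

A call might look like `apply_selective_checkpointing(model, block_cls=Block, every=2)`, where `Block` stands for the model's transformer block class. The proposed `selective_activation_checkpointing` kwarg could map onto something like `every`, or onto a fraction of blocks as in the blog post. How this should be ordered relative to FSDP wrapping is left open here.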