Lazy initialization of massive models
xrsrke opened this issue · 1 comment
xrsrke commented
- Initialize a model without allocating host (CPU) memory, for cases where the model is larger than host memory
- Replay the operations that were recorded while initializing the model, or a partition of it
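As a point of reference for the first bullet, PyTorch's `meta` device already supports this kind of storage-free instantiation: parameters carry shape and dtype metadata but no backing memory. A minimal sketch (this is plain PyTorch, not pipegoose's implementation):

```python
import torch
import torch.nn as nn

# Build a layer on the meta device: ~10 GB of fp32 weights on paper,
# but no host or GPU memory is actually allocated.
with torch.device("meta"):
    model = nn.Linear(50_000, 50_000)

assert model.weight.is_meta          # storage-free parameters
assert model.weight.shape == (50_000, 50_000)
```

The tensors can later be materialized shard-by-shard on whichever rank owns them, which is what makes this useful for models larger than host memory.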
APIs

```python
from pipegoose.utils import lazy_init

# load the model from `transformers`
with lazy_init(parallel_context):
    model = TensorParallel(model, parallel_context).parallelize()
    model = PipelineParallel(model, parallel_context).parallelize()
    model = DataParallel(model, parallel_context).parallelize()

logits = model(inputs)
```
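The second bullet (replaying initialization) could work by recording the init callables while the model is built on the meta device, then re-running them once each rank has materialized only its own partition. A hedged sketch; `recorded_inits` and `build_layer` are illustrative names, not pipegoose APIs:

```python
import torch
import torch.nn as nn

# Record (module, init_fn) pairs at build time so they can be replayed later.
recorded_inits = []

def build_layer():
    layer = nn.Linear(1024, 1024)
    recorded_inits.append((layer, lambda m: nn.init.xavier_uniform_(m.weight)))
    return layer

# Build on the meta device: no memory allocated, init ops not yet applied.
with torch.device("meta"):
    layer = build_layer()

# Materialize this rank's partition with uninitialized storage,
# then replay the recorded initialization ops on the real tensors.
layer = layer.to_empty(device="cpu")
for module, init_fn in recorded_inits:
    init_fn(module)
```

Because `to_empty` fills parameters with uninitialized storage, replaying the recorded init functions is what restores the intended weight distribution on each partition.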
Reading
- Current best practices to initialize massive (50B parameter+) models #16944 [[link]](Lightning-AI/pytorch-lightning#16944)
- LazyTensor: combining eager execution with domain-specific compilers [link]
- Initialize a model with 100 billion parameters in no time and without using any RAM [[link]](https://huggingface.co/docs/accelerate/v0.11.0/en/big_modeling)
- Section 3.1 Model Initialization, in PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel [link]
createsmit7 commented
Hello, please assign this to me.