Lazy initialization of massive models
xrsrke opened this issue · 1 comment
xrsrke commented
- Initialize a model without allocating host (CPU) memory, for cases where the model is larger than host memory
- Replay the operations that were recorded while initializing the model, or a partition of it
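As a point of reference for the first bullet, PyTorch's `meta` device already supports this kind of storage-free instantiation: parameters carry shape and dtype metadata but no backing memory. A minimal sketch (this is plain PyTorch, not pipegoose's implementation):

```python
import torch
import torch.nn as nn

# Build a layer on the meta device: ~10 GB of fp32 weights on paper,
# but no host or GPU memory is actually allocated.
with torch.device("meta"):
    model = nn.Linear(50_000, 50_000)

assert model.weight.is_meta          # storage-free parameters
assert model.weight.shape == (50_000, 50_000)
```

The tensors can later be materialized shard-by-shard on whichever rank owns them, which is what makes this useful for models larger than host memory.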
APIs

```python
from pipegoose.utils import lazy_init

# load the model from `transformers`
with lazy_init(parallel_context):
    model = TensorParallel(model, parallel_context).parallelize()
    model = PipelineParallel(model, parallel_context).parallelize()
    model = DataParallel(model, parallel_context).parallelize()

logits = model(inputs)
```
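The second bullet (replaying initialization) could work by recording the init callables while the model is built on the meta device, then re-running them once each rank has materialized only its own partition. A hedged sketch; `recorded_inits` and `build_layer` are illustrative names, not pipegoose APIs:

```python
import torch
import torch.nn as nn

# Record (module, init_fn) pairs at build time so they can be replayed later.
recorded_inits = []

def build_layer():
    layer = nn.Linear(1024, 1024)
    recorded_inits.append((layer, lambda m: nn.init.xavier_uniform_(m.weight)))
    return layer

# Build on the meta device: no memory allocated, init ops not yet applied.
with torch.device("meta"):
    layer = build_layer()

# Materialize this rank's partition with uninitialized storage,
# then replay the recorded initialization ops on the real tensors.
layer = layer.to_empty(device="cpu")
for module, init_fn in recorded_inits:
    init_fn(module)
```

Because `to_empty` fills parameters with uninitialized storage, replaying the recorded init functions is what restores the intended weight distribution on each partition.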
Reading
- Current best practices to initialize massive (50B parameter+) models #16944 [[link]](Lightning-AI/pytorch-lightning#16944)
- LazyTensor: combining eager execution with domain-specific compilers [link]
- Initialize a model with 100 billion parameters in no time and without using any RAM [[link]](https://huggingface.co/docs/accelerate/v0.11.0/en/big_modeling)
- Section 3.1 Model Initialization, in PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel [link]
createsmit7 commented
Hello, please assign this to me.