Large model instantiation using `deepspeed.zero.Init` under ZeRO-3
R0n12 opened this issue · 1 comments
R0n12 commented
Is your feature request related to a problem? Please describe.
Currently GPT-NeoX doesn't support partitioned model initialization when using ZeRO-3, which causes OOM errors in most cases.
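For context, a back-of-the-envelope estimate of per-rank memory (the numbers below are illustrative assumptions, not measurements from GPT-NeoX) shows why materializing the full model on every rank before sharding runs out of memory, and why construction-time partitioning avoids it:

```python
# Without partitioned init, every rank materializes the full model
# before ZeRO-3 shards it. Rough fp16 weight memory for 175B params:
params = 175e9
bytes_per_param = 2  # fp16

full_copy_gb = params * bytes_per_param / 1024**3  # per-rank, unpartitioned

# With deepspeed.zero.Init, parameters are partitioned across ranks as
# modules are constructed. Assuming 64 ranks purely for illustration:
world_size = 64
sharded_gb = full_copy_gb / world_size

print(round(full_copy_gb), round(sharded_gb))  # ~326 GB vs ~5 GB per rank
```

Even before optimizer states and gradients, ~326 GB of weights per rank exceeds any single accelerator, while the sharded footprint fits comfortably.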
Describe the solution you'd like
A simple fix like this inside `get_model` will do the trick:
```python
if neox_args.zero_stage == 3:
    with deepspeed.zero.Init():
        model = GPT2ModelPipe(
            neox_args=neox_args,
            num_tokentypes=0,
            parallel_output=True,
            topology=mpu.get_topology(),
            use_cache=use_cache,
        )
```
Describe alternatives you've considered
Another thing I have in mind is figuring out a way to properly test this. I have tested it on a 175B model and it works. Please let me know if any other testing is needed.
Additional context
Related issue: huggingface/accelerate#922
R0n12 commented
I am working on a branch addressing this issue