Large model instantiation using `deepspeed.zero.Init` under ZeRO-3
R0n12 opened this issue · 1 comments
R0n12 commented
Is your feature request related to a problem? Please describe.
Currently GPT-NeoX doesn't support partitioned model initialization when using ZeRO-3, which causes OOM errors in most cases.
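For context, a back-of-the-envelope estimate of per-rank memory (the numbers below are illustrative assumptions, not measurements from GPT-NeoX) shows why materializing the full model on every rank before sharding runs out of memory, and why construction-time partitioning avoids it:

```python
# Without partitioned init, every rank materializes the full model
# before ZeRO-3 shards it. Rough fp16 weight memory for 175B params:
params = 175e9
bytes_per_param = 2  # fp16

full_copy_gb = params * bytes_per_param / 1024**3  # per-rank, unpartitioned

# With deepspeed.zero.Init, parameters are partitioned across ranks as
# modules are constructed. Assuming 64 ranks purely for illustration:
world_size = 64
sharded_gb = full_copy_gb / world_size

print(round(full_copy_gb), round(sharded_gb))  # ~326 GB vs ~5 GB per rank
```

Even before optimizer states and gradients, ~326 GB of weights per rank exceeds any single accelerator, while the sharded footprint fits comfortably.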
Describe the solution you'd like
A simple fix like this inside `get_model` will do the trick:
```python
if neox_args.zero_stage == 3:
    with deepspeed.zero.Init():
        model = GPT2ModelPipe(
            neox_args=neox_args,
            num_tokentypes=0,
            parallel_output=True,
            topology=mpu.get_topology(),
            use_cache=use_cache,
        )
```
Describe alternatives you've considered
Another thing I have in mind is figuring out a way to properly test this. I have tested it on a 175B model and it works. Please let me know if any other testing is needed.
Additional context
Related issue: huggingface/accelerate#922
R0n12 commented
I am working on a branch addressing this issue