bigscience-workshop/Megatron-DeepSpeed

Does bigscience's Megatron-DeepSpeed support ZeRO stage 2 + CPU offload?

drxmy opened this issue · 0 comments

drxmy commented

I see that Microsoft's Megatron-DeepSpeed has this feature (microsoft/Megatron-DeepSpeed#56). It is a relatively new PR, and I am not sure whether bigscience has merged it.
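
For reference, this is roughly the configuration I mean. It is only a sketch of a DeepSpeed config with ZeRO stage 2 and the optimizer state offloaded to CPU, written as a Python dict and dumped to JSON. The helper name `build_zero2_cpu_offload_config`, the batch size, and the fp16 setting are my own placeholders, not anything taken from either repository, and whether the bigscience fork actually honors the `offload_optimizer` section is exactly what I am asking.

```python
import json


def build_zero2_cpu_offload_config(micro_batch_size: int = 1) -> dict:
    """Return a minimal DeepSpeed config dict (hypothetical helper).

    Assumptions: the values below are illustrative placeholders; only the
    key names follow the DeepSpeed configuration JSON format.
    """
    return {
        # Placeholder batch size; adjust for the real training run.
        "train_micro_batch_size_per_gpu": micro_batch_size,
        # Mixed precision is assumed here, not required by ZeRO itself.
        "fp16": {"enabled": True},
        "zero_optimization": {
            # Stage 2 partitions optimizer state and gradients.
            "stage": 2,
            # Keep the optimizer state in host (CPU) memory.
            "offload_optimizer": {"device": "cpu", "pin_memory": True},
        },
    }


if __name__ == "__main__":
    # Print the JSON that would be saved to a config file.
    print(json.dumps(build_zero2_cpu_offload_config(), indent=2))
```

The printed JSON could be saved to a file and passed to the training script as its DeepSpeed config, assuming the fork accepts such a file the same way the Microsoft one does.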