Does bigscience's Megatron-DeepSpeed support ZeRO stage 2 + CPU offload?
drxmy opened this issue · 0 comments
I see that microsoft's Megatron-DeepSpeed has this feature (microsoft/Megatron-DeepSpeed#56). It is a relatively new PR, and I am not sure whether bigscience has merged it yet.
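
For context, here is a minimal sketch of the kind of DeepSpeed configuration the question is about. It assumes the standard DeepSpeed config keys `zero_optimization.stage` and `zero_optimization.offload_optimizer`. The helper name `build_zero2_offload_config`, the batch size, and the fp16 setting are illustrative placeholders, not taken from either repository. Whether bigscience's fork accepts this combination is exactly the open question.

```python
import json


def build_zero2_offload_config(micro_batch_size=1):
    """Return a DeepSpeed config dict with ZeRO stage 2 and CPU optimizer offload.

    Hypothetical helper for illustration; the values are placeholders.
    """
    return {
        "train_micro_batch_size_per_gpu": micro_batch_size,
        "fp16": {"enabled": True},
        "zero_optimization": {
            # Stage 2 partitions optimizer states and gradients across ranks.
            "stage": 2,
            # Keep optimizer states (and the optimizer step) on the CPU.
            "offload_optimizer": {"device": "cpu", "pin_memory": True},
        },
    }


if __name__ == "__main__":
    # Print the config as JSON, e.g. to save it as a DeepSpeed config file.
    print(json.dumps(build_zero2_offload_config(), indent=2))
```

The resulting JSON would typically be saved to a file and passed to the training script as its DeepSpeed config. How the fork's launcher consumes that file is not confirmed here.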