bigscience-workshop/Megatron-DeepSpeed

Is there any script for pretraining/finetuning BLOOM?

drxmy opened this issue · 0 comments

drxmy commented

Specifically, I am looking for a script that uses DeepSpeed PP and ZeRO-DP, like this one: https://github.com/bigscience-workshop/Megatron-DeepSpeed/tree/bitfit#deepspeed-pp-and-zero-dp

As I understand it, that script should be able to load BLOOM with a few changes, for example adding "--position-embedding-type alibi". I have run some experiments, but they keep failing.
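
To make the setup concrete, here is a rough sketch of the kind of configuration I have in mind. It is not my exact setup: the helper names and the batch-size values are made up for illustration. The config keys are standard DeepSpeed ones, and I am assuming ZeRO stage 1, since as far as I know DeepSpeed's pipeline engine does not work with stages 2 or 3.

```python
import json


def build_ds_config(micro_batch_size=1, global_batch_size=16, zero_stage=1):
    """Build a minimal DeepSpeed config dict for a PP + ZeRO-DP run.

    Hypothetical helper; the default values are placeholders, not tuned settings.
    """
    # Assumption: pipeline parallelism is only combined with ZeRO stage 0 or 1.
    if zero_stage not in (0, 1):
        raise ValueError("use ZeRO stage 0 or 1 together with pipeline parallelism")
    return {
        "train_micro_batch_size_per_gpu": micro_batch_size,
        "train_batch_size": global_batch_size,
        "gradient_clipping": 1.0,
        "zero_optimization": {"stage": zero_stage},
    }


def extra_bloom_args():
    """Extra launcher arguments I would append to the linked script for BLOOM."""
    # This flag is the one mentioned above; other required changes are unknown to me.
    return ["--position-embedding-type", "alibi"]


if __name__ == "__main__":
    # Write the config to a file that the launch script can point at.
    with open("ds_config.json", "w") as f:
        json.dump(build_ds_config(), f, indent=2)
    print(" ".join(extra_bloom_args()))
```

The generated ds_config.json would then be passed to the training script alongside the extra arguments.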

I would really appreciate it if someone could give me some advice!