syncdoth/RetNet

How to use multiple GPUs for model parallel training

zhihui-shao opened this issue · 5 comments

Hi, will you release a method for model-parallel training across multiple GPUs?

Hey, I'm not the author, but I use an Accelerate and DeepSpeed config without FP16. Under the hood it uses the HF Trainer.
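For anyone wanting to reproduce this setup, here is a minimal sketch of a DeepSpeed config with FP16 explicitly disabled, to be handed to the HF Trainer. The ZeRO stage and the `"auto"` placeholders are my choices, not something prescribed by this repo; the HF integration fills `"auto"` values in from `TrainingArguments`.

```python
import json

# Minimal DeepSpeed config sketch: ZeRO stage 2, FP16 explicitly disabled.
# Values set to "auto" are resolved by the HF Trainer's DeepSpeed integration.
ds_config = {
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
    "zero_optimization": {
        "stage": 2,                  # shard optimizer state and gradients
        "overlap_comm": True,
        "contiguous_gradients": True,
    },
    "fp16": {"enabled": False},      # FP16 is numerically unstable here
    "bf16": {"enabled": False},      # keep pure FP32 for now
}

with open("ds_config.json", "w") as f:
    json.dump(ds_config, f, indent=2)

# Then point the Trainer at it, e.g.:
#   TrainingArguments(..., deepspeed="ds_config.json")
# and launch with `accelerate launch train.py` or `deepspeed train.py`.
```

The JSON file is the only DeepSpeed-specific artifact; the rest of the training script stays plain HF Trainer code.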

@infosechoudini Thanks for chiming in. Yes, the purpose of this repo is to make it HuggingFace compatible, so please do try HF Trainer :)

I've had a few issues with FP16 (numerical stability), which is also noted in the official implementation, so I would stick to FP32 for now.

I am working on porting the official implementation to HF. It is almost finished except for the chunkwise forward, and just needs a few tests and some debugging. It has some tricks for stability, which may enable FP16 :)

YAY for FP16!! I've been working on it on my side as well... no luck tho

Update: the official code updates are now in main. It is on par with the original implementation in terms of forward outputs, weight naming, and backward gradients :) (check tests/). One note: I would recommend bf16 over fp16, since I have personally tested with bf16 only and can confirm that it is stable.

I can also confirm that this model trains stably with parallelism such as FSDP.
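For reference, FSDP can be enabled directly through the HF Trainer. Below is a sketch of the relevant `TrainingArguments` settings as a plain dict; the key names follow `transformers.TrainingArguments`, but `"RetNetDecoderLayer"` is an assumed layer-class name — check the repo's modeling code for the actual class to wrap.

```python
# Sketch of FSDP settings for the HF Trainer. "RetNetDecoderLayer" is an
# assumption -- substitute the real decoder-layer class from this repo.
fsdp_args = {
    "fsdp": "full_shard auto_wrap",  # shard params, grads, optimizer state
    "fsdp_config": {
        "transformer_layer_cls_to_wrap": ["RetNetDecoderLayer"],
    },
    "bf16": True,                    # bf16 pairs well with FSDP per this thread
}

# In real training code this would be spliced into TrainingArguments, e.g.:
#   args = TrainingArguments(output_dir="out", **fsdp_args)
#   Trainer(model=model, args=args, train_dataset=train_ds).train()
# and launched with `torchrun --nproc_per_node=8 train.py`.
```

Unlike DeepSpeed, this needs no separate config file: FSDP sharding is configured entirely through the training arguments.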

Thanks!!!! You're awesome!