microsoft/DeepSpeed

How to perform inference on an MoE model with expert parallelism

Guodanding opened this issue · 1 comment

Hello, I want to run inference on the Hugging Face MoE model Qwen1.5-MoE-A2.7B with expert parallelism using DeepSpeed in a multi-GPU environment. However, the official tutorials are not comprehensive enough, and even after reviewing the documentation I still don't know how to proceed. A sketch of what I have in mind is below.
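To make the question concrete, here is a minimal sketch of the setup I imagine, assuming `deepspeed.init_inference` with its `moe` config section is the right entry point. The `ep_size` value, the expert count, and whether kernel injection applies to this architecture are all assumptions on my part, not taken from the docs:

```python
# Hypothetical sketch: multi-GPU MoE inference with the DeepSpeed inference engine.
# Launched with something like:  deepspeed --num_gpus 2 run_moe_inference.py
import torch
import deepspeed
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "Qwen/Qwen1.5-MoE-A2.7B"
NUM_EXPERTS = 60  # assumption: routed experts per MoE layer in Qwen1.5-MoE-A2.7B

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, torch_dtype=torch.float16)

# Wrap the HF model with DeepSpeed's inference engine. The `moe` section
# (enabled / ep_size / moe_experts) is my guess at the inference config schema;
# kernel injection is left off because I don't know if this architecture is supported.
engine = deepspeed.init_inference(
    model,
    dtype=torch.float16,
    replace_with_kernel_inject=False,
    moe={
        "enabled": True,
        "ep_size": 2,                # expert-parallel degree, assumed equal to the GPU count
        "moe_experts": [NUM_EXPERTS],
    },
)

inputs = tokenizer(
    "Give me a short introduction to mixture-of-experts models.",
    return_tensors="pt",
).to(engine.module.device)
outputs = engine.module.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Is this roughly the intended usage, or is there a different recommended path (e.g. auto tensor parallelism) for this kind of HF MoE checkpoint?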

Could you please provide some guidance or a minimal working example?