dvmazur/mixtral-offloading

Support DeepSeek V2 model

DeepSeek V2 is a state-of-the-art MoE (mixture-of-experts) model. Are there any plans to support it?