[New Feature] Is Mixtral supported?
Can you confirm if Mixtral is currently supported, e.g., mistralai/Mixtral-8x7B-Instruct-v0.1?
I saw in another issue that Mistral is supported, but I'm not sure about Mixtral-8x7B, since it uses a different (mixture-of-experts) architecture.
Thanks for your interest in LMFlow! We have tested Mixtral-8x7B on 8×A40 (48GB) servers, so dense training of Mixtral-8x7B is currently supported in LMFlow. Sparse training is still under implementation; we have added it to our roadmap and will schedule the implementation soon. Multi-node training (https://github.com/OptimalScale/LMFlow/blob/main/readme/multi_node.md) can be used for larger models such as Mixtral-8x22B, but we haven't tested models that large yet.
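For reference, a minimal launch sketch for the dense fine-tuning path, following the `run_finetune.sh` usage from LMFlow's README. The dataset and output paths are placeholders, so substitute your own LMFlow-format dataset and check the README for the current flags:

```bash
# Dense fine-tuning of Mixtral-8x7B with LMFlow's example script.
# Run from the LMFlow repo root; paths below are placeholders.
./scripts/run_finetune.sh \
  --model_name_or_path mistralai/Mixtral-8x7B-Instruct-v0.1 \
  --dataset_path data/alpaca/train_conversation \
  --output_model_path output_models/finetuned_mixtral_8x7b
```

For the multi-node case, see the linked multi_node.md for the launcher and hostfile setup.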
Hope this information is helpful 😄