[New Feature] Will MLA Be Supported?
RanchiZhao commented
- We have successfully applied W4A8KV4 quantization with QoQ to an MLA-based model (similar in architecture to DeepSeek-V2, but trained from scratch). We evaluated the quantized model on perplexity as well as a range of downstream benchmarks, and it performs well.
- Now we would like to achieve actual inference speedups with this W4A8KV4 model, which is why MLA support matters to us (a sketch of where KV4 fits into MLA follows below).
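For concreteness, here is a minimal PyTorch sketch of where KV4 quantization would sit in an MLA layer: in DeepSeek-V2-style MLA, only the low-rank latent `c_kv` is cached instead of full per-head K/V tensors, so that latent is what a KV4 kernel would quantize. This is illustrative only, not this repo's API; the function names, shapes, and the symmetric per-group scheme are our assumptions.

```python
import torch

def quantize_latent_int4(c_kv: torch.Tensor, group_size: int = 64):
    """Per-group symmetric INT4 quantization of the cached MLA latent.

    c_kv: (seq_len, kv_lora_rank), the compressed KV latent that MLA
    caches in place of full K/V. kv_lora_rank is assumed to be
    divisible by group_size.
    """
    seq_len, rank = c_kv.shape
    groups = c_kv.reshape(seq_len, rank // group_size, group_size)
    # One scale per group; 7 is the max magnitude of a signed 4-bit int.
    scale = groups.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / 7.0
    # 4-bit values, stored in an int8 container for simplicity.
    q = (groups / scale).round().clamp(-8, 7).to(torch.int8)
    return q, scale

def dequantize_latent(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    seq_len = q.shape[0]
    return (q.float() * scale).reshape(seq_len, -1)

# Usage: quantize a fake latent cache and check the reconstruction error.
c_kv = torch.randn(16, 512)  # 16 tokens, kv_lora_rank = 512
q, scale = quantize_latent_int4(c_kv)
err = (dequantize_latent(q, scale) - c_kv).abs().max()
print(f"max abs reconstruction error: {err:.4f}")
```

A real KV4 kernel for MLA would of course keep the quantized latent packed and fuse dequantization into the attention computation; the sketch only shows which tensor the 4-bit cache applies to.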
Any help or suggestions would be appreciated. We look forward to your reply!