Issues
hqq_aten package not installed.
#21 opened by LeMoussel - 1
Hard to benchmark the operation in the repo
#39 opened by mynotwo - 2
Change of query weight matrices shapes
#37 opened by avani17101 - 0
Support DeepSeek V2 model
#36 opened by Minami-su - 0
Having issue loading my HQQ quantized model
#35 opened by BeichenHuang - 5
RuntimeError when nbit = 4 and group_size = 64
#32 opened by Eutenacity - 1
(Colab) Clear GPU RAM usage after running the generation code without restarting instance
#28 opened by ScottLinnn - 1
Can this be used for Jamba inference?
#30 opened by freQuensy23-coder - 1
Triton issues in running the code locally
#31 opened by amangupt01 - 10
Can it run on multi-GPU?
#15 opened by drdh - 0
Run on second GPU (torch.device("cuda:1"))
#24 opened by imabot2 - 0
Update requirements.txt
#25 opened by Soumadip-Saha - 0
Run without quantization
#22 opened by freQuensy23-coder - 0
CUDA OOM errors in wsl2
#18 opened by MrNova111 - 0
Can it run with LlamaIndex?
#16 opened by LeMoussel - 4
How to use the offloading in my MoE model?
#13 opened by WangRongsheng - 10
Doesn't work
#14 opened by SanskarX10 - 4
Session crashed on colab
#7 opened by bitsnaps - 1
Enhancing the Efficacy of MoE Offloading with Speculative Prefetching Strategies
#10 opened by yihong1120