Issues
hqq_aten package not installed.
#21 opened by LeMoussel - 1
Hard to benchmark the operation in the repo
#39 opened by mynotwo - 2
Change of query weight matrices shapes
#37 opened by avani17101 - 0
Support DeepSeek V2 model
#36 opened by Minami-su - 0
Having issue loading my HQQ quantized model
#35 opened by BeichenHuang - 5
RuntimeError when nbit = 4 and group_size = 64
#32 opened by Eutenacity - 1
(Colab) Clear GPU RAM usage after running the generation code without restarting instance
#28 opened by ScottLinnn - 1
Can this be used for Jamba inference?
#30 opened by freQuensy23-coder - 1
Triton issues in running the code locally
#31 opened by amangupt01 - 10
Can it run on multi-GPU?
#15 opened by drdh - 0
Run on second GPU (torch.device("cuda:1"))
#24 opened by imabot2 - 0
Update requirements.txt
#25 opened by Soumadip-Saha - 0
Run without quantization
#22 opened by freQuensy23-coder - 0
CUDA OOM errors in wsl2
#18 opened by MrNova111 - 0
Can it run with LlamaIndex?
#16 opened by LeMoussel - 4
How to use the offloading in my MoE model?
#13 opened by WangRongsheng - 10
Doesn't work
#14 opened by SanskarX10 - 4
Session crashed on colab
#7 opened by bitsnaps - 1
Enhancing the Efficacy of MoE Offloading with Speculative Prefetching Strategies
#10 opened by yihong1120