Issues
Qwen 2 inference problem
#493 opened by Sadeghi85 - 1
After updating exllamav2 to 0.1.0, text generated by exui is no longer streamed verbatim
#490 opened by xldistance - 2
Running humaneval against llama-3-8b-instruct exl2 quant results in a silent OOM when samples per task > 7
#496 opened by LlamaEnjoyer - 1
EXL2 format spec?
#494 opened by polarathene - 3
"Loading exllamav2_ext extension (JIT)... Building C++/CUDA extension" hangs forever
#495 opened by AgeOfAlgorithms - 3
Phi-3 medium generation issue
#474 opened by rjmehta1993 - 5
Quantization of glm4-9b failed
#489 opened by Orion-zhen - 3
Can flash attention be disabled? The model needs to be deployed on an NVIDIA T4
#480 opened by vikotse - 4
DeepSeek V2 support
#443 opened by SinanAkkoyun - 5
Support MiniCPM architecture
#479 opened by meigami0 - 2
Quick, Non-Data-Driven Quantization
#482 opened by alexbrowngh - 1
v0.1.1 multi-gpu issue (fine in v0.0.21)
#483 opened by surenchl - 2
v0.1.3 lm format enforcer broken
#485 opened by waterangel91 - 6
[feature request] LLAMA.CPP
#476 opened by 0wwafa - 1
module 'exllamav2_ext' has no attribute 'count_match'
#481 opened by abpani - 3
Problem with blinker...
#475 opened by 0wwafa - 4
Command-R plus OOM 0.0.18 -> 0.0.19
#465 opened by kennylin0309 - 4
Dynamic gen is slower?!
#469 opened by Ph0rk0z - 3
quantization fails while writing shards
#472 opened by theyunt - 2
ROCm version 0.1.0, getting errors
#467 opened by hvico - 4
ExLlamaV2StreamingGenerator error
#451 opened by nktice - 0
Using ExLlamaV2 with Phi-3-Vision
#464 opened by CyberTimon - 5
Integration with Hugging Face transformers library
#461 opened by SunMarc - 7
Error when trying to quantize Viking-7B
#459 opened by minipasila - 1
max_attention_size should be max_input_len**2 ?
#458 opened by laoda513 - 1
undefined symbol: _ZN3c104cuda9SetDeviceEi
#457 opened by icivi - 2
Installing exllama failed
#448 opened by freQuensy23-coder - 1
Update from 0.0.19 to 0.0.20 with Python 3.11, torch 2.2.1 and CUDA 12.1: DLL load failed while importing exllamav2_ext: The specified procedure could not be found.
#434 opened by acidbubbles - 2
Integration with txtai for RAG
#444 opened by edwardsmith999 - 1
Issue with dolphin mixtral8x22b
#445 opened by luijait - 0
Control Vectors
#442 opened by acidbubbles - 1
config.py
#439 opened by Huzaif2309 - 1
[question] how to make generation deterministic?
#438 opened by yshui - 2
Quantized LLama3 inference not working
#435 opened by BenjaminGantenbein - 3
Qwen-110B quantize failed, RuntimeError: CUDA error: an illegal memory access was encountered
#433 opened by buliaoyin - 3
ImportError: /home/ec2-user/.cache/torch_extensions/py310_cu121/exllamav2_ext/exllamav2_ext.so: undefined symbol: _ZN3c104cuda14ExchangeDeviceEa
#427 opened by rjmehta1993 - 0
Layer Skip looks interesting
#431 opened by SinanAkkoyun - 8
Phi-3 Support
#425 opened by candre23 - 2
FP16 + ROCm Possibly Subpar Performance
#428 opened by Beinsezii - 4
Piece ID is out of range.
#422 opened by Ph0rk0z - 2
Error when trying to quantize Llama3 70B instruct: module 'exllamav2_ext' has no attribute 'sim_anneal'
#424 opened by RodriMora