Issues
Question about dequantization
#638 opened by HaoWeiWang - 1
Triton Support
#574 opened by rjmehta1993 - 7
[BUG] Failed to quantize Qwen2.5-Math-72B-Instruct: Measurement/inference error (3): hidden_states
#627 opened by Orion-zhen - 4
[BUG] chat-instruct Llama 3.1 end word "assistant "
#632 opened by Katehuuh - 4
[BUG] exllamav2-0.2.2+cu118.torch2.4.0-cp310-cp310-win_amd64.whl Version seems missing under releases.
#633 opened by Nrgte - 0
[BUG] Random slowdowns in tensor parallel.
#630 opened by Ph0rk0z - 0
[REQUEST] Support Yarn for Qwen 2.5 >32K
#629 opened by Downtown-Case - 9
[BUG] Quantization of Qwen returns garbage
#621 opened by fahadh4ilyas - 4
[BUG] Qwen 2.5 34B returns garbage at certain quantization levels, but not others
#628 opened by Downtown-Case - 6
Curious about Exllama+TP
#571 opened by grimulkan - 5
How to implement paged attention in HF format?
#616 opened by fahadh4ilyas - 6
Error in quant
#587 opened by Orion-zhen - 6
[BUG] 0.2.1 doesn't compile on Opensuse
#620 opened by avidwriter - 4
Batch generation with Exllamav2_HF is weird
#606 opened by fahadh4ilyas - 16
Command R+ is broken?
#612 opened by Ph0rk0z - 4
A doubt regarding filters/tools.
#581 opened by royallavanya140 - 0
How can I solve this problem?
#611 opened by Sultan0ML - 2
Pipeline mode support
#605 opened by laoda513 - 22
lollms exllamav2 binding module not found
#602 opened by Fuckingnameless - 1
Remove tokens and system prompt from generation
#608 opened by kaykyr - 1
Tensor parallelism issues
#598 opened by dirkson - 1
Does NVLink improve tensor parallelism?
#603 opened by bryanhpchiang - 3
Async Stream Generator?
#604 opened by KingBipo - 2
MemoryError despite sufficient system resources
#596 opened by KingBipo - 0
Do you know of any code framework that supports fast attention score calculation similar to flash attention?
#599 opened by qiyuxinlin - 0
Request for multi model support
#595 opened by royallavanya140 - 1
ModuleNotFoundError: No module named 'blessed'
#584 opened by puppetm4st3r - 4
[question] Wrapper Linear API and 2bits
#589 opened by wenhuach21 - 15
problem with cache.
#591 opened by royallavanya140 - 2
Llama 3 speed
#585 opened by freQuensy23-coder - 5
Will it support CPU offloading?
#578 opened by fzyzcjy - 1
Q8 or unquantized cache with what context length for llama 3.1-8b 5.0 bpw exl2?
#575 opened by lovebeatz - 1
name 'flash_attn_func' is not defined
#576 opened by ZhenyaPav - 0
Add more docs and type annotations
#579 opened by Dan-wanna-M - 3
orig_func Quantization error
#573 opened by Masterjp123 - 39
Quantizing Llama 3.1 405B
#565 opened by grimulkan - 1
Enhancement: Docker Image Github Actions
#570 opened by 0x4139 - 1
No prebuilt pip package for version 0.1.8
#567 opened by debasish-mihup - 3
Got error in new model LLama 3.1 : Value for eos_token_id is not of expected type <class 'int'>
#566 opened by GrennKren - 1