Issues
- Any plans for a llama 3 version? (#42 opened by AWAS666, 0 comments)
- any performance testing about S-lora ? (#41 opened by x-transformers, 0 comments)
- Can I get a citation (#40 opened by sabetAI, 0 comments)
- Failed to run tp branch (#39 opened by sleepwalker2017, 0 comments)
- Any advice for debugging this project? (#38 opened by sleepwalker2017, 0 comments)
- Get stuck after running benchmark client (#37 opened by sleepwalker2017, 0 comments)
- Any performance of llama2-series model? (#35 opened by skykiseki, 0 comments)
- Workaround with GPT2 (#34 opened by jannikbuscha, 0 comments)
- Support qwen? (#33 opened by yinjiaoyuan, 1 comment)
- When will the Qianwen model be supported? (#26 opened by takemars, 0 comments)
- OpenAI API webserver compatible (#9 opened by giaosudau, 3 comments)
- Multi-GPU Support (#24 opened by luciferlinx101, 0 comments)
- Query multiple LoRA by weights (#31 opened by authurlord, 2 comments)
- Does it support V100 GPU? (#25 opened by Ted8000, 0 comments)
- not support baichuan (#29 opened by codernew007, 0 comments)
- Tensor parallelism with S-LoRA (#27 opened by debraj135, 2 comments)
- What's the difference from LoRAX (#22 opened by wDevil, 2 comments)
- Encountered some problems when adding the support for GPT-Q 4-bit quantized LLaMA-2 model. (#18 opened by suilin0432, 1 comment)
- ISSUE (#23 opened by sailakkshmiallada, 0 comments)
- Question about device bandwidth (#21 opened by qizzzh, 2 comments)
- This is huge! (#2 opened by yhyu13, 0 comments)
- Support chatglm3 ? (#20 opened by litetoooooom, 2 comments)
- Choosing adapters on inference (#12 opened by raihan0824, 1 comment)
- can it support rtx 4090 (24 gb) ? (#11 opened by jaiabhayk, 2 comments)
- Question about cuda kernel (#10 opened by harryhan618, 0 comments)
- Encoder-Decoder model support (#17 opened by aravindMahadevan, 0 comments)
- Support for GPT-NEOX models (#16 opened by bibekyess, 0 comments)
- torch 2.0.1 requires triton==2.0.0 (#8 opened by giaosudau, 0 comments)
- Quantisation (#6 opened by nivibilla, 1 comment)