Issues
Inference Multiple GPU
#20 opened - 0
ggml-alpaca-3b-q4 on CPU?
#19 opened - 2
Exception: cublasLt ran into an error!
#16 opened - 2
Repetitive responses from 7B model.
#15 opened - 1
Problem trying to run the 13B model
#13 opened - 3
top_k and repetition_penalty
#12 opened - 3
Is there something I am doing wrong?
#10 opened - 4
How to support multiple users
#9 opened - 1
401 error
#8 opened - 7
Streaming response
#7 opened - 0
How to run this serve?
#1 opened