Issues
- FastAPI + llamapi issue (#29, opened by Samraw003)
- Stopped working after enabling CUDA (#28, opened by alexellis)
- High RAM and CPU usage (#27, opened by delta-whiplash)
- warning: failed to mlock 245760-byte buffer (after previously locking 0 bytes): Cannot allocate memory llm_load_tensors: mem required = 46494.72 MB (+ 1280.00 MB per state) (#13, opened by Dougie777)
- Proxy to openAI (#9, opened by kreolsky)
- Usage of embedding through langchain (#26, opened by jordandroid)
- how to run this api in cpu only mode (#23, opened by delta-whiplash)
- Support min_p sampler (#25, opened by atisharma)
- How can I use a specific prompt template? (#24, opened by Dougie777)
- exllama GPU split (#21, opened by atisharma)
- Support for ExLlama V2 (#15, opened by Immortalin)
- Set number of cores being used on cpu? (#16, opened by Dougie777)
- Long generations dont return data but server says 200 OK. Swagger screen just says LOADING forever. (#18, opened by Dougie777)
- BUG: I found the model path bug! (#17, opened by Dougie777)
- model_definitions.py (#12, opened by Dougie777)
- Dumb question: definitions.py model parameters (#10, opened by Dougie777)
- Using with LangChain instead openai API (#8, opened by kreolsky)