Issues
Phi 3.5 vision (4B model)
#637 opened by CheeseAndMeat - 0
Not able to run source code
#636 opened by nirvitarka - 4
Quickstart example not working
#489 opened by jmorenobl - 3
flashinfer backend raises RuntimeError: paged_kv_indices must be a 1D tensor
#625 opened by baggiponte - 8
Flash Attention is not installed?
#595 opened by ObliviousDonkey - 1
RuntimeError: CUDA error: no kernel image is available for execution on the device
#535 opened by nethi - 0
Performance issues on AWQ and Lora
#611 opened by dumbPy - 1
Issue with loading AWQ quantized Llama 3.1 70B
#607 opened by dumbPy - 1
Can not start lorax from docker
#605 opened by korlin0110 - 0
Running several adapters on the same input
#606 opened by arnaud-secondlayer - 0
When max total tokens is very large (e.g. 130000) and the request has no max new tokens, the response is wrong
#601 opened by ejiang-eog - 0
Issues loading Llama 3.1 8B Instruct
#592 opened by jonseaberg - 0
The server is failing to run
#591 opened by u650080 - 1
If LoRAX is based on Punica kernels, will it be able to support LoRA adapters for Mistral NeMo 12B?
#549 opened by tensimixt - 3
Docker image error
#556 opened by ejiang-eog - 0
LORAX_USE_GLOBAL_HF_TOKEN is not applied the first time an adapter is called from a private Hugging Face Hub
#541 opened by monologg - 0
Stop word is included on phi-2
#537 opened by yunmanger1 - 7
Fails hard on CUDA error
#523 opened by yunmanger1 - 0
Now that TGI is back under the Apache-2.0 license, will LoRAX merge their updates?
#527 opened by SMAntony - 0
Adding Whisper model
#526 opened by Jeevi10 - 2
Generating garbage output
#521 opened by shreyansh26 - 0
Add echo parameter in request
#518 opened by dennisrall - 1
Can't start my local Llama 3 model server with Docker
#511 opened by cheney369 - 0
Fail to load special token in phi-3
#505 opened by prd-tuong-nguyen - 1
Can't run LoRAX with Docker
#502 opened by cheney369 - 9
Fail to run Phi-3
#485 opened by prd-tuong-nguyen - 0
`AutoTokenizer.from_pretrained` needs `trust_remote_code` set inside `load_module_map`
#466 opened by thincal - 0
Quantized KV Cache
#483 opened by flozi00 - 2
Bug Report: lorax-launcher failed with --source "s3" for model_id "mistralai/Mistral-7B-Instruct-v0.2"
#473 opened by donjing - 0
Support inference on INF2 instance
#477 opened by prd-tuong-nguyen - 0
Reject unknown fields from API requests
#478 opened by noyoshi - 1
Add HTTP status codes to docs
#481 opened by noyoshi - 0
[QUESTION] How to change the Hugging Face model download path in LoRAX when deployed to Kubernetes through a Helm chart
#470 opened by fahimkm - 6
Add all launcher args as optional in the Helm charts
#465 opened by tgaddair - 0
Batch inference endpoint (OpenAI compatible)
#448 opened by tgaddair