runpod-workers/worker-vllm

The RunPod worker template for serving our large language model endpoints. Powered by vLLM.

Python · MIT

Issues

'NoneType' object has no attribute 'headers' (completions endpoint)
#104 opened a month ago by Permafacture
7
Support for tools / tool_choice="auto" in OpenAI-compatible API
#85 opened 2 months ago by TimPietrusky
23
Meta-Llama-3.1-8B support
#97 opened 2 months ago by klipach
2
0.5.5 is out
#105 opened a month ago by the-xentropy
1
Correct way to implement RAG with vllm
#103 opened a month ago by Hel1zor
0
Documentation incorrect regarding boolean
#101 opened a month ago by scriptcoded
0
MODEL_REVISION & TOKENIZER_REVISION: Both are needed to configure the revision
#100 opened a month ago by TimPietrusky
0
Bitsandbytes support
#99 opened a month ago by ilyalasy
0
trust_remote_code Setting Not Applied in runpod/worker-v1-vllm:stable-cuda12.1.0
#91 opened 2 months ago by Juhong-Namgung
5
Support GGUF models
#98 opened 2 months ago by vladfaust
0
Issue: Update VLLM to Version .5.0++, and a few suggestions
#83 opened 3 months ago by nerdylive123
13
ValueError: rope_scaling must be a dictionary with two fields, type and factor
#89 opened 2 months ago by omar93939
2
Update to vllm 0.5
#80 opened 2 months ago by Sapessii
0
Using mistral 0.3
#79 opened 2 months ago by Sapessii
0
Unable to deploy mistralai/Mistral-Nemo-Instruct-2407
#88 opened 2 months ago by TheMindExpansionNetwork
5
[feat] ability to set max_num_seqs
#87 opened 2 months ago by kalocide
1
A new version of VLLM has been released
#84 opened 3 months ago by d4rk6un
1
Update documentation to note support for extra parameters
#69 opened 5 months ago by bryankruman
1
Gemma-2 is not available in this docker image.
#81 opened 3 months ago by codingchild2424
0
OOM on second request
#78 opened 3 months ago by Permafacture
2
ImportError prepare_hf_model_weights method
#73 opened 4 months ago by ArtyoMKos
2
Runpod serverless vLLM with Llama 3 70B on 40GB GPU
#68 opened 4 months ago by EdwardTheLegend
8
OpenAI API: API errors have wrong HTTP code
#57 opened 7 months ago by lucasavila00
3
How can i update to vLLM v0.4.1 for llama3 support ?
#66 opened 4 months ago by Lhemamou
6
Slow streaming
#76 opened 4 months ago by motorbike158
1
Incorrect path_or_model_id
#75 opened 4 months ago by Sapessii
13
Cannot load Tokenizers for some Models.
#63 opened 4 months ago by Mr-Nobody1
2
Got some deprecation notice, might update these
#72 opened 4 months ago by nerdylive123
1
Building Docker with model built in
#71 opened 4 months ago by KDercksen
6
Only generates 16 tokens
#74 opened 4 months ago by lawrenceztang
1
GGUF compatibility
#70 opened 5 months ago by adam-clarey
2
Best way to record data
#64 opened 5 months ago by aodhan-domhnaill
1
BadRequestError on runsync route, or what is the correct method to hit handler.py's locally run API?
#65 opened 5 months ago by dpkirchner
1
Serverless generator can not handle errors properly
#61 opened 6 months ago by dendik
0
Multi-LoRA
#60 opened 6 months ago by joaomsimoes
0
OpenAI Error: Not returning full output
#58 opened 6 months ago by Mr-Nobody1
1
weird output when using a custom model and ChatAPI does not work
#55 opened 7 months ago by Mr-Nobody1
7
Support for GPT3 based models
#54 opened 7 months ago by letajmal
1
MODEL_REVISION not read
#53 opened 7 months ago by Sapessii
2
Huggingface is down and my worker is looping
#46 opened 7 months ago by dannysemi
1
Cannot run Mixtral 8x7B Instruct AWQ
#49 opened 7 months ago by ddemillard
6
Do the new images work?
#51 opened 7 months ago by dannysemi
13
Error after tokenizer commit
#42 opened 8 months ago by StableFluffy
1
trust_remote_code not recognized
#43 opened 8 months ago by dannysemi
1
Support for mistralai/Mixtral-8x7B-Instruct-v0.1
#41 opened 8 months ago by ilkersigirci
1
enforce_eager flag
#40 opened 8 months ago by dannysemi
5
Docker image is taking too much time to build
#37 opened 8 months ago by hiennef
7
python setup.py develop did not run successfully
#32 opened 8 months ago by heraistudios
3
"n" parameter does not return multiple responses
#33 opened 8 months ago by hexadecible
1
`MAX_CONCURRENCY` parameter doesn't work
#36 opened 9 months ago by antonioglass
1