runpod-workers/worker-vllm
The RunPod worker template for serving our large language model endpoints. Powered by vLLM.
Python, MIT
Issues
Meta-Llama-3.1-8B support
#97 opened by klipach - 1
0.5.5 is out
#105 opened by the-xentropy - 0
Correct way to implement RAG with vllm
#103 opened by Hel1zor - 0
Documentation incorrect regarding boolean
#101 opened by scriptcoded - 0
MODEL_REVISION & TOKENIZER_REVISION: Both are needed to configure the revision
#100 opened by TimPietrusky - 0
Bitsandbytes support
#99 opened by ilyalasy - 5
trust_remote_code Setting Not Applied in runpod/worker-v1-vllm:stable-cuda12.1.0
#91 opened by Juhong-Namgung - 0
Support GGUF models
#98 opened by vladfaust - 13
ValueError: rope_scaling must be a dictionary with two fields, type and factor
#89 opened by omar93939 - 0
Update to vllm 0.5
#80 opened by Sapessii - 0
Using mistral 0.3
#79 opened by Sapessii - 5
[feat] ability to set max_num_seqs
#87 opened by kalocide - 1
A new version of VLLM has been released
#84 opened by d4rk6un - 1
OOM on second request
#78 opened by Permafacture - 2
ImportError prepare_hf_model_weights method
#73 opened by ArtyoMKos - 8
OpenAI API: API errors have wrong HTTP code
#57 opened by lucasavila00 - 6
Slow streaming
#76 opened by motorbike158 - 13
Incorrect path_or_model_id
#75 opened by Sapessii - 2
Cannot load Tokenizers for some Models.
#63 opened by Mr-Nobody1 - 1
Building Docker with model built in
#71 opened by KDercksen - 1
Only generates 16 tokens
#74 opened by lawrenceztang - 2
GGUF compatibility
#70 opened by adam-clarey - 1
Best way to record data
#64 opened by aodhan-domhnaill - 1
BadRequestError on runsync route, or what is the correct method to hit handler.py's locally run API?
#65 opened by dpkirchner - 0
Multi-LoRA
#60 opened by joaomsimoes - 1
OpenAI Error: Not returning full output
#58 opened by Mr-Nobody1 - 7
Support for GPT3 based models
#54 opened by letajmal - 2
MODEL_REVISION not read
#53 opened by Sapessii - 1
Huggingface is down and my worker is looping
#46 opened by dannysemi - 6
Cannot run Mixtral 8x7B Instruct AWQ
#49 opened by ddemillard - 13
Do the new images work?
#51 opened by dannysemi - 1
Error after tokenizer commit
#42 opened by StableFluffy - 1
trust_remote_code not recognized
#43 opened by dannysemi - 1
enforce_eager flag
#40 opened by dannysemi - 7
Docker image is taking too much time to build
#37 opened by hiennef - 3
`MAX_CONCURRENCY` parameter doesn't work
#36 opened by antonioglass