Issues
vLLM run script: pre-capture prefill + decode traces to avoid unexpectedly high or stalled TTFT on first completions (see the warm-up sketch after this list)
#56 opened by tstescoTT - 0
Provide example chat template usage (see the chat-template sketch after this list)
#36 opened by tstescoTT - 2
Add handling for RAG context
#9 opened by tstescoTT - 1
Missing `--max_prompt_length` argument when running example_requests_client_alpaca_eval.py (see the argparse sketch after this list)
#51 opened by milank94 - 1
Very slow speed
#40 opened by changh95 - 8
Add status messaging and a status endpoint so client-side users can reason about model initialization and lifecycle (see the polling sketch after this list)
#17 opened by tstescoTT - 3
Simple E2E quick-start.sh
#10 opened by tt-mjudge - 1
Add linting / formatting checks on PRs
#7 opened by tstescoTT - 0
Support for Llama 3.1 8B
#11 opened by tt-mjudge - 3
Llama 3.1 70B T3K inference server: batch_top_pk_logits_efficient worst-case latency ~20 ms (see the generic sampling sketch after this list)
#5 opened by tstescoTT - 2
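
For #56, the goal is to capture prefill and decode traces before the first real request arrives. A client-side warm-up pass is one way to trigger that capture; this is only a sketch, assuming the run script stands up the standard vLLM OpenAI-compatible server, and the URL, model id, and prompt lengths below are assumptions rather than the repo's actual mechanism.

```python
import requests

VLLM_URL = "http://localhost:8000/v1/completions"   # assumed local vLLM server
MODEL = "meta-llama/Llama-3.1-70B-Instruct"          # assumed model id

def warm_up(prompt_lengths=(128, 2048)):
    """Send throwaway completions so prefill/decode traces are captured
    before the first real user request, avoiding the initial TTFT spike."""
    for n in prompt_lengths:
        payload = {
            "model": MODEL,
            "prompt": "hello " * n,   # rough proxy for a prompt of ~n tokens
            "max_tokens": 8,
            "temperature": 0.0,
        }
        resp = requests.post(VLLM_URL, json=payload, timeout=600)
        resp.raise_for_status()

if __name__ == "__main__":
    warm_up()
```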
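For #36, one way to document example chat template usage is via Hugging Face's `tokenizer.apply_chat_template`; the model id here is an assumption, and any chat-tuned checkpoint that ships a chat template would work the same way.

```python
from transformers import AutoTokenizer

# Model id is an assumption; substitute the checkpoint the server actually loads.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

messages = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "What is TTFT?"},
]

# Render the conversation into the prompt string the model expects.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
```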
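For #51, the report is that `example_requests_client_alpaca_eval.py` is invoked with `--max_prompt_length` but never defines it. A sketch of the missing argparse wiring, where the default value and help text are assumptions:

```python
import argparse

def parse_args():
    parser = argparse.ArgumentParser(description="alpaca_eval request client")
    # Flag name taken from the issue; default value is an assumption.
    parser.add_argument(
        "--max_prompt_length",
        type=int,
        default=2048,
        help="Truncate prompts to at most this many tokens before sending.",
    )
    return parser.parse_args()

if __name__ == "__main__":
    args = parse_args()
    print(f"max_prompt_length = {args.max_prompt_length}")
```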
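For #17, a status endpoint lets clients distinguish "weights still loading" from "server down". A sketch of the client side, where the `/status` path and the JSON schema (a `status` field that eventually reads `ready`) are purely hypothetical:

```python
import time
import requests

# Hypothetical endpoint; path and response schema are assumptions.
STATUS_URL = "http://localhost:8000/status"

def wait_until_ready(timeout_s=1800, poll_s=10):
    """Poll the status endpoint until the model reports it is ready to serve."""
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        try:
            state = requests.get(STATUS_URL, timeout=5).json().get("status")
        except requests.RequestException:
            state = "unreachable"
        if state == "ready":
            return True
        print(f"model not ready yet (status={state}), retrying in {poll_s}s")
        time.sleep(poll_s)
    return False
```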
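For #5, the repo's `batch_top_pk_logits_efficient` internals are not reproduced here; as background only, this is a generic batched top-k then top-p sampling step with a crude timing harness, a reference point for what a ~20 ms worst case is measured against, not the actual implementation.

```python
import time
import torch

def top_k_top_p_sample(logits, top_k=40, top_p=0.9, temperature=1.0):
    """Generic per-row top-k then top-p sampling over a batch of logits.
    Illustration only; not the repo's batch_top_pk_logits_efficient."""
    logits = logits / temperature
    # Keep only the top_k logits per row.
    values, indices = torch.topk(logits, top_k, dim=-1)
    probs = torch.softmax(values, dim=-1)
    # Sort descending and drop tokens beyond the top_p cumulative mass.
    sorted_probs, sorted_idx = torch.sort(probs, descending=True, dim=-1)
    cumulative = torch.cumsum(sorted_probs, dim=-1)
    remove = cumulative - sorted_probs > top_p   # always keeps the first token
    sorted_probs[remove] = 0.0
    sorted_probs /= sorted_probs.sum(dim=-1, keepdim=True)
    # Sample within the filtered set, then map back to vocabulary ids.
    choice = torch.multinomial(sorted_probs, num_samples=1)
    picked = sorted_idx.gather(-1, choice)
    return indices.gather(-1, picked)

batch, vocab = 32, 128256
logits = torch.randn(batch, vocab)
start = time.perf_counter()
tokens = top_k_top_p_sample(logits)
print(f"sampled {tuple(tokens.shape)} in {(time.perf_counter() - start) * 1e3:.1f} ms")
```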