janhq/cortex.tensorrt-llm
Cortex.Tensorrt-LLM is a C++ inference library that can be loaded by any server at runtime. It includes NVIDIA's TensorRT-LLM as a submodule for accelerated inference on NVIDIA GPUs.
C++ · Apache-2.0
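Because the library is designed to be loaded by a host server at runtime, integration typically amounts to opening the shared object and resolving an entry point. Below is a minimal sketch of that pattern; the library name (`libengine.so`) and factory symbol (`create_engine`) are hypothetical placeholders, not this repository's actual API.

```cpp
// Sketch: loading an inference engine library at runtime via dlopen.
// The library path and the "create_engine" symbol are assumptions
// for illustration, not the real cortex.tensorrt-llm interface.
#include <dlfcn.h>
#include <cstdio>

int main() {
    // Open the engine shared library; RTLD_NOW resolves symbols eagerly.
    void* handle = dlopen("./libengine.so", RTLD_NOW | RTLD_LOCAL);
    if (!handle) {
        std::fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return 1;
    }

    // Resolve a C-style factory symbol exported by the engine.
    using CreateFn = void* (*)();
    auto create = reinterpret_cast<CreateFn>(dlsym(handle, "create_engine"));
    if (!create) {
        std::fprintf(stderr, "dlsym failed: %s\n", dlerror());
        dlclose(handle);
        return 1;
    }

    void* engine = create();  // The host server would now drive this instance.
    std::printf("engine loaded at %p\n", engine);

    dlclose(handle);
    return 0;
}
```

On Linux this builds with `g++ main.cpp -ldl`; a real engine would expose a richer interface for loading models and serving requests.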
Issues
- feat: TensorRT-LLM load multiple models (#33, opened by tikikun)
- feat: TensorRT-LLM Inflight batching (#29, opened by tikikun)
- feat: Unload the model (#32, opened by tikikun)
- feat: support llama3 (#49, opened by vansangpfiev)
- GitHub CI (Windows) for tensorrt_llm engine (#28, opened by hiro-v)
- bug: TensorRT - switching between models causes the error "satisfyProfile Runtime dimension does not satisfy any optimization profile" (#27, opened by Van-QA)
- feat: Stop inferencing (#31, opened by tikikun)
- bug: frequency_penalty parameter in model.yml only functions correctly with value 1; other values produce gibberish (#61, opened by Van-QA)
- bug: templating issue with Mistral v0.3 (#50, opened by vansangpfiev)
- feat: Revamp the README.md file (#55, opened by irfanpena)
- [Request] Support for logits_prob (#54, opened by hiro-v)
- feat: Add exit method (#24, opened by tikikun)
- feat: Check cache properly (#15, opened by tikikun)
- epic: Add proper handler for stop words (#10, opened by tikikun; a generic sketch follows below)
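Issue #10 concerns halting generation as soon as a stop sequence appears in the decoded output stream. As a generic illustration of that technique (not this repository's implementation), a buffered stop-word check might look like the following; the stop sequences and token pieces are invented for the example.

```cpp
// Generic sketch of a streaming stop-word check: accumulate decoded text
// and stop once any stop sequence appears, emitting only the text before it.
// Simplified: a production handler would also avoid emitting text that could
// still turn out to be the prefix of a stop sequence.
#include <string>
#include <vector>
#include <iostream>

// Returns the position of the earliest stop sequence in `text`,
// or std::string::npos if none is present.
static size_t find_stop(const std::string& text,
                        const std::vector<std::string>& stops) {
    size_t best = std::string::npos;
    for (const auto& s : stops) {
        size_t pos = text.find(s);
        if (pos != std::string::npos && pos < best) best = pos;
    }
    return best;
}

int main() {
    const std::vector<std::string> stops = {"</s>", "\nUser:"};
    const std::vector<std::string> pieces = {"Hello", ", world", "!</s>", "ignored"};
    std::string buffer;

    // Simulated token stream from a decoder.
    for (const auto& piece : pieces) {
        buffer += piece;
        size_t pos = find_stop(buffer, stops);
        if (pos != std::string::npos) {
            buffer.resize(pos);  // truncate at the stop sequence
            break;               // stop inferencing
        }
    }
    std::cout << buffer << "\n";  // prints "Hello, world!"
    return 0;
}
```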