janhq/cortex.tensorrt-llm
Cortex.Tensorrt-LLM is a C++ inference library that can be loaded by any server at runtime. It includes NVIDIA's TensorRT-LLM as a submodule for accelerated inference on NVIDIA GPUs.
C++ · Apache-2.0
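Because the library is designed to be loaded by a host server at runtime, integration typically amounts to opening the shared object and resolving an entry point. Below is a minimal sketch of that pattern; the library name (`libengine.so`) and factory symbol (`create_engine`) are hypothetical placeholders, not this repository's actual API.

```cpp
// Sketch: loading an inference engine library at runtime via dlopen.
// The library path and the "create_engine" symbol are assumptions
// for illustration, not the real cortex.tensorrt-llm interface.
#include <dlfcn.h>
#include <cstdio>

int main() {
    // Open the engine shared library; RTLD_NOW resolves symbols eagerly.
    void* handle = dlopen("./libengine.so", RTLD_NOW | RTLD_LOCAL);
    if (!handle) {
        std::fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return 1;
    }

    // Resolve a C-style factory symbol exported by the engine.
    using CreateFn = void* (*)();
    auto create = reinterpret_cast<CreateFn>(dlsym(handle, "create_engine"));
    if (!create) {
        std::fprintf(stderr, "dlsym failed: %s\n", dlerror());
        dlclose(handle);
        return 1;
    }

    void* engine = create();  // The host server would now drive this instance.
    std::printf("engine loaded at %p\n", engine);

    dlclose(handle);
    return 0;
}
```

On Linux this builds with `g++ main.cpp -ldl`; a real engine would expose a richer interface for loading models and serving requests.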
Issues
- feat: TensorRT-LLM load multiple models (#33, opened by tikikun)
- feat: TensorRT-LLM Inflight batching (#29, opened by tikikun)
- feat: Unload the model (#32, opened by tikikun)
- feat: support llama3 (#49, opened by vansangpfiev)
- GitHub CI (Windows) for tensorrt_llm engine (#28, opened by hiro-v)
- bug: TensorRT - switching between models causes the error "satisfyProfile Runtime dimension does not satisfy any optimization profile" (#27, opened by Van-QA)
- feat: Stop inferencing (#31, opened by tikikun)
- bug: frequency_penalty parameter in model.yml only functions correctly with value 1; other values produce gibberish (#61, opened by Van-QA)
- bug: templating issue with Mistral v0.3 (#50, opened by vansangpfiev)
- feat: Revamp the README.md file (#55, opened by irfanpena)
- [Request] Support for logits_prob (#54, opened by hiro-v)
- feat: Add exit method (#24, opened by tikikun)
- feat: Check cache properly (#15, opened by tikikun)
- epic: Add proper handler for stop words (#10, opened by tikikun; a generic sketch follows below)
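Issue #10 concerns halting generation as soon as a stop sequence appears in the decoded output stream. As a generic illustration of that technique (not this repository's implementation), a buffered stop-word check might look like the following; the stop sequences and token pieces are invented for the example.

```cpp
// Generic sketch of a streaming stop-word check: accumulate decoded text
// and stop once any stop sequence appears, emitting only the text before it.
// Simplified: a production handler would also avoid emitting text that could
// still turn out to be the prefix of a stop sequence.
#include <string>
#include <vector>
#include <iostream>

// Returns the position of the earliest stop sequence in `text`,
// or std::string::npos if none is present.
static size_t find_stop(const std::string& text,
                        const std::vector<std::string>& stops) {
    size_t best = std::string::npos;
    for (const auto& s : stops) {
        size_t pos = text.find(s);
        if (pos != std::string::npos && pos < best) best = pos;
    }
    return best;
}

int main() {
    const std::vector<std::string> stops = {"</s>", "\nUser:"};
    const std::vector<std::string> pieces = {"Hello", ", world", "!</s>", "ignored"};
    std::string buffer;

    // Simulated token stream from a decoder.
    for (const auto& piece : pieces) {
        buffer += piece;
        size_t pos = find_stop(buffer, stops);
        if (pos != std::string::npos) {
            buffer.resize(pos);  // truncate at the stop sequence
            break;               // stop inferencing
        }
    }
    std::cout << buffer << "\n";  // prints "Hello, world!"
    return 0;
}
```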