Pinned Repositories
WhisperLive
A nearly-live implementation of OpenAI's Whisper.
speech-to-speech
Speech To Speech: an effort for an open-source and modular GPT-4o
anlp-oct24
This repository contains the code for the ANLP workshop held by Red Dragon AI in October 2024.
doc-qa-example
llm-agent-example
langgraph
Build resilient language agents as graphs.
LLMLingua
[EMNLP'23, ACL'24] Compresses the prompt and KV cache to speed up LLM inference and improve the model's perception of key information, achieving up to 20x compression with minimal performance loss.
TensorRT-LLM
TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT LLM also contains components to create Python and C++ runtimes that orchestrate the inference execution in a performant way.
server
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
tensorrtllm_backend
The Triton TensorRT-LLM Backend
jasonngap1's Repositories
jasonngap1/doc-qa-example
jasonngap1/llm-agent-example
jasonngap1/anlp-oct24
This repository contains the code for the ANLP workshop held by Red Dragon AI in October 2024.