MonadKai's Stars
baiguoname/qust
SciPhi-AI/R2R
Containerized, state of the art Retrieval-Augmented Generation (RAG) system with a RESTful API
explodinggradients/ragas
Supercharge Your LLM Application Evaluations 🚀
janhq/ichigo
Local realtime voice AI
sgl-project/tensorrt-demo
TensorRT LLM Benchmark Configuration
kvcache-ai/ktransformers
A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations
usefulsensors/moonshine
Fast and accurate automatic speech recognition (ASR) for edge devices
facebookresearch/ReAgent
A platform for Reasoning systems (Reinforcement Learning, Contextual Bandits, etc.)
backprop-ai/vllm-benchmark
Benchmarking the serving capabilities of vLLM
facebookresearch/lingua
Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs.
DanielJDufour/language-detector
Detect the language of text
kamalkraj/stable-diffusion-tritonserver
Deploy stable diffusion model with onnx/tensorrt + tritonserver
graspologic-org/graspologic-native
graspologic-native is a library of Rust components that adds capabilities to graspologic, a Python library for intelligently building networks and network embeddings, and for analyzing connected data.
graspologic-org/graspologic
Python package for graph statistics
SWivid/F5-TTS
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
HKUDS/LightRAG
"LightRAG: Simple and Fast Retrieval-Augmented Generation"
ZJU-ACES-ISE/chatunitest-maven-plugin
microsoft/Tutel
Tutel MoE: An Optimized Mixture-of-Experts Implementation
openai/swarm
Educational framework exploring ergonomic, lightweight multi-agent orchestration. Managed by OpenAI Solution team.
triton-inference-server/onnxruntime_backend
The Triton backend for the ONNX Runtime.
VikParuchuri/surya
OCR, layout analysis, reading order, table recognition in 90+ languages
shreyansh26/Attention-Mask-Patterns
Using FlexAttention to compute attention with different masking patterns
Lightning-AI/LitServe
Lightning-fast serving engine for any AI model of any size. Flexible. Easy. Enterprise-scale.
mani-kantap/llm-inference-solutions
A collection of all available inference solutions for LLMs
mobiusml/gemlite
Simple and fast low-bit matmul kernels in CUDA / Triton
allenai/wimbd
What's In My Big Data (WIMBD) - a toolkit for analyzing large text datasets
Cambricon/mlu-ops
Efficient operation implementation based on the Cambricon Machine Learning Unit (MLU).
fw-ai/benchmark
Benchmark suite for LLMs from Fireworks.ai
Deep-Learning-Profiling-Tools/triton-viz
triton-lang/kernels