smile-luobin's Stars
huggingface/diffusers
🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
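A minimal text-to-image sketch of the pipeline API; the checkpoint name, prompt, and CUDA device are assumptions for illustration:

```python
import torch
from diffusers import DiffusionPipeline

# Load a pretrained pipeline from the Hub (checkpoint name is illustrative).
pipe = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")  # assumes a CUDA-capable GPU

# Run the denoising loop and save the first generated image.
image = pipe("a photo of an astronaut riding a horse").images[0]
image.save("astronaut.png")
```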
microsoft/graphrag
A modular graph-based Retrieval-Augmented Generation (RAG) system
huggingface/datasets
🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
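A quick sketch of the core workflow (the "imdb" dataset is just an example; any Hub dataset loads the same way):

```python
from datasets import load_dataset

# Download (and cache) a dataset from the Hub.
ds = load_dataset("imdb", split="train")

# map() applies a transform over the dataset; batched=True processes
# many rows per call for speed.
ds = ds.map(lambda batch: {"n_chars": [len(t) for t in batch["text"]]}, batched=True)

print(ds[0]["label"], ds[0]["n_chars"])
```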
onnx/onnx
Open standard for machine learning interoperability
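A small sketch of loading and validating a model with the onnx Python package ("model.onnx" is a placeholder path):

```python
import onnx

model = onnx.load("model.onnx")        # placeholder path
onnx.checker.check_model(model)        # validate against the ONNX spec
print(onnx.helper.printable_graph(model.graph))  # human-readable graph dump
```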
microsoft/onnxruntime
ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
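A minimal inference sketch; the model path, input name, and input shape are assumptions:

```python
import numpy as np
import onnxruntime as ort

# Create a session; execution providers select the hardware backend.
sess = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])

input_name = sess.get_inputs()[0].name
x = np.random.rand(1, 3, 224, 224).astype(np.float32)  # shape assumed for illustration

outputs = sess.run(None, {input_name: x})  # None = fetch all outputs
print(outputs[0].shape)
```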
triton-lang/triton
Development repository for the Triton language and compiler
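The canonical first Triton kernel is an element-wise vector add; this sketch follows the official tutorial's pattern and assumes PyTorch plus a CUDA GPU:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)                       # one program per block
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements                       # guard out-of-bounds lanes
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

x = torch.rand(4096, device="cuda")
y = torch.rand(4096, device="cuda")
out = torch.empty_like(x)
grid = lambda meta: (triton.cdiv(x.numel(), meta["BLOCK_SIZE"]),)
add_kernel[grid](x, y, out, x.numel(), BLOCK_SIZE=1024)
assert torch.allclose(out, x + y)
```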
apache/tvm
Open deep learning compiler stack for CPU, GPU, and specialized accelerators
NVIDIA/TensorRT
NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
NVIDIA/TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
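Recent releases expose a high-level Python LLM API; a rough sketch of the define-build-generate flow described above (the model name, prompt, and exact API surface are assumptions that vary across versions):

```python
from tensorrt_llm import LLM, SamplingParams

# Point the LLM API at a Hugging Face checkpoint; engine building happens
# under the hood. The model name is illustrative.
llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")
params = SamplingParams(max_tokens=64, temperature=0.8)

for output in llm.generate(["What is TensorRT?"], params):
    print(output.outputs[0].text)
```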
halide/Halide
A language for fast, portable data-parallel computation
NVIDIA/FasterTransformer
Transformer-related optimizations, including BERT and GPT
NVIDIA/DALI
A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.
NVIDIA/nccl
Optimized primitives for collective multi-GPU communication
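NCCL itself is a C library; from Python it is most often reached through a framework. A sketch of an NCCL-backed all-reduce via PyTorch's distributed package (single-node launch with torchrun is an assumption):

```python
# Launch with: torchrun --nproc_per_node=<num_gpus> this_script.py
import os
import torch
import torch.distributed as dist

dist.init_process_group(backend="nccl")   # NCCL supplies the GPU collectives
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

t = torch.ones(4, device="cuda") * (dist.get_rank() + 1)
dist.all_reduce(t, op=dist.ReduceOp.SUM)  # every rank ends up with the same sum
print(f"rank {dist.get_rank()}: {t}")

dist.destroy_process_group()
```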
onnx/onnx-tensorrt
ONNX-TensorRT: TensorRT backend for ONNX
openxla/xla
A machine learning compiler for GPUs, CPUs, and ML accelerators
pytorch/xla
Enabling PyTorch on XLA Devices (e.g. Google TPU)
intel/intel-extension-for-transformers
⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel platforms ⚡
intel/intel-extension-for-pytorch
A Python package that extends official PyTorch to unlock additional performance on Intel platforms
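The extension's main entry point is ipex.optimize(); a sketch with an illustrative torchvision model and an assumed bfloat16 setting:

```python
import torch
import torchvision.models as models
import intel_extension_for_pytorch as ipex

model = models.resnet50().eval()
# ipex.optimize() rewrites the model for Intel hardware; dtype choice is assumed.
model = ipex.optimize(model, dtype=torch.bfloat16)

x = torch.rand(1, 3, 224, 224)
with torch.no_grad(), torch.autocast("cpu", dtype=torch.bfloat16):
    y = model(x)
print(y.shape)
```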
flashinfer-ai/flashinfer
FlashInfer: Kernel Library for LLM Serving
microcks/microcks
The open source, cloud native tool for API Mocking and Testing. Microcks is a Cloud Native Computing Foundation sandbox project 🚀
NVIDIA/cccl
CUDA Core Compute Libraries
rapidsai/raft
RAFT contains fundamental widely-used algorithms and primitives for machine learning and information retrieval. The algorithms are CUDA-accelerated and form building blocks for more easily writing high performance applications.
RUCAIBox/LLMBox
A comprehensive library for implementing LLMs, including a unified training pipeline and extensive model evaluation.
openxla/stablehlo
Backward compatible ML compute opset inspired by HLO/MHLO
intel/xFasterTransformer
LLMServe/DistServe
Disaggregated serving system for Large Language Models (LLMs).
intel/neural-speed
An innovative library for efficient LLM inference via low-bit quantization
S-Lab-System-Group/Awesome-DL-Scheduling-Papers
microsoft/mscclpp
MSCCL++: A GPU-driven communication stack for scalable AI applications
leimao/CUDA-GEMM-Optimization
CUDA Matrix Multiplication Optimization