smile-luobin's Stars
huggingface/diffusers
🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
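A minimal text-to-image sketch of the pipeline API; the checkpoint name, prompt, and CUDA device are assumptions for illustration:

```python
import torch
from diffusers import DiffusionPipeline

# Load a pretrained pipeline from the Hub (checkpoint name is illustrative).
pipe = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")  # assumes a CUDA-capable GPU

# Run the denoising loop and save the first generated image.
image = pipe("a photo of an astronaut riding a horse").images[0]
image.save("astronaut.png")
```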
microsoft/graphrag
A modular graph-based Retrieval-Augmented Generation (RAG) system
huggingface/datasets
🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
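A quick sketch of the core workflow (the "imdb" dataset is just an example; any Hub dataset loads the same way):

```python
from datasets import load_dataset

# Download (and cache) a dataset from the Hub.
ds = load_dataset("imdb", split="train")

# map() applies a transform over the dataset; batched=True processes
# many rows per call for speed.
ds = ds.map(lambda batch: {"n_chars": [len(t) for t in batch["text"]]}, batched=True)

print(ds[0]["label"], ds[0]["n_chars"])
```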
onnx/onnx
Open standard for machine learning interoperability
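A small sketch of loading and validating a model with the onnx Python package ("model.onnx" is a placeholder path):

```python
import onnx

model = onnx.load("model.onnx")        # placeholder path
onnx.checker.check_model(model)        # validate against the ONNX spec
print(onnx.helper.printable_graph(model.graph))  # human-readable graph dump
```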
microsoft/onnxruntime
ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
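A minimal inference sketch; the model path, input name, and input shape are assumptions:

```python
import numpy as np
import onnxruntime as ort

# Create a session; execution providers select the hardware backend.
sess = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])

input_name = sess.get_inputs()[0].name
x = np.random.rand(1, 3, 224, 224).astype(np.float32)  # shape assumed for illustration

outputs = sess.run(None, {input_name: x})  # None = fetch all outputs
print(outputs[0].shape)
```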
triton-lang/triton
Development repository for the Triton language and compiler
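The canonical first Triton kernel is an element-wise vector add; this sketch follows the official tutorial's pattern and assumes PyTorch plus a CUDA GPU:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)                       # one program per block
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements                       # guard out-of-bounds lanes
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

x = torch.rand(4096, device="cuda")
y = torch.rand(4096, device="cuda")
out = torch.empty_like(x)
grid = lambda meta: (triton.cdiv(x.numel(), meta["BLOCK_SIZE"]),)
add_kernel[grid](x, y, out, x.numel(), BLOCK_SIZE=1024)
assert torch.allclose(out, x + y)
```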
apache/tvm
Open deep learning compiler stack for CPU, GPU, and specialized accelerators
NVIDIA/TensorRT
NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
NVIDIA/TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
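Recent releases expose a high-level Python LLM API; a rough sketch of the define-build-generate flow described above (the model name, prompt, and exact API surface are assumptions that vary across versions):

```python
from tensorrt_llm import LLM, SamplingParams

# Point the LLM API at a Hugging Face checkpoint; engine building happens
# under the hood. The model name is illustrative.
llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")
params = SamplingParams(max_tokens=64, temperature=0.8)

for output in llm.generate(["What is TensorRT?"], params):
    print(output.outputs[0].text)
```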
halide/Halide
A language for fast, portable data-parallel computation
NVIDIA/FasterTransformer
Transformer-related optimizations, including BERT and GPT
NVIDIA/DALI
A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.
NVIDIA/nccl
Optimized primitives for collective multi-GPU communication
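NCCL itself is a C library; from Python it is most often reached through a framework. A sketch of an NCCL-backed all-reduce via PyTorch's distributed package (single-node launch with torchrun is an assumption):

```python
# Launch with: torchrun --nproc_per_node=<num_gpus> this_script.py
import os
import torch
import torch.distributed as dist

dist.init_process_group(backend="nccl")   # NCCL supplies the GPU collectives
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

t = torch.ones(4, device="cuda") * (dist.get_rank() + 1)
dist.all_reduce(t, op=dist.ReduceOp.SUM)  # every rank ends up with the same sum
print(f"rank {dist.get_rank()}: {t}")

dist.destroy_process_group()
```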
onnx/onnx-tensorrt
ONNX-TensorRT: TensorRT backend for ONNX
openxla/xla
A machine learning compiler for GPUs, CPUs, and ML accelerators
pytorch/xla
Enabling PyTorch on XLA Devices (e.g. Google TPU)
intel/intel-extension-for-transformers
⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel platforms ⚡
intel/intel-extension-for-pytorch
A Python package that extends official PyTorch to unlock additional performance on Intel platforms
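The extension's main entry point is ipex.optimize(); a sketch with an illustrative torchvision model and an assumed bfloat16 setting:

```python
import torch
import torchvision.models as models
import intel_extension_for_pytorch as ipex

model = models.resnet50().eval()
# ipex.optimize() rewrites the model for Intel hardware; dtype choice is assumed.
model = ipex.optimize(model, dtype=torch.bfloat16)

x = torch.rand(1, 3, 224, 224)
with torch.no_grad(), torch.autocast("cpu", dtype=torch.bfloat16):
    y = model(x)
print(y.shape)
```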
flashinfer-ai/flashinfer
FlashInfer: Kernel Library for LLM Serving
microcks/microcks
The open source, cloud native tool for API Mocking and Testing. Microcks is a Cloud Native Computing Foundation sandbox project 🚀
NVIDIA/cccl
CUDA Core Compute Libraries
rapidsai/raft
RAFT contains fundamental widely-used algorithms and primitives for machine learning and information retrieval. The algorithms are CUDA-accelerated and form building blocks for more easily writing high performance applications.
RUCAIBox/LLMBox
A comprehensive library for implementing LLMs, including a unified training pipeline and extensive model evaluation.
openxla/stablehlo
Backward compatible ML compute opset inspired by HLO/MHLO
intel/xFasterTransformer
LLMServe/DistServe
Disaggregated serving system for Large Language Models (LLMs).
intel/neural-speed
An innovative library for efficient LLM inference via low-bit quantization
S-Lab-System-Group/Awesome-DL-Scheduling-Papers
microsoft/mscclpp
MSCCL++: A GPU-driven communication stack for scalable AI applications
leimao/CUDA-GEMM-Optimization
CUDA Matrix Multiplication Optimization