A collection of useful code.
OpenResume is a powerful open-source resume builder and resume parser. https://open-resume.com/
Efficient Distributed GPU Programming for Exascale, an SC/ISC Tutorial
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
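To make the entry concrete, here is a minimal sketch of how a model is typically wrapped for ZeRO training with `deepspeed.initialize`; the toy model and the config values are illustrative assumptions, not taken from the DeepSpeed docs, and training is normally launched with the `deepspeed` launcher rather than plain `python`.

```python
# Minimal ZeRO training sketch (assumes torch + deepspeed installed; toy model/config).
import torch
import deepspeed

model = torch.nn.Linear(1024, 1024)  # stand-in for a real network

ds_config = {
    "train_batch_size": 8,
    "fp16": {"enabled": True},
    "zero_optimization": {"stage": 2},  # ZeRO-2: shard optimizer state and gradients
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
}

# deepspeed.initialize returns an engine that manages the distributed optimizer,
# mixed precision, and gradient accumulation.
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config
)

x = torch.randn(8, 1024, device=model_engine.device, dtype=torch.half)
loss = model_engine(x).float().pow(2).mean()
model_engine.backward(loss)  # engine-managed backward (handles loss scaling)
model_engine.step()          # optimizer step + gradient clearing
```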
Causal depthwise conv1d in CUDA, with a PyTorch interface
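As a reference for what the fused CUDA kernel computes, a plain-PyTorch sketch of causal depthwise conv1d: the input is left-padded so position t never sees the future, and `groups=dim` makes the convolution per-channel. The helper name is ours; the repo's actual Python interface may differ.

```python
import torch
import torch.nn.functional as F

def causal_depthwise_conv1d(x, weight, bias=None):
    """x: (batch, dim, seqlen); weight: (dim, width), one filter per channel."""
    dim, width = weight.shape
    # Left-pad by width-1 so output at time t depends only on inputs at times <= t.
    x = F.pad(x, (width - 1, 0))
    # groups=dim -> depthwise: each channel is convolved with its own filter.
    return F.conv1d(x, weight.unsqueeze(1), bias=bias, groups=dim)

x = torch.randn(2, 64, 128)
w = torch.randn(64, 4)
y = causal_depthwise_conv1d(x, w)  # shape (2, 64, 128)
```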
A study of CUTLASS, written in CUDA.
Dynolog is a telemetry daemon for performance monitoring and tracing. It exports metrics from different components in the system, such as the Linux kernel, CPU, disks, Intel PT, and GPUs. Dynolog also integrates with PyTorch and can trigger traces for distributed training applications.
Virtual whiteboard for sketching hand-drawn like diagrams
Performance of the C++ interfaces of FlashAttention, FlashAttention-2, and self-decoding attention in large language model (LLM) inference scenarios.
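For orientation, the operation all of these kernels implement is scaled dot-product attention. A short PyTorch sketch of the equivalent computation follows (the benchmark itself exercises the C++ interfaces, not these calls); shapes and device are illustrative.

```python
import math
import torch
import torch.nn.functional as F

q = torch.randn(1, 16, 1024, 64, device="cuda", dtype=torch.half)  # (batch, heads, seqlen, head_dim)
k, v = torch.randn_like(q), torch.randn_like(q)

# Fused path: PyTorch dispatches to a FlashAttention-style kernel when available.
out_fused = F.scaled_dot_product_attention(q, k, v, is_causal=True)

# Naive reference: softmax(Q K^T / sqrt(d)) V with a causal mask.
scores = (q @ k.transpose(-2, -1)) / math.sqrt(q.shape[-1])
mask = torch.triu(torch.ones(1024, 1024, device="cuda", dtype=torch.bool), diagonal=1)
scores = scores.masked_fill(mask, float("-inf"))
out_naive = torch.softmax(scores.float(), dim=-1).half() @ v
```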
A lightweight LLM inference framework.
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
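A rough sketch of LMDeploy's high-level `pipeline` API as described in its README; the model name is just an example, and the exact call signature and return type should be checked against the project docs.

```python
from lmdeploy import pipeline  # high-level entry point (assumed per the project README)

# Model name is illustrative; any model supported by LMDeploy works here.
pipe = pipeline("internlm/internlm2-chat-7b")
responses = pipe(["Explain what KV-cache quantization buys you, in one sentence."])
print(responses)  # a list of response objects containing the generated text
```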
Best practices for training LLaMA models in Megatron-LM.
MSCCL++: A GPU-driven communication stack for scalable AI applications
nanobind: tiny and efficient C++/Python bindings
PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (the core framework of PaddlePaddle, 『飞桨』: high-performance single-machine and distributed training and cross-platform deployment for deep learning and machine learning).
👑 Easy-to-use and powerful NLP library with 🤗 Awesome model zoo, supporting a wide range of NLP tasks from research to industrial applications, including 🗂 Text Classification, 🔍 Neural Search, ❓ Question Answering, ℹ️ Information Extraction, 📄 Document Intelligence, 💌 Sentiment Analysis, and 🖼 Diffusion AIGC systems.
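A small sketch of the kind of one-line task API PaddleNLP exposes through its `Taskflow` interface; the task name and input are illustrative and worth verifying against the PaddleNLP docs.

```python
from paddlenlp import Taskflow  # assumed high-level task API

# Pre-built sentiment analysis pipeline; downloads a default model on first use.
senta = Taskflow("sentiment_analysis")
print(senta("The documentation is clear and the model zoo is huge."))
```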
S-LoRA: Serving Thousands of Concurrent LoRA Adapters
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
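A hedged sketch of what the high-level Python `LLM` API looks like in recent TensorRT-LLM releases, where the engine build happens under the hood; the import path, class names, and model name are assumptions that depend on the installed version, so verify them against the official docs.

```python
from tensorrt_llm import LLM, SamplingParams  # high-level API; assumed per recent releases

llm = LLM(model="meta-llama/Llama-2-7b-hf")   # model name is illustrative
params = SamplingParams(max_tokens=64)

outputs = llm.generate(["TensorRT engines are built by"], params)
print(outputs)  # generated text lives on the returned output objects
```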
A high-throughput and memory-efficient inference and serving engine for LLMs
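For comparison, vLLM's offline inference API; this mirrors the project's own quickstart, with an illustrative model name.

```python
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # illustrative model choice
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

outputs = llm.generate(["The key idea behind PagedAttention is"], params)
for out in outputs:
    print(out.outputs[0].text)
```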