lixiao2010's Stars
sail-sg/zero-bubble-pipeline-parallelism
Zero Bubble Pipeline Parallelism
volcengine/veScale
A PyTorch Native LLM Training Framework
alibaba/Pai-Megatron-Patch
The official repo of Pai-Megatron-Patch for LLM & VLM large scale training developed by Alibaba Cloud.
lumiere-ml/Awesome-LLM-Application
A curated list on building an LLM application, covering input augmentation, model augmentation, RAG systems, serving, evaluation, and software UI
intelligent-machine-learning/glake
GLake: optimizing GPU memory management and IO transmission.
bigscience-workshop/Megatron-DeepSpeed
Ongoing research training transformer language models at scale, including: BERT & GPT-2
shcho1118/flash-attention
Fast and memory-efficient exact attention
catie-aq/flash-attention
Fast and memory-efficient exact attention
Lightning-AI/forked-pdb
Python pdb for multiple processes
gururise/AlpacaDataCleaned
Alpaca dataset from Stanford, cleaned and curated
InternLM/InternLM
Official release of InternLM2.5 base and chat models. 1M context support
git-cloner/aliendao
Hugging Face mirror download
Dao-AILab/flash-attention
Fast and memory-efficient exact attention
treadon/llama-7b-example
An example of running LLaMA-7B on a Windows CPU or GPU
NVIDIA-AI-IOT/Lidar_AI_Solution
A project demonstrating Lidar-related AI solutions, including three GPU-accelerated Lidar/camera DL networks (PointPillars, CenterPoint, BEVFusion) and the related libs (cuPCL, 3D SparseConvolution, YUV2RGB, cuOSD).
microsoft/DeepSpeedExamples
Example models using DeepSpeed
NVIDIA/FasterTransformer
Transformer related optimization, including BERT, GPT
ROCm/rocRAND
RAND library for the HIP programming language
NVIDIA/cutlass
CUDA Templates for Linear Algebra Subroutines
wangkaisine/onnxruntime-inference-examples-cxx-for-linux
ONNX Runtime C++ sample code that can run in Linux
Hongqing-work/cudnn-learning-framework
A tiny deep learning framework built with cuDNN and cuBLAS.
google/flax
Flax is a neural network library for JAX that is designed for flexibility.
matthias-wright/flaxmodels
Pretrained deep learning models for Jax/Flax: StyleGAN2, GPT2, VGG, ResNet, etc.
google-deepmind/dm-haiku
JAX-based neural network library
google/jax
Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more
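The composable transformations mentioned above (differentiate, vectorize, JIT) can be illustrated with a minimal sketch; the function `f` here is an arbitrary example, not from the repo:

```python
import jax
import jax.numpy as jnp

def f(x):
    # A simple scalar-valued function: f(x) = sum(x^2)
    return jnp.sum(x ** 2)

# grad: automatic differentiation — gradient of sum(x^2) is 2x
grad_f = jax.grad(f)

# vmap: vectorize a per-example function over a batch axis
batched_f = jax.vmap(f)

# jit: compile the gradient function with XLA for GPU/TPU/CPU
fast_grad_f = jax.jit(grad_f)

x = jnp.array([1.0, 2.0, 3.0])
print(grad_f(x))                          # [2. 4. 6.]
print(batched_f(jnp.stack([x, 2 * x])))  # [14. 56.]
print(fast_grad_f(x))                     # [2. 4. 6.]
```

The three transformations compose freely, e.g. `jax.jit(jax.vmap(jax.grad(f)))` produces a compiled, batched gradient function.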
n2cholas/jax-resnet
Implementations and checkpoints for ResNet, Wide ResNet, ResNeXt, ResNet-D, and ResNeSt in JAX (Flax).
leimao/ONNX-Runtime-Inference
ONNX Runtime Inference C++ Example
Oneflow-Inc/DLPerf
Deep Learning Framework Performance Profiling Toolkit
facebookresearch/dlrm
An implementation of a deep learning recommendation model (DLRM)
kakaobrain/torchlars
A LARS implementation in PyTorch