lixiao2010's Stars
sail-sg/zero-bubble-pipeline-parallelism
Zero Bubble Pipeline Parallelism
volcengine/veScale
A PyTorch Native LLM Training Framework
alibaba/Pai-Megatron-Patch
The official repo of Pai-Megatron-Patch for LLM & VLM large scale training developed by Alibaba Cloud.
lumiere-ml/Awesome-LLM-Application
A curated list on building an LLM application, covering input augmentation, model augmentation, RAG systems, serving, evaluation, and software UI
intelligent-machine-learning/glake
GLake: optimizing GPU memory management and IO transmission.
bigscience-workshop/Megatron-DeepSpeed
Ongoing research training transformer language models at scale, including: BERT & GPT-2
shcho1118/flash-attention
Fast and memory-efficient exact attention
catie-aq/flash-attention
Fast and memory-efficient exact attention
Lightning-AI/forked-pdb
Python pdb for multiple processes
gururise/AlpacaDataCleaned
Alpaca dataset from Stanford, cleaned and curated
InternLM/InternLM
Official release of InternLM2.5 base and chat models. 1M context support
git-cloner/aliendao
Hugging Face mirror download
Dao-AILab/flash-attention
Fast and memory-efficient exact attention
treadon/llama-7b-example
An example of running LLaMA-7B on a Windows CPU or GPU
NVIDIA-AI-IOT/Lidar_AI_Solution
A project demonstrating Lidar-related AI solutions, including three GPU-accelerated Lidar/camera DL networks (PointPillars, CenterPoint, BEVFusion) and the related libs (cuPCL, 3D SparseConvolution, YUV2RGB, cuOSD).
microsoft/DeepSpeedExamples
Example models using DeepSpeed
NVIDIA/FasterTransformer
Transformer related optimization, including BERT, GPT
ROCm/rocRAND
RAND library for the HIP programming language
NVIDIA/cutlass
CUDA Templates for Linear Algebra Subroutines
wangkaisine/onnxruntime-inference-examples-cxx-for-linux
ONNX Runtime C++ sample code that can run in Linux
Hongqing-work/cudnn-learning-framework
A tiny deep learning framework built with cuDNN and cuBLAS.
google/flax
Flax is a neural network library for JAX that is designed for flexibility.
matthias-wright/flaxmodels
Pretrained deep learning models for Jax/Flax: StyleGAN2, GPT2, VGG, ResNet, etc.
google-deepmind/dm-haiku
JAX-based neural network library
google/jax
Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more
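The composable transformations mentioned above (differentiate, vectorize, JIT) can be illustrated with a minimal sketch; the function `f` here is an arbitrary example, not from the repo:

```python
import jax
import jax.numpy as jnp

def f(x):
    # A simple scalar-valued function: f(x) = sum(x^2)
    return jnp.sum(x ** 2)

# grad: automatic differentiation — gradient of sum(x^2) is 2x
grad_f = jax.grad(f)

# vmap: vectorize a per-example function over a batch axis
batched_f = jax.vmap(f)

# jit: compile the gradient function with XLA for GPU/TPU/CPU
fast_grad_f = jax.jit(grad_f)

x = jnp.array([1.0, 2.0, 3.0])
print(grad_f(x))                          # [2. 4. 6.]
print(batched_f(jnp.stack([x, 2 * x])))  # [14. 56.]
print(fast_grad_f(x))                     # [2. 4. 6.]
```

The three transformations compose freely, e.g. `jax.jit(jax.vmap(jax.grad(f)))` produces a compiled, batched gradient function.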
n2cholas/jax-resnet
Implementations and checkpoints for ResNet, Wide ResNet, ResNeXt, ResNet-D, and ResNeSt in JAX (Flax).
leimao/ONNX-Runtime-Inference
ONNX Runtime Inference C++ Example
Oneflow-Inc/DLPerf
Deep Learning Framework Performance Profiling Toolkit
facebookresearch/dlrm
An implementation of a deep learning recommendation model (DLRM)
kakaobrain/torchlars
A LARS implementation in PyTorch