bliu3650's Stars
pytorch/pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
microsoft/DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
tatsu-lab/stanford_alpaca
Code and documentation to train Stanford's Alpaca models, and generate the data.
huggingface/peft
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
horovod/horovod
Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.
PKU-YuanGroup/Open-Sora-Plan
This project aims to reproduce Sora (OpenAI's T2V model); we hope the open-source community will contribute to this project.
NVIDIA/Megatron-LM
Ongoing research training transformer models at scale
artidoro/qlora
QLoRA: Efficient Finetuning of Quantized LLMs
microsoft/DeepSpeedExamples
Example models using DeepSpeed
kuangliu/pytorch-cifar
95.47% on CIFAR10 with PyTorch
yangjianxin1/Firefly
Firefly: a training tool for large language models, supporting training of Qwen2.5, Qwen2, Yi1.5, Phi-3, Llama3, Gemma, MiniCPM, Yi, Deepseek, Orion, Xverse, Mixtral-8x7B, Zephyr, Mistral, Baichuan2, Llama2, Llama, Qwen, Baichuan, ChatGLM2, InternLM, Ziya2, Vicuna, Bloom, and other large models
NVIDIA/FasterTransformer
Transformer related optimization, including BERT, GPT
mindspore-ai/mindspore
MindSpore is a new open source deep learning training/inference framework that could be used for mobile, edge and cloud scenarios.
bytedance/byteps
A high performance and generic framework for distributed DNN training
NVIDIA/nccl
Optimized primitives for collective multi-GPU communication
Alpha-VLLM/LLaMA2-Accessory
An Open-source Toolkit for LLM Development
microsoft/Megatron-DeepSpeed
Ongoing research training transformer language models at scale, including: BERT & GPT-2
Xilinx/Vitis-AI
Vitis AI is Xilinx’s development stack for AI inference on Xilinx hardware platforms, including both edge devices and Alveo cards.
NVIDIA/nccl-tests
NCCL Tests
pytorch/benchmark
TorchBench is a collection of open source benchmarks used to evaluate PyTorch performance.
pytorch/kineto
A CPU+GPU Profiling library that provides access to timeline traces and hardware performance counters.
OpenGVLab/OmniQuant
[ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.
microsoft/msccl
Microsoft Collective Communication Library
facebookresearch/HolisticTraceAnalysis
A library to analyze PyTorch traces.
bytedance/flux
A fast communication-overlapping library for tensor parallelism on GPUs.
pytorch/tensorpipe
A tensor-aware point-to-point communication primitive for machine learning
intel/handwritten-chinese-ocr-samples
End-to-end model training and deployment reference for handwritten Chinese text recognition, which can also be extended to other languages.
microsoft/msccl-tools
Synthesizer for optimal collective communication algorithms
PKU-YuanGroup/Open-Sora-Dataset
aws-neuron/aws-neuron-parallelcluster-samples