bliu3650's Stars
pytorch/pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
microsoft/DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
tatsu-lab/stanford_alpaca
Code and documentation to train Stanford's Alpaca models, and generate the data.
huggingface/peft
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
horovod/horovod
Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.
PKU-YuanGroup/Open-Sora-Plan
This project aims to reproduce Sora (OpenAI's T2V model); we hope the open-source community will contribute to this project.
NVIDIA/Megatron-LM
Ongoing research training transformer models at scale
artidoro/qlora
QLoRA: Efficient Finetuning of Quantized LLMs
microsoft/DeepSpeedExamples
Example models using DeepSpeed
kuangliu/pytorch-cifar
95.47% on CIFAR10 with PyTorch
yangjianxin1/Firefly
Firefly: a training tool for large language models, supporting training of Qwen2.5, Qwen2, Yi1.5, Phi-3, Llama3, Gemma, MiniCPM, Yi, Deepseek, Orion, Xverse, Mixtral-8x7B, Zephyr, Mistral, Baichuan2, Llama2, Llama, Qwen, Baichuan, ChatGLM2, InternLM, Ziya2, Vicuna, Bloom, and other large models
NVIDIA/FasterTransformer
Transformer related optimization, including BERT, GPT
mindspore-ai/mindspore
MindSpore is a new open source deep learning training/inference framework that could be used for mobile, edge and cloud scenarios.
bytedance/byteps
A high performance and generic framework for distributed DNN training
NVIDIA/nccl
Optimized primitives for collective multi-GPU communication
Alpha-VLLM/LLaMA2-Accessory
An Open-source Toolkit for LLM Development
microsoft/Megatron-DeepSpeed
Ongoing research training transformer language models at scale, including: BERT & GPT-2
Xilinx/Vitis-AI
Vitis AI is Xilinx’s development stack for AI inference on Xilinx hardware platforms, including both edge devices and Alveo cards.
NVIDIA/nccl-tests
NCCL Tests
pytorch/benchmark
TorchBench is a collection of open source benchmarks used to evaluate PyTorch performance.
pytorch/kineto
A CPU+GPU Profiling library that provides access to timeline traces and hardware performance counters.
OpenGVLab/OmniQuant
[ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.
microsoft/msccl
Microsoft Collective Communication Library
facebookresearch/HolisticTraceAnalysis
A library to analyze PyTorch traces.
bytedance/flux
A fast communication-overlapping library for tensor parallelism on GPUs.
pytorch/tensorpipe
A tensor-aware point-to-point communication primitive for machine learning
intel/handwritten-chinese-ocr-samples
End-to-end model training and deployment reference for handwritten Chinese text recognition, which can also be extended to other languages.
microsoft/msccl-tools
Synthesizer for optimal collective communication algorithms
PKU-YuanGroup/Open-Sora-Dataset
aws-neuron/aws-neuron-parallelcluster-samples