wangbluo
Previously worked at ByteDance and Temu. Master's degree from NUS, bachelor's degree from SYSU. Focused on parallel training for LLMs.
colossalai · Singapore
Pinned Repositories
ColossalAI
Making large AI models cheaper, faster and more accessible
pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
BandWidth_Test
Tests the GPU bandwidth of collective operations such as all-reduce, all-gather, broadcast, and all-to-all on single-node multi-GPU (2, 4, 8 cards) and multi-node multi-GPU (16 cards) setups, using only PyTorch and Python built-in packages.
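A minimal sketch of what such a benchmark can look like, assuming torch.distributed with the NCCL backend and a torchrun launch; the tensor size, iteration count, and ring bus-bandwidth formula are illustrative choices, not necessarily those of the repository.

```python
# Hedged sketch, not the repository's code: time all-reduce and report bus bandwidth.
# Launch (assumed): torchrun --nproc_per_node=8 bench.py
import time
import torch
import torch.distributed as dist

def benchmark_all_reduce(num_elems: int = 256 * 1024 * 1024, iters: int = 20) -> None:
    dist.init_process_group(backend="nccl")
    rank = dist.get_rank()
    world_size = dist.get_world_size()
    torch.cuda.set_device(rank % torch.cuda.device_count())

    x = torch.randn(num_elems, dtype=torch.float32, device="cuda")

    # Warm-up so the first NCCL call does not skew the timing.
    for _ in range(5):
        dist.all_reduce(x)
    torch.cuda.synchronize()

    start = time.perf_counter()
    for _ in range(iters):
        dist.all_reduce(x)
    torch.cuda.synchronize()
    elapsed = (time.perf_counter() - start) / iters

    # Ring all-reduce moves 2 * (n - 1) / n of the buffer per GPU ("bus bandwidth").
    size_bytes = num_elems * 4
    bus_bw = size_bytes * 2 * (world_size - 1) / world_size / elapsed / 1e9
    if rank == 0:
        print(f"all_reduce {size_bytes / 1e6:.0f} MB: {elapsed * 1e3:.2f} ms, {bus_bw:.1f} GB/s")
    dist.destroy_process_group()

if __name__ == "__main__":
    benchmark_all_reduce()
```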
Finetune_llama2
A LLaMA fine-tuning script built from scratch using PyTorch and the Transformers API. It supports four optional features: gradient checkpointing, mixed precision, data parallelism, and tensor parallelism, without using the ColossalAI/Megatron/DeepSpeed frameworks (their code may be consulted as a reference).
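A minimal sketch of two of the listed features, gradient checkpointing and fp16 mixed precision, on a Hugging Face causal LM; the checkpoint name, learning rate, and toy batch are placeholders, and this is not the repository's actual script.

```python
# Hedged sketch: gradient checkpointing + mixed precision for a causal LM fine-tune.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).cuda()
model.gradient_checkpointing_enable()    # trade recomputation for activation memory
model.train()

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
scaler = torch.cuda.amp.GradScaler()     # loss scaling for fp16 mixed precision

batch = tokenizer("Hello, parallel training!", return_tensors="pt").to("cuda")
labels = batch["input_ids"].clone()

with torch.autocast(device_type="cuda", dtype=torch.float16):
    loss = model(**batch, labels=labels).loss

scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()
optimizer.zero_grad()
```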
Finetune_llama2_Megatron
Megatron-style tensor-parallel (TP) training.
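A forward-pass-only sketch of the Megatron-style idea, assuming a column-parallel linear whose output stays sharded feeding a row-parallel linear whose partial sums are all-reduced; a full implementation also needs the matching backward-pass communication (Megatron's f/g operators), which is omitted here.

```python
# Hedged sketch of Megatron-style tensor parallelism (forward pass only).
import torch
import torch.nn as nn
import torch.distributed as dist

class ColumnParallelLinear(nn.Module):
    """Each rank holds a slice of the output dimension; outputs stay sharded.
    Note: a complete version must all-reduce the input gradient in backward."""
    def __init__(self, in_features: int, out_features: int, world_size: int):
        super().__init__()
        assert out_features % world_size == 0
        self.linear = nn.Linear(in_features, out_features // world_size, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.linear(x)  # shape [..., out_features / world_size]

class RowParallelLinear(nn.Module):
    """Each rank holds a slice of the input dimension; partial sums are all-reduced."""
    def __init__(self, in_features: int, out_features: int, world_size: int):
        super().__init__()
        assert in_features % world_size == 0
        self.linear = nn.Linear(in_features // world_size, out_features, bias=False)

    def forward(self, x_shard: torch.Tensor) -> torch.Tensor:
        partial = self.linear(x_shard)
        dist.all_reduce(partial)  # sum the partial results across TP ranks
        return partial
```

Pairing the two this way means an MLP block (column-parallel up-projection, row-parallel down-projection) needs only one all-reduce in the forward pass.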
Pytorch-profile
Uses the PyTorch profiler API to analyze detailed training information, such as memory usage (heaps), call stacks, and time consumption.
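A minimal usage sketch of the torch.profiler API on a toy model, showing memory tracking, Python stack recording, and a step schedule; the model, step count, and output directory are illustrative, and this is not the repository's code.

```python
# Hedged sketch: profile a few training steps with CPU/CUDA timing, memory, and stacks.
import torch
from torch.profiler import profile, schedule, ProfilerActivity, tensorboard_trace_handler

model = torch.nn.Linear(1024, 1024).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

with profile(
    activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
    schedule=schedule(wait=1, warmup=1, active=3),
    on_trace_ready=tensorboard_trace_handler("./profiler_logs"),
    profile_memory=True,   # track allocations ("heaps")
    with_stack=True,       # record Python call stacks
) as prof:
    for _ in range(5):
        x = torch.randn(64, 1024, device="cuda")
        loss = model(x).sum()
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        prof.step()        # advance the profiler schedule each iteration

print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))
```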
wangbluo's Repositories
wangbluo/Finetune_llama2
A LLaMA fine-tuning script built from scratch using PyTorch and the Transformers API. It supports four optional features: gradient checkpointing, mixed precision, data parallelism, and tensor parallelism, without using the ColossalAI/Megatron/DeepSpeed frameworks (their code may be consulted as a reference).
wangbluo/Finetune_llama2_Megatron
Megatron-style tensor-parallel (TP) training.
wangbluo/BandWidth_Test
Tests the GPU bandwidth of collective operations such as all-reduce, all-gather, broadcast, and all-to-all on single-node multi-GPU (2, 4, 8 cards) and multi-node multi-GPU (16 cards) setups, using only PyTorch and Python built-in packages.
wangbluo/ColossalAI
Making large AI models cheaper, faster and more accessible
wangbluo/Pytorch-profile
Uses the PyTorch profiler API to analyze detailed training information, such as memory usage (heaps), call stacks, and time consumption.