Issues
- 0
[ENHANCEMENT] Extending to DeepSpeed
#57 opened by moghadas76 - 5
[QUESTION] How is vescale zero2 implemented?
#54 opened by starstream - 4
- 2
[QUESTION] questions about Collective Communication Group Initialization Optimization in the paper
#40 opened by siddharthaOnRoad - 2
[QUESTION] How to use MQhandler for multiple machines?
#56 opened by zmtttt - 3
> Using ndtimeline-tool to Monitor Megatron-GPT I want to use the ndtimeline-tool to monitor the computation and communication of each rank in Megatron-GPT. I have two concerns:
#53 opened by zmtttt - 1
- 1
[RFC] Single-Device-Abstract DDP
#52 opened by lllukehuang - 10
The times for forward-compute and backward-compute captured by the ndtimeline-tool are inaccurate
#47 opened by zmtttt - 5
[QUESTION] implementation of `get_p2p_cuda_stream_id` and `get_coll_cuda_stream_id`
#46 opened by nooblyh - 3
- 3
[QUESTION] how and where to use multi-node trace profiler in paper of megascale
#37 opened by oliverYoung2001 - 0
- 3
[QUESTION] `vescale.dtensor` vs "PyTorch DTensor"
#28 opened by GHGmc2 - 1
[QUESTION] Save checkpoint
#26 opened by Ryanuppp - 2
- 2
Code Example & Docs
#14 opened by ultranity