Is CUDA graph generation available during PPO training?
Closed this issue · 5 comments
Thanks for the great work! I saw that you support CUDA graph generation for fast inference. Is it also effective during PPO training? PPO generation is really costly. Or is there another fast generation method used in RealHF for training?
Yes. We didn't perform an ablation, but it can lead to about 4x acceleration in our use cases (2048 prompt + 2048 gen).
The results in our paper were obtained without CUDA graphs. We will update the paper soon.
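For readers unfamiliar with the technique, here is a minimal sketch of CUDA graph capture and replay using the public PyTorch API (`torch.cuda.CUDAGraph`, `torch.cuda.graph`). It is not RealHF's implementation: the helper names `capture_decode_step` and `run_demo` and the toy `Linear` model are assumptions made purely for illustration. The idea is that a fixed-shape forward pass is recorded once and then replayed with new data copied into the same static buffers, which removes per-kernel launch overhead in step-by-step generation.

```python
import torch


def capture_decode_step(model, static_input):
    """Capture one forward pass of `model` into a CUDA graph.

    Returns the graph and the static output tensor that each replay overwrites.
    """
    # Warm up on a side stream before capture, as the PyTorch docs recommend.
    side = torch.cuda.Stream()
    side.wait_stream(torch.cuda.current_stream())
    with torch.cuda.stream(side), torch.no_grad():
        for _ in range(3):
            model(static_input)
    torch.cuda.current_stream().wait_stream(side)

    # Record the kernels launched by a single forward pass.
    graph = torch.cuda.CUDAGraph()
    with torch.cuda.graph(graph), torch.no_grad():
        static_output = model(static_input)
    return graph, static_output


def run_demo():
    """Replay the captured graph on new data and compare with eager execution."""
    if not torch.cuda.is_available():
        return None  # CUDA graphs need a GPU; nothing to demonstrate.

    model = torch.nn.Linear(64, 64).cuda().eval()  # hypothetical stand-in model
    static_input = torch.zeros(1, 64, device="cuda")
    graph, static_output = capture_decode_step(model, static_input)

    # New data must be copied into the captured buffer, not passed as a new tensor.
    new_input = torch.randn(1, 64, device="cuda")
    static_input.copy_(new_input)
    graph.replay()

    with torch.no_grad():
        expected = model(new_input)
    return torch.allclose(static_output, expected, atol=1e-4)


if __name__ == "__main__":
    print(run_demo())
```

Because the graph is tied to fixed shapes and memory addresses, a real generation loop would typically capture the single-token decode step with preallocated buffers; how RealHF handles this is not described in this thread.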
That's great! Is CUDA graph applied automatically now, or do I need to enable it with certain args?
And you should use a Docker image version that matches your hardware. We've tested 24.03 on A100 and 24.07 on H100. Other combinations are not guaranteed to work well, thanks to Nvidia.
Thanks for all the helpful information!