NVIDIA/TransformerEngine

Some doubts about the usage of `DelayedScaling.interval`.

wzzju opened this issue · 1 comment

Is the `interval` attribute of `DelayedScaling` unused by the PyTorch backend in the current TransformerEngine? In other words, does the value of `DelayedScaling.interval` affect how often the scaling factor is computed in PyTorch? I have carefully reviewed the TransformerEngine source code and found no place where PyTorch uses `DelayedScaling.interval` to control how frequently the scaling factor is computed.

Hi @wzzju, thanks for reporting the bug and sorry for the delayed response! The `interval` argument in the recipe is indeed unused and has been removed in #892. In the early days of developing an FP8 recipe, `interval` was something we experimented with, but it never ended up being used. It accidentally made it into the TE release and has incorrectly remained there since.
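To illustrate what "no interval" means for delayed scaling, here is a simplified, pure-Python sketch. It is not TransformerEngine's implementation: the class `DelayedScaler` and its `step` method are hypothetical, and it only mimics the spirit of the recipe's `margin`, `amax_history_len`, and `amax_compute_algo="max"` options. The scale is recomputed from a rolling amax history on every call, with nothing that skips updates.

```python
from collections import deque

E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3


class DelayedScaler:
    """Hypothetical, simplified delayed-scaling sketch (not TE code).

    The scale is refreshed on every call to step(); there is no
    interval parameter that would skip updates between calls.
    """

    def __init__(self, amax_history_len=16, margin=0, fp8_max=E4M3_MAX):
        # Rolling window of recent amax values; the oldest is dropped when full
        self.history = deque(maxlen=amax_history_len)
        self.margin = margin
        self.fp8_max = fp8_max
        self.scale = 1.0

    def step(self, tensor_amax):
        # Record this step's amax, then derive the scale for the next step
        self.history.append(tensor_amax)
        amax = max(self.history)  # analogous to amax_compute_algo="max"
        if amax > 0:
            # Map amax onto the FP8 range, backed off by 2**margin
            self.scale = (self.fp8_max / amax) / (2 ** self.margin)
        # If amax is 0, keep the previous scale instead of dividing by zero
        return self.scale


scaler = DelayedScaler(amax_history_len=2)
print(scaler.step(4.0))  # 448 / 4 = 112.0
print(scaler.step(2.0))  # history [4.0, 2.0], max 4.0 -> 112.0
print(scaler.step(1.0))  # history [2.0, 1.0], max 2.0 -> 224.0
```

Because the history window slides every step, the scale reacts to the largest recent amax rather than to the current tensor alone; that lag is what makes the scaling "delayed".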