How to specify the MX format for gradients in the BP pass?
Closed this issue · 2 comments
In your examples, I understand how to quantize weights and activations with the MX format.
However, I wonder how to specify the MX format for gradients in the BP pass.
For example, in the FP pass, A and W are set to the FP8-E4M3 format,
while in the BP pass, I need to quantize the gradient into the MX(FP8-E5M2) format.
I have set "quantize_backprop" to True, but I don't know how to specify the gradient format option.
Should I set "w_elem_format_bp", "a_elem_format_bp_ex", or "a_elem_format_bp_os" to E5M2?
Or should I just leave them as None?
Thanks a lot!
The readme has been updated. Depending on what you want to achieve you can set the flags.
To use the MX(FP8-E5M2) format in the backward pass, set `w_elem_format_bp`, `a_elem_format_bp`, `a_elem_format_bp_ex`, and `a_elem_format_bp_os` to the `fp8_e5m2` format.
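For illustration, the suggestion above could be written as a plain Python dict. This is only a sketch: the four `*_bp` keys, `quantize_backprop`, and the `fp8_e5m2` value are taken from this thread, while the forward-pass key names (`w_elem_format`, `a_elem_format`) and the `fp8_e4m3` string are assumptions made by analogy. The dict is not passed to the library here, so check the readme for the exact names and for how the specs are consumed.

```python
# Sketch only: a plain dict mirroring the flags named in this thread.
# It does not import or call the library.
mx_specs = {
    # Forward pass: A and W in FP8-E4M3.
    # ASSUMPTION: these two key names and the "fp8_e4m3" string are
    # inferred by analogy; they do not appear in this thread.
    "w_elem_format": "fp8_e4m3",
    "a_elem_format": "fp8_e4m3",
    # Enable quantization in the backward pass (flag named in the question).
    "quantize_backprop": True,
    # Backward pass: all four flags set to fp8_e5m2, as suggested above.
    "w_elem_format_bp": "fp8_e5m2",
    "a_elem_format_bp": "fp8_e5m2",
    "a_elem_format_bp_ex": "fp8_e5m2",
    "a_elem_format_bp_os": "fp8_e5m2",
}

# Collect the backward-pass format keys to double-check the configuration.
bp_keys = sorted(k for k in mx_specs if "_elem_format_bp" in k)
for key in bp_keys:
    print(key, "=", mx_specs[key])
```

Note that this sets every backward-pass format flag to E5M2. It does not by itself show which flag governs the gradient tensors specifically.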
Thanks very much.
I still cannot understand which option (`w_elem_format_bp`, `a_elem_format_bp`, `a_elem_format_bp_ex`, `a_elem_format_bp_os`) represents the gradient.
In fact, in the BP process, I wish to set the gradient to E5M2 but keep the activation as E4M3.
According to your suggestion, it seems that you set all of activation/weight/gradient to E5M2, but this is not what I want.
So, how do I set the gradient and the activation to different formats separately?
What is the difference between `a_elem_format_bp`, `a_elem_format_bp_ex`, and `a_elem_format_bp_os`?
What do the suffixes `bp`, `ex`, and `os` mean?
Which represents the gradient and which represents the activation?