microsoft/microxcaling

How to specify the MX format for gradients in the BP pass?

Closed this issue · 2 comments

From your examples, I understand how to quantize weights and activations with the MX format.

However, I wonder how to specify the MX format for gradients in the BP pass.

For example, in the FP pass, A and W are set to the FP8-E4M3 format,

while in the BP pass, I need to quantize the gradient into the MX(FP8-E5M2) format.

I have set "quantize_backprop" to True, but I don't know which option specifies the gradient format.

Should I set "w_elem_format_bp", "a_elem_format_bp_ex" or "a_elem_format_bp_os" to E5M2?
Or should I just leave them as None?

Thanks a lot!

The README has been updated. Set the flags according to what you want to achieve.
To use the MX(FP8-E5M2) format in the backward pass, set w_elem_format_bp, a_elem_format_bp, a_elem_format_bp_ex, and a_elem_format_bp_os to fp8_e5m2.
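
A minimal sketch of the settings described above, written as a plain Python dict so that it runs without the library installed. The four `*_bp*` keys, `quantize_backprop`, and the value `fp8_e5m2` come from this thread. The forward-pass key names `w_elem_format` and `a_elem_format` and the value `fp8_e4m3` are assumptions made by analogy, so check them against the README before use. How the dict is handed to the library is not shown here.

```python
# Hypothetical sketch: a plain dict standing in for the library's spec object.
fwd_format = "fp8_e4m3"  # assumed spelling, by analogy with fp8_e5m2
bwd_format = "fp8_e5m2"  # value named in the reply above

mx_specs = {
    # Forward pass (key names assumed, not taken from this thread)
    "w_elem_format": fwd_format,
    "a_elem_format": fwd_format,
    # Backward pass (flag names taken from this thread)
    "quantize_backprop": True,
    "w_elem_format_bp": bwd_format,
    "a_elem_format_bp": bwd_format,
    "a_elem_format_bp_ex": bwd_format,
    "a_elem_format_bp_os": bwd_format,
}

# Collect the backward-pass format flags to confirm they are all set.
bp_flags = sorted(k for k in mx_specs if "_bp" in k)
for name in bp_flags:
    print(f"{name} = {mx_specs[name]}")
```

This sets every backward-pass format flag to the same value, as the reply suggests. It does not show how to give the gradient and the activation different formats.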

Thanks very much.

I still cannot tell which option (w_elem_format_bp, a_elem_format_bp, a_elem_format_bp_ex, a_elem_format_bp_os) represents the gradient.
In the BP pass, I want to set the gradient to E5M2 but keep the activation as E4M3.
Your suggestion seems to set activation, weight, and gradient all to E5M2, which is not what I want.

So, how can I set the gradient and the activation to different formats?
What is the difference between a_elem_format_bp, a_elem_format_bp_ex, and a_elem_format_bp_os?
What do the suffixes `bp`, `ex`, and `os` mean?
Which option represents the gradient and which represents the activation?