[one-optimize] Fuse bias with fully connected in quantized circle

Question

Closed this issue 2 months ago · 1 comments

What

Let's fuse the bias Add into the FullyConnected Op in quantized circle models.

An example pattern is as follows.

Before

Input [1x64x256, Q16] -> FullyConnected [1x64x1, Q16] -> Add (w/ a Q16 constant of shape [1]) [1x64x1, Q16] -> Output [1x64x1, Q16]

After

Input [1x64x256, Q16] -> FullyConnected (w/ Q64 bias) [1x64x1, Q16] -> Output [1x64x1, Q16]

Note that FuseAddWithFullyConnectedPass currently does not support fusion for quantized circle models. Fusion also changes the constant's dtype from Q16 (as an Add operand) to Q64 (as a FullyConnected bias), so the bias values have to be requantized.
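
The requantization step could look roughly like the sketch below. It is not the actual pass implementation. It assumes that the Q64 bias uses scale `input_scale * weight_scale` with zero point 0 (the usual convention for integer bias, but not stated in this issue), and the function and parameter names are hypothetical.

```python
def requantize_bias(add_const_q, add_scale, add_zero_point,
                    input_scale, weight_scale):
    """Re-express a quantized Add constant as an int64 FullyConnected bias.

    Hypothetical helper for illustration only.
    """
    # Assumption: bias scale is the product of input and weight scales,
    # and the bias zero point is 0.
    bias_scale = input_scale * weight_scale
    int64_min, int64_max = -(2 ** 63), 2 ** 63 - 1

    bias_q = []
    for q in add_const_q:
        # Dequantize the Add constant to a real value.
        real = (q - add_zero_point) * add_scale
        # Quantize again with the bias scale (round to nearest integer).
        v = round(real / bias_scale)
        # Clamp to the int64 range.
        bias_q.append(max(int64_min, min(int64_max, v)))
    return bias_q, bias_scale


# Power-of-two scales keep this example exact in floating point.
bias_q, bias_scale = requantize_bias(
    add_const_q=[1000], add_scale=2 ** -9, add_zero_point=0,
    input_scale=2 ** -7, weight_scale=2 ** -8)
print(bias_q)  # [64000]
```

The real value is unchanged by the conversion (1000 * 2^-9 = 64000 * 2^-15); only the integer encoding differs. A real pass would presumably also need to carry the Add's output quantization parameters over to the fused FullyConnected output, which this sketch does not cover.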

Why

This post-quantization optimization will be beneficial for running models (e.g., LLMs) that were quantized by external tools.

jinevening commented 2 months ago

Done