[one-optimize] Fuse bias with fully connected in quantized circle
Closed this issue · 1 comments
jinevening commented
What
Let's fuse the bias Add into the preceding FullyConnected Op in quantized circle models.
An example pattern is as follows.
Before
Input [1x64x256, Q16] -> FullyConnected [1x64x1, Q16] -> Add (w/ a constant whose shape is [1, Q16]) [1x64x1, Q16] -> Output [1x64x1, Q16]
After
Input [1x64x256, Q16] -> FullyConnected (w/ Q64 bias) [1x64x1, Q16] -> Output [1x64x1, Q16]
Note that FuseAddWithFullyConnectedPass currently does not support fusion for quantized circle models. The Add constant is Q16 while the FullyConnected bias is Q64, so fusing changes the dtype and the bias value has to be requantized.
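As a rough illustration of the requantization step, here is a minimal sketch in Python. It is not the pass's actual implementation. It assumes symmetric quantization (zero point 0), per-tensor scales, and the TFLite-style convention that the bias scale equals input scale times weight scale; none of these are stated in this issue. The function name `requantize_bias` and its parameters are hypothetical.

```python
INT64_MIN = -(2 ** 63)
INT64_MAX = 2 ** 63 - 1


def requantize_bias(add_const_q, add_const_scale, input_scale, weight_scale):
    """Requantize a 16-bit Add constant into a 64-bit FullyConnected bias.

    Assumptions (not taken from the issue): zero point is 0 on both sides,
    scales are per-tensor, and bias_scale = input_scale * weight_scale.
    """
    bias_scale = input_scale * weight_scale
    bias_q = []
    for q in add_const_q:
        real_value = q * add_const_scale        # dequantize the Add constant
        v = round(real_value / bias_scale)      # quantize with the bias scale
        # Note: Python's round() rounds halves to even; a real pass may differ.
        bias_q.append(max(INT64_MIN, min(INT64_MAX, v)))  # clamp to int64
    return bias_q, bias_scale


# Power-of-two scales keep the arithmetic exact in this toy example.
bias_q, bias_scale = requantize_bias([4, -3], 0.5, 0.25, 0.125)
print(bias_q, bias_scale)
```

The fused FullyConnected would then carry `bias_q` as its bias and take over the output quantization parameters of the removed Add. With per-channel weight scales, the same computation would presumably be repeated once per output channel.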
Why
This post-quantization optimization will be beneficial when running models quantized by external tools (e.g., LLMs).
jinevening commented
Done