Should the output of `Softmax` be quantized in the Default 8-bit Registry?
Describe the bug
The Default 8-bit quantization registry uses the structure `_QuantizeInfo` to describe how a given layer should be quantized, where the initialization arguments are as follows:
class _QuantizeInfo(object):
  """QuantizeInfo."""

  def __init__(self,
               layer_type,
               weight_attrs,
               activation_attrs,
               quantize_output=False):
From what I can tell, the registry does not appear to quantize the output of Softmax layers:
# TODO(tfmot): expand layers test in quantize_functional_test.py
# to add more layers to allowlist.
_LAYER_QUANTIZE_INFO = [

    # Activation Layers
    _QuantizeInfo(layers.ReLU, [], [], True),
    _QuantizeInfo(layers.Softmax, [], []),  # Should the fourth arg not be `True`?
Is this a bug, or a deliberate choice not to quantize the output of Softmax layers? Or am I missing something, and they are being subjected to simulated quantization somewhere else?
Cheers,
Liam
This is a deliberate choice: since the softmax min/max is always between 0 and 1, it can be quantized post-training. However, quantizing it during training usually leads to very poor results, since the output tends to be a binary vector. We have found it better to train with float precision and quantize afterwards if required.
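For example, post-training quantization with the TFLite converter picks up the softmax output range during calibration. A minimal sketch, assuming a trained Keras model with a softmax output and some calibration data (the tiny model and random data below are placeholders for your own):

```python
import numpy as np
import tensorflow as tf

# A tiny float model with a softmax output, standing in for a trained model.
float_model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, input_shape=(20,)),
    tf.keras.layers.Softmax(),
])

def representative_data_gen():
  # Calibration samples; replace with real inputs from your training set.
  for _ in range(100):
    yield [np.random.rand(1, 20).astype(np.float32)]

# Convert the float-trained model and quantize it post-training.
converter = tf.lite.TFLiteConverter.from_keras_model(float_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data_gen

tflite_quant_model = converter.convert()
```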
If you require the softmax to be quantized during training, you can use the `quantize_annotate` function with a custom `quantize_config` to ensure it is quantized.
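A minimal sketch of that approach, following the custom `QuantizeConfig` pattern from the quantization comprehensive guide (the `SoftmaxOutputQuantizeConfig` name below is illustrative, not part of the library):

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

quantize_annotate_layer = tfmot.quantization.keras.quantize_annotate_layer
quantize_annotate_model = tfmot.quantization.keras.quantize_annotate_model
quantize_scope = tfmot.quantization.keras.quantize_scope


class SoftmaxOutputQuantizeConfig(tfmot.quantization.keras.QuantizeConfig):
  """Quantizes only the layer output; Softmax has no weights or activations."""

  def get_weights_and_quantizers(self, layer):
    return []

  def get_activations_and_quantizers(self, layer):
    return []

  def set_quantize_weights(self, layer, quantize_weights):
    pass

  def set_quantize_activations(self, layer, quantize_activations):
    pass

  def get_output_quantizers(self, layer):
    # Simulate 8-bit quantization of the softmax output during training.
    return [tfmot.quantization.keras.quantizers.MovingAverageQuantizer(
        num_bits=8, per_axis=False, symmetric=False, narrow_range=False)]

  def get_config(self):
    return {}


# Annotate the Softmax layer with the custom config, then apply quantization.
model = quantize_annotate_model(tf.keras.Sequential([
    tf.keras.layers.Dense(10, input_shape=(20,)),
    quantize_annotate_layer(tf.keras.layers.Softmax(),
                            quantize_config=SoftmaxOutputQuantizeConfig()),
]))

with quantize_scope(
    {'SoftmaxOutputQuantizeConfig': SoftmaxOutputQuantizeConfig}):
  quant_aware_model = tfmot.quantization.keras.quantize_apply(model)
```

The output quantizer returned by `get_output_quantizers` is, as far as I can tell, the same `MovingAverageQuantizer` configuration the default registry attaches to layers whose output is quantized, such as ReLU.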