tensorflow/model-optimization

Should the output of `Softmax` be quantized in the Default 8-bit Registry?


Describe the bug
The Default 8-bit quantization registry uses the `_QuantizeInfo` structure to describe how a given layer should be quantized. Its initialization arguments are as follows:

class _QuantizeInfo(object):
  """QuantizeInfo."""
  def __init__(self,
               layer_type,
               weight_attrs,
               activation_attrs,
               quantize_output=False):

From what I can tell, the registry does not quantize the output of Softmax layers:

  # TODO(tfmot): expand layers test in quantize_functional_test.py
  # to add more layers to allowlist.
  _LAYER_QUANTIZE_INFO = [
      # Activation Layers
      _QuantizeInfo(layers.ReLU, [], [], True),
      _QuantizeInfo(layers.Softmax, [], []),  # Should the fourth arg not be `True`?

Is this a bug, or a deliberate choice not to quantize the output of Softmax layers? Or am I missing something, and they are subject to simulated quantization somewhere else?

Cheers,
Liam

This is a deliberate choice: since the softmax min/max is always between 0 and 1, it can be quantized post-training. However, quantizing it during training usually leads to very poor results, since the softmax output tends to be close to a binary (one-hot) vector. We have found it better to train with float precision and quantize afterwards if required.
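For reference, "quantize afterwards" here typically means standard post-training quantization, e.g. via the TFLite converter, which calibrates ranges from a representative dataset; because softmax outputs always lie in [0, 1], this works well after training. A minimal sketch (the model and the representative dataset below are illustrative placeholders, not part of the registry):

import numpy as np
import tensorflow as tf

# Illustrative float model that ends in a Softmax.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, input_shape=(20,)),
    tf.keras.layers.Softmax(),
])
# (train the model in float precision here)

def representative_dataset():
  # Yields calibration samples so the converter can pick quantization ranges.
  for _ in range(100):
    yield [np.random.rand(1, 20).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
tflite_model = converter.convert()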

If you require the softmax output to be quantized during training, you can use `quantize_annotate_layer` with a custom `QuantizeConfig` to ensure it is quantized.
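As a rough sketch of that suggestion (the class name `SoftmaxOutputQuantizeConfig` and the toy model are illustrative, not part of the library; see the tfmot quantization-aware-training comprehensive guide for the full `QuantizeConfig` contract), an output-only config could look like:

import tensorflow as tf
import tensorflow_model_optimization as tfmot

quantize_annotate_layer = tfmot.quantization.keras.quantize_annotate_layer
quantize_annotate_model = tfmot.quantization.keras.quantize_annotate_model
quantize_apply = tfmot.quantization.keras.quantize_apply
MovingAverageQuantizer = tfmot.quantization.keras.quantizers.MovingAverageQuantizer

class SoftmaxOutputQuantizeConfig(tfmot.quantization.keras.QuantizeConfig):
  """Quantizes only the layer output; no weights or inner activations."""

  def get_weights_and_quantizers(self, layer):
    return []

  def get_activations_and_quantizers(self, layer):
    return []

  def set_quantize_weights(self, layer, quantize_weights):
    pass

  def set_quantize_activations(self, layer, quantize_activations):
    pass

  def get_output_quantizers(self, layer):
    # 8-bit, per-tensor output quantizer, mirroring the registry's usual
    # output quantization scheme.
    return [MovingAverageQuantizer(
        num_bits=8, per_axis=False, symmetric=False, narrow_range=False)]

  def get_config(self):
    return {}

# Illustrative model: the Softmax is annotated with the custom config,
# while the remaining layers fall back to the default 8-bit registry.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, input_shape=(20,)),
    quantize_annotate_layer(
        tf.keras.layers.Softmax(),
        quantize_config=SoftmaxOutputQuantizeConfig()),
])

with tfmot.quantization.keras.quantize_scope(
    {'SoftmaxOutputQuantizeConfig': SoftmaxOutputQuantizeConfig}):
  qat_model = quantize_apply(quantize_annotate_model(model))

During `quantize_apply`, the default registry still handles the Dense layer, and the custom config overrides only how the Softmax output is treated.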