Clipping thresholds to input range + 1?
iksnagreb opened this issue · 4 comments
I am facing an `AssertionError: Thresholds can't be expressed with type INT8` (the assertion is here, in the weight file generation of `Thresholding_Batch`), which I cannot really explain; at least I cannot see how I would end up with a threshold of 128 at the upper bound, except for these lines in the `RoundAndClipThresholds` transformation:
`finn/src/finn/transformation/streamline/round_thresholds.py`, lines 59 to 66 in `f2424e7`
I wonder whether clipping thresholds to the range `(idtype.min() - 1, idtype.max() + 1)`, under the condition that at least one threshold lies outside of this range, still guarantees that these thresholds end up outside of the range representable with `idtype`? Why is the clip set to one below/above the valid range? Is there some reasoning I fail to see? Maybe clipping to `(idtype.min(), idtype.max())` would be correct?
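To make the concern concrete, here is a minimal numpy sketch (not the actual transformation code), assuming an INT8 input, i.e., a valid range of [-128, 127]:

```python
import numpy as np

# thresholds with outliers on both sides of the valid INT8 range
thresholds = np.asarray([-300.0, 42.0, 300.0], dtype=np.float32)

# clipping one below/above the valid range, as the transformation does
clipped = np.clip(thresholds, -128 - 1, 127 + 1)
print(clipped)  # [-129.   42.  128.] -- both boundary values are outside INT8
```

So a threshold of 128, exactly the value observed at the assertion above, survives the clipping although INT8 cannot represent it.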
See this related issue: #875
Doesn't seem like the clipping issue itself was ever resolved.
Hm, the proposed `step_minimize_bit_width` is already part of the flow, but it only comes after the step that is failing: the build fails right before it, at `step_apply_folding_config`. Of course I will try to flip the order of the build steps, but this is the order of the standard dataflow build. And would it not simply allocate an extra bit for the thresholds tensor just to accommodate this clipping (so it is not really minimizing but increasing the bit-width)? Without a convincing argument, this does not sound right to me...
I actually faced the same issue today when trying to build the ResNet50 from the examples repository, so the clipping issue is still there and has to be worked around in the `RoundAndClipThresholds` transformation.
Some new insights on this one:

1. I still do not think the extra +1 range makes sense, but it might have concealed some actual bugs.
2. The simple fix of removing the +1 from the range solves the problem for our use cases, but it breaks the MobileNet v1 test case, so this behavior somehow seems to be relevant (but see 1., I think there is something else going on).
3. The MobileNet v1 test breaks at `test_end2end_mobilenet_minimize_bit_width`, which, incidentally, is what has been proposed as the solution in #875.
4. The new error looks similar to the one I initially reported, i.e., it fires at some `DataType.allowed` query.
5. This hints at some rounding and/or representability issue. The +1 range, by effectively introducing an extra bit of precision (after "minimizing" the bit-width), probably concealed this issue in most cases (see the sketch after this list).
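To make points 4 and 5 concrete, here is a small sketch assuming qonnx's `DataType` API (the `INT9` lookup simply denotes the next-wider signed type):

```python
from qonnx.core.datatype import DataType

# 128 is the value produced by clipping to idtype.max() + 1 for an INT8 input
print(DataType["INT8"].allowed(128))  # False -> the reported AssertionError
print(DataType["INT9"].allowed(128))  # True  -> "minimizing" settles on an extra bit
```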
So I started digging a bit, and at first nothing seemed to make any sense. See here, for example, close to where the MobileNet test fails after removing the +1 from the range:
`finn/src/finn/custom_op/fpgadataflow/thresholding.py`, lines 136 to 137 in `766e9fb`
`thresholds` and `threshold_tensor` should contain exactly the same values and differ only in shape or by repeating these values, i.e., `get_hw_compatible_threshold_tensor` can be seen as a fancier reshape operation. But they actually differ in the values they contain, in particular in the maximum and minimum, i.e., in the range of values they contain. Next, the minimized DataType is derived from `thresholds`, but it is then tested against `threshold_tensor`, causing the assertion to fire because the ranges do not align. However, I do not see anything inherently wrong with the `minimize_accumulator_width` method (though it seems slightly odd to use `get_hw_compatible_threshold_tensor` here at all, but apparently this helps in spotting these weird types of errors).
The next step was to figure out why these two tensors end up with different values/value ranges. The first hint was the "container" data type of these tensors: normally, FINN stores most parameter tensors (even the low bit-width, quantized integer ones) in float32 containers. However, somehow the thresholds initializer ended up as float64, and the cast at https://github.com/fastmachinelearning/qonnx/blob/c5bd87f05e0b6ac6bc578d38161e7dceec1077d9/src/qonnx/util/basic.py#L135, called by `get_hw_compatible_threshold_tensor`, converts it back to float32.
Why should we care about this for supposedly integer thresholds? Well, large integers cannot always be represented exactly in floating-point, and the largest integer (of a given bit-width) that can be represented exactly depends on the floating-point precision. Take the largest signed 32-bit integer as an example: `np.float64(2**31 - 1) == 2**31 - 1` yields `True`, i.e., this value can be represented exactly in 64-bit floating-point, while `np.float32(2**31 - 1) == 2**31 - 1` yields `False`, i.e., it cannot be represented in 32-bit floating-point. This is why, after `get_hw_compatible_threshold_tensor` type-casts back to float32, the values and the range sometimes differ, depending on the particular values in those tensors.
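The cut-off can be checked directly with a few lines of numpy (float32 has a 24-bit significand, so exact integer representation ends at 2**24):

```python
import numpy as np

# the largest signed 32-bit integer survives a float64 round-trip ...
print(np.float64(2**31 - 1) == 2**31 - 1)  # True
# ... but not a float32 round-trip (the nearest float32 is 2**31)
print(np.float32(2**31 - 1) == 2**31 - 1)  # False

# float32 represents integers exactly only up to 2**24
print(np.float32(2**24) == 2**24)          # True
print(np.float32(2**24 + 1) == 2**24 + 1)  # False -- rounds to 16777216.0
```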
And this led me back to the `RoundAndClipThresholds` transformation (see the sketch after this list):

1. For large integer thresholds coded as floating-point, `np.ceil`, which is used for rounding the thresholds, can show some unintuitive behavior: `np.ceil(np.float32(16777217)) == 16777216`. There is nothing we can do about this; it is just how floating-point behaves. We only need to be aware of this fact and the potential loss of precision.
2. Depending on the input combination, `np.clip` might sometimes promote the output to float64, indeed clipping the values to the specified range, but increasing the precision and thus effectively increasing the range of integers that can be represented exactly. This is where the discrepancy described above comes from, and this is something we can do something about: clearly define the type-casting/type-promotion behavior of the `RoundAndClipThresholds` transformation.
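Both effects can be reproduced in isolation (a hedged sketch; the exact promotion behavior of `np.clip` depends on the numpy version and the operand dtypes):

```python
import numpy as np

# 1) np.ceil cannot restore integer precision that float32 has already lost:
#    16777217 becomes 16777216.0 on conversion, so ceil just returns that
print(np.ceil(np.float32(16777217)) == 16777216)  # True -- the "+1" is gone

# 2) np.clip follows numpy's type-promotion rules, so the output dtype
#    depends on the operands: with float64 bounds, a float32 input may
#    come back as float64 (and thus with more exactly representable integers)
t = np.asarray([300.0], dtype=np.float32)
out = np.clip(t, np.float64(-129.0), np.float64(128.0))
print(out.dtype)  # float32 or float64, depending on the numpy version
```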
For now, to not interfere with the rest of FINN and qonnx, I propose to explicitly cast the thresholds back to float32 after rounding and clipping. It could also be an option to consider float64 or int64 (avoiding the representability issue) as the "container" data type for (large) integer thresholds, but this would require carefully checking all parts of FINN and qonnx that currently assume float32 here.
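A minimal sketch of that proposal (an illustrative function, not the actual transformation code):

```python
import numpy as np

def round_and_clip_to_float32(thresholds, tmin, tmax):
    # round towards +inf (as the transformation does via np.ceil), clip to
    # the target range, and explicitly cast back to float32 so numpy's
    # promotion rules cannot silently widen the container to float64
    return np.clip(np.ceil(thresholds), tmin, tmax).astype(np.float32)
```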
I will soon open a PR with a reworked `RoundAndClipThresholds` transformation, as well as test cases that will hopefully validate the rounding, clipping and type-casting behavior of this transformation.