larq/compute-engine

Warn users when converting models which can't be optimally accelerated

AdamHillier opened this issue · 1 comments

We should work out a good way to raise warnings for models which won't convert in an 'optimal' way -- essentially any egregious violation of our model optimisation guide. It's slightly complicated for a few reasons:

  • We need to decide where to raise warnings (in Larq, or in the LCE converter)?
  • Many of the warnings we could raise are target-dependent, but the flatbuffer files produced by the LCE converter are target-independent.
  • There are different degrees of 'sub-optimal'. In the particular example you raise, when using the LCE optimized kernels on Aarch64, not being able to write bitpacked activations doesn't have a huge measurable latency impact (< 10%).
  • We don't want to spam users with lots of warnings about sub-optimal code, especially as there are legitimate use-cases where a user doesn't care if something will run slowly (for example, the user might be doing research into new BNN architecture designs and using patterns which aren't currently optimised in LCE, such as grouped convolutions, but could be implemented in the future).

Originally posted by @AdamHillier in #541 (comment)

* We need to decide where to raise warnings (in Larq, or in the LCE converter)?

My two cents: I would try to decouple Larq from any conversion-related logic, including when to warn. (TensorFlow doesn't give you a warning either when you use an operator that cannot be efficiently converted by TFLite.)

* Many of the warnings we could raise are target-dependent, but the flatbuffer files produced by the LCE converter are target-independent.

How about a target argument to the converter, where one could specify "reference" or "Aarch64" (or others later), with "Aarch64" being the default (if desired) for compatibility? To make the flatbuffer target-dependent, a metadata field named target could be used. The runtime could then check the intended target at initialization and abort/warn on a mismatch.
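A minimal sketch of how that handshake could work. Note this is hypothetical: the target argument, the "target" metadata key, and the function names here are assumptions, not existing LCE API; the real converter would stamp the value into the flatbuffer's metadata table rather than a dict.

```python
# Hypothetical sketch of target-dependent conversion metadata.
# The `target` argument and the "target" metadata key are assumptions,
# not part of the current LCE converter API.

SUPPORTED_TARGETS = ("reference", "Aarch64")


def convert(model_bytes, target="Aarch64"):
    """Convert a model, stamping the intended target into its metadata."""
    if target not in SUPPORTED_TARGETS:
        raise ValueError(f"Unknown target: {target!r}")
    # In the real converter this would be a flatbuffer metadata entry.
    return {"model": model_bytes, "metadata": {"target": target}}


def initialize_runtime(flatbuffer, runtime_target):
    """Check at interpreter initialization that the model matches the runtime."""
    intended = flatbuffer["metadata"].get("target")
    if intended != runtime_target:
        raise RuntimeError(
            f"Model was converted for target {intended!r}, "
            f"but this runtime targets {runtime_target!r}"
        )
```

With a default of "Aarch64", existing conversion pipelines keep working unchanged, while users who explicitly pass target="reference" opt out of target-specific warnings.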

* We don't want to spam users with lots of warnings about sub-optimal code, especially as there are legitimate use-cases where a user doesn't care if something will run slowly (for example, the user might be doing research into new BNN architecture designs and using patterns which aren't currently optimised in LCE, such as grouped convolutions, but could be implemented in the future).

I think being able to specify a target would already help with this, but depending on how the warnings are logged, you could push the responsibility onto the user to control what's logged:

  • explicit verbosity settings passed to convert (I generally don't like this approach)
  • if you can use logging (and even in some other cases), the user has the ability to fully control levels and handlers, and you could provide helpers like this:
with lce.converter_log_level('ERROR'):
    lce.convert_keras_model(
        model_with_unsupported_op,
        target='reference',
    )