OperandType of gemm / matmul return
Closed this issue · 4 comments
The spec says `gemm` returns "an Operand" (and the same for `matmul`).
If both arguments are `tensor-quant8-asymm`, what is the `OperandType` of the return? I can see use cases for `tensor-int32`, which is how it will actually be generated by existing hardware; `tensor-quant8-asymm`, for a fully quantized model; or even `tensor-float32`, for people who have only partly quantized their model.
This matters because the spec doesn't appear to have, e.g., a requantization operator to convert int32 to int8. In any case, one would need the ability to set the requantization scaling factor in advance, since an appropriate value is measured by running the model on representative data.
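To make the concern concrete, here is a minimal Python sketch (not WebNN API; all names are hypothetical) of why an int8-by-int8 matmul naturally produces int32 accumulators, and what a requantization step with a precomputed scale would look like:

```python
# Hypothetical sketch, not part of the WebNN spec: int8 x int8 matmul
# with an int32 accumulator, followed by requantization back to int8.

def matmul_int8(a, b, a_scale, b_scale, out_scale):
    """Multiply two quantized int8 matrices (lists of lists).

    A single int8 product can reach 127 * 127 = 16129, and a dot
    product sums many of them, so the accumulator must be int32.
    The output scale must be chosen in advance, e.g. measured by
    running the model on calibration data.
    """
    rows, inner, cols = len(a), len(b), len(b[0])
    acc = [[0] * cols for _ in range(rows)]  # int32 accumulator
    for i in range(rows):
        for j in range(cols):
            for k in range(inner):
                acc[i][j] += a[i][k] * b[k][j]

    # Requantize: rescale int32 values into int8 range and clamp.
    scale = a_scale * b_scale / out_scale
    return [[max(-128, min(127, round(v * scale))) for v in row]
            for row in acc]

a = [[100, 100]]
b = [[100], [100]]
# The accumulator holds 20000 here, far outside int8 range, which is
# why the return OperandType (and its scale) must be specified.
out = matmul_int8(a, b, a_scale=0.1, b_scale=0.1, out_scale=2.0)
# out == [[100]]
```

Without a spec-level requantization operator (or a defined return `OperandType` with its own scale), there is no way to express the `scale` step above in a fully quantized graph.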
Thanks for your comment. To ensure this detailed spec feedback is addressed appropriately, I've transferred the issue to the WebNN API specification repo where the API design work happens:
webmachinelearning/webnn#84
@kpu this issue has previously been discussed in webmachinelearning/webnn#44. I will be refactoring quantization-related procedural data from the `OperandDescriptor` type as we incorporate aspects of quantization work into the operator API.
@wchao1115 The issue you referenced, webmachinelearning/webnn#44, is about how the quantization scaling factor and zero point should be included in `OperandDescriptor`.
As the title of this issue says, this is about the `OperandType` of the return value from `matmul`. Should multiplying int8 by int8 return float32, int32, or include a scaling factor to go to int8?
This has nothing to do with how the scaling factor is encoded in `OperandDescriptor` (and your suggestion that it not be).