Floating point exception during conversion to C

Question

kgeeting opened this issue 7 months ago · 4 comments

Cool library. I tried converting one of my onnx models using the tool and was thrown a floating point exception error (zsh: floating point exception) during conversion. Apple clang version 15.0.0 (clang-1500.1.0.2.5). Maybe some unhandled div by 0 somewhere?

fullModel.onnx.zip

Answer 1 · 2024-02-26T22:42:05.000Z

Thanks :)

I had a quick look at your model, and indeed, the exception is a division by zero. This stems from the first convolution layer /isi_encoder/conv1/Conv, which is a 2D convolution, but the stride is given as [2]. onnx2c then interprets this as a stride of [2,0], which of course is rather silly.
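
To illustrate why a stride of 0 ends in a division by zero, here is a minimal Python sketch of the usual convolution output-size formula. It is not onnx2c's code; the function name and the 64x64 input with a 3x3 kernel are made up for the example. In C or C++ an integer division by zero typically raises SIGFPE, which the shell reports as a "floating point exception"; Python raises ZeroDivisionError instead.

```python
def conv_output_size(in_size, kernel, stride, pad_begin=0, pad_end=0, dilation=1):
    """Output length along one spatial axis of a convolution."""
    effective_kernel = dilation * (kernel - 1) + 1
    # The stride is the divisor, so a stride of 0 cannot be evaluated.
    return (in_size + pad_begin + pad_end - effective_kernel) // stride + 1


# Hypothetical 64x64 input, 3x3 kernel, strides read as [2, 0].
print(conv_output_size(64, 3, stride=2))  # 31

try:
    conv_output_size(64, 3, stride=0)  # second axis, stride misread as 0
except ZeroDivisionError as err:
    print("division by zero:", err)
```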

Now the onnx documentation states:

strides: int64[]
Stride along each spatial axis. If not present, the stride defaults is 1 along each spatial axis.

This reads to me like either "pad missing dimensions with ones" or "strides must be of correct dimensions or not given at all".
Whereas somehow a "stride of 2", like in this model, does sound more like 2 in each dimension. Was this your intention?

The onnx docs might mention some rule about splatting attributes, but I can't find it right now.

I would provisionally say this is:

a malformed input onnx model
either bad documentation of onnx or a missing, clearly nice to have feature
bug in onnx2c (rather report error than crash)
missing feature in onnx2c (splat or pad the strides)
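
The two readings mentioned above, "pad" and "splat", can be sketched as follows. This is only an illustration of the two candidate behaviours, assuming a hypothetical helper named normalize_strides; neither policy is confirmed by the onnx documentation, and this is not what onnx2c does.

```python
def normalize_strides(strides, n_spatial, policy="pad"):
    """Expand a strides attribute to one value per spatial axis.

    policy="pad":   fill missing trailing axes with 1, so [2] becomes [2, 1]
    policy="splat": repeat a single value on every axis, so [2] becomes [2, 2]
    """
    strides = list(strides)
    if not strides:
        # Attribute not present: the documented default is 1 on each axis.
        return [1] * n_spatial
    if len(strides) == n_spatial:
        return strides
    if policy == "pad" and len(strides) < n_spatial:
        return strides + [1] * (n_spatial - len(strides))
    if policy == "splat" and len(strides) == 1:
        return strides * n_spatial
    raise ValueError(f"cannot expand strides {strides} to {n_spatial} axes")


print(normalize_strides([2], 2, policy="pad"))    # [2, 1]
print(normalize_strides([2], 2, policy="splat"))  # [2, 2]
```

The two policies give different output shapes for the same model, which is one argument for rejecting such input instead of guessing.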

A quick fix would be to modify the model so that the attributes are encoded more explicitly :)

Btw, just for posterity - which tool (and version) generated this .onnx file?

Answer 2 · 2024-03-08T21:04:47.000Z

Sorry for the late reply. I reviewed the onnx model being fed in and your provisional guess was correct: the model was malformed, with stride discrepancies (as noted above) and different dimension sizes during a few tensor concatenations. Oddly, neither of these issues was flagged when first converting the Python model from PyTorch (v2.2) to onnx, but the subsequent attempt to convert to C with your tool flagged them. :)

I've seen promising results when subsequently profiling the C models on an STM Nucleo board, and again just want to say, well done on the tool! I may look at the quantization alpha features you've got next. I do agree that reporting the stride error (in spatialfilter.h) might be helpful in case people run into similar problems in the future. Cheers

Answer 3 · 2024-03-09T18:03:31.000Z

Thanks for the followup.

There are a lot of rules in the onnx documentation, most of which onnx2c does not check, simply because that would take a lot of lines of code... But it definitely should add a few checks for this kind of thing, where something as popular as PyTorch creates bad input.

Re the quantization - it is really alpha level. It has only ever been used to quantize that one example with the AVR (https://github.com/kraiskil/onnx2c/tree/master/examples/atmega_mnist), and probably still has some parts of that project hard coded in the sources. I was thinking actually of removing that quantization feature completely...

I would strongly recommend trying out other quantizers out there first. When I wrote that quantization thing I found nothing that worked or had a reasonable learning curve. But the field moves fast, and nowadays there seem to be options.

Answer 4 · 2024-03-09T19:47:08.000Z

Added the strides check in the above commit.
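
For readers who cannot see the commit: a check of this kind might look roughly like the Python sketch below. This is an assumption about the general idea (report an error instead of crashing), not the actual onnx2c change, and check_strides is a hypothetical name.

```python
def check_strides(node_name, strides, n_spatial):
    """Reject a strides attribute that does not match the spatial rank."""
    if len(strides) != n_spatial:
        raise ValueError(
            f"{node_name}: expected {n_spatial} stride values, "
            f"got {len(strides)}: {list(strides)}"
        )
    if any(s < 1 for s in strides):
        raise ValueError(f"{node_name}: strides must be >= 1, got {list(strides)}")


try:
    # The situation from this issue: a 2D convolution with strides given as [2].
    check_strides("/isi_encoder/conv1/Conv", [2], n_spatial=2)
except ValueError as err:
    print("error:", err)
```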