4-bit quantization on PYNQ: ONNX only supports >= 8-bit integers?
YZW-explorer opened this issue · 2 comments
I want to deploy a model with 4-bit quantization on a PYNQ board. I saw that the MobileNet-v1 model in the examples is quantized to 4 bits, but the FINN documentation (https://finn.readthedocs.io/en/latest/source_code/finn.core.html) says:
"Enum class that contains FINN data types to set the quantization annotation. ONNX does not support data types smaller than 8-bit integers, whereas in FINN we are interested in smaller integers down to ternary and bipolar."
Why is this, and how can I get around it?
Hi @YZW-explorer, good question!
This is a limitation of ONNX itself, and exactly one of the reasons why we are using QONNX. By means of FINN's datatype annotations, we are able to go below 8-bit quantization, so you should be fine to experiment with 4-bit quantized models.
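To make the annotation mechanism concrete, here is a minimal sketch (not from this thread) of tagging a tensor with a sub-8-bit FINN datatype. The file names are placeholders, and on older FINN versions `ModelWrapper` and `DataType` live under `finn.core` rather than `qonnx.core`:

```python
# Minimal sketch: FINN/QONNX records sub-8-bit quantization as an annotation
# on top of a standard ONNX container dtype, sidestepping ONNX's 8-bit floor.
from qonnx.core.modelwrapper import ModelWrapper
from qonnx.core.datatype import DataType

# "model.onnx" is a placeholder path to an exported QONNX model.
model = ModelWrapper("model.onnx")

# Annotate the global input as 4-bit signed integer; the underlying ONNX
# tensor type is unchanged, only the FINN-level annotation is set.
input_name = model.graph.input[0].name
model.set_tensor_datatype(input_name, DataType["INT4"])
print(model.get_tensor_datatype(input_name))  # DataType.INT4

model.save("model_annotated.onnx")
```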
For a simple example of how to export an MLP with 2-bit weights and activations, you could have a look here.
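As a rough sketch of what such an export looks like with current Brevitas (the layer sizes and output path below are illustrative, not taken from the linked example; older Brevitas releases used a FINN-specific export manager instead of `export_qonnx`):

```python
# Hedged sketch: export a tiny MLP with 2-bit weights and activations to
# QONNX, which carries the quantization via Quant nodes rather than ONNX's
# native (>= 8-bit) integer dtypes.
import torch
from brevitas.nn import QuantLinear, QuantReLU
from brevitas.export import export_qonnx

model = torch.nn.Sequential(
    QuantLinear(64, 32, bias=True, weight_bit_width=2),
    QuantReLU(bit_width=2),
    QuantLinear(32, 10, bias=True, weight_bit_width=2),
)
model.eval()

# The second argument is a dummy input used to trace the model for export.
export_qonnx(model, torch.randn(1, 64), export_path="mlp_w2a2.onnx")
```

The resulting `mlp_w2a2.onnx` can then be loaded with `ModelWrapper` and taken through the FINN compiler flow.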