google/ruy

Broken computation when running under nodejs on armv7

lissyx opened this issue · 5 comments

Fuller investigation has been documented on tensorflow/tensorflow#39509.

It looks like q7 was removed from the clobber list in tensorflow/tensorflow@2359c4e#diff-ca44636122d5fd4fe9600903ebf461b9L665.

I honestly don't know why this would show up only under NodeJS and only on ARMv7, but reinstating q7, as in tensorflow/tensorflow#39951, fixes the issue.

Since q7 is clobbered in the other places where q6-q15 are clobbered, and since there is no comment explaining why q7 was removed here, is it possible that it's just a small typo and I got lucky in finding it?
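For readers less familiar with inline-assembly register lists, here is a minimal sketch of why a missing q7 entry can matter. It is not ruy's kernel: `SumViaQ7` is a hypothetical helper invented for illustration. Under the 32-bit ARM procedure call standard, q4-q7 (d8-d15) are callee-saved, so if an asm block overwrites q7 without declaring it, the compiler does not know it has to save and restore q7, and a caller that keeps a live value there may see it corrupted. Whether that is exactly what happened under NodeJS is an assumption, not something established in this thread.

```cpp
#include <cstdint>

// Hypothetical helper for illustration only; it is not taken from ruy.
// Copies four int32 values through NEON register q7 and returns their sum.
std::int32_t SumViaQ7(const std::int32_t* in) {
  std::int32_t out[4] = {0, 0, 0, 0};
#if defined(__arm__) && defined(__ARM_NEON)
  asm volatile(
      // d14 and d15 are the two 64-bit halves of q7.
      "vld1.32 {d14, d15}, [%[in]]\n"
      "vst1.32 {d14, d15}, [%[out]]\n"
      :
      : [in] "r"(in), [out] "r"(out)
      // The asm overwrites q7, so q7 has to be declared here. Dropping it
      // would still compile, but the compiler would no longer preserve q7
      // around this block.
      : "memory", "q7");
#else
  // Portable fallback so the sketch also builds on non-arm32 targets.
  for (int i = 0; i < 4; ++i) out[i] = in[i];
#endif
  return out[0] + out[1] + out[2] + out[3];
}
```

Calling `SumViaQ7` on the four values 1, 2, 3, 4 returns 10 on either path; the point of the sketch is only the last line of the asm statement.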

Thanks for the fix. I'm curious: are you running nodejs on arm32? Does your hardware support only arm32 and not arm64 code? You will get higher performance with arm64 code, especially for the code path that you're fixing here, which is the quantized 8-bit path. When running arm64 code, ruy is able to take advantage of the new ARM dot-product instructions, a 4x speedup. ruy automatically detects whether these instructions are available and uses them when appropriate.
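As a rough illustration of what such runtime detection can look like, here is a sketch that queries the Linux hardware-capability bits on arm64. It is not ruy's actual detection code: `HasArm64DotProduct` is a hypothetical name, and the approach (reading `AT_HWCAP` through `getauxval`) is just one common way to do it on Linux. On any other platform the sketch simply reports that the instructions are unavailable.

```cpp
#if defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h>
// Bit 20 of AT_HWCAP is the dot-product capability on arm64 Linux;
// defined here only in case the system headers are too old to provide it.
#ifndef HWCAP_ASIMDDP
#define HWCAP_ASIMDDP (1UL << 20)
#endif
#endif

// Hypothetical helper for illustration only; it is not taken from ruy.
// Returns true when the kernel reports the arm64 dot-product instructions.
bool HasArm64DotProduct() {
#if defined(__linux__) && defined(__aarch64__)
  return (getauxval(AT_HWCAP) & HWCAP_ASIMDDP) != 0;
#else
  // arm32 builds and non-Linux targets take this path in the sketch.
  return false;
#endif
}
```

A library would typically call something like this once at startup and pick the kernel variant accordingly. An arm32 build never reaches the arm64 branch, which is why the advice above only applies when the code is compiled for arm64.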

I'm not targeting any specific hardware. I was looking into leveraging threads for running on Cortex-A53 SoCs like the RPi3, as well as the QM215. NodeJS is just one of our bindings for deepspeech, and I ran into the issue when pushing to our CI 😊

OK. Just a generic note that ruy will benefit from running arm64 code. In particular, Cortex-A53 is arm64-capable. QM215 is A53-based too.

Indeed, but in those two cases the OS is 32-bit: Raspbian runs in 32 bits, and so does Android 10 Go. Thanks for your advice, though.

I see, thanks for the explanation. Knowing this helps prioritize arm32 in future ruy work.