NEON architecture for AArch64 uses 32 × 128-bit registers, twice as many as for ARMv7
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.den0024a/CEGDJGGC.html
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.den0024a/CJHECGIH.html
The NEON architecture for AArch64 uses 32 × 128-bit registers, twice as many as for ARMv7.
If I understand correctly, this could improve performance on AArch64?
If so, I would be grateful for an implementation; I think it would take me a long time to do it myself.
I am ready to test.
Hi,
I had a look into this. Our current NEON code is not written at the machine level (assembly) but in C. This means that we don't specify which registers to use; register allocation is done by the compiler. If the compiler is aware of the extra registers, it can use them, with no code changes necessary. Indeed, this might help to boost performance, as fewer intermediate values need to be spilled from registers to memory (cache).
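To illustrate the point, here is a minimal sketch of what C-level NEON code looks like. It is not taken from our code base: `sum_f32` is a hypothetical function written only for this example. The vector variables are plain C locals, so nothing in the source names a register, and the same file builds for ARMv7 and AArch64. A scalar fallback is included so it also compiles where NEON is unavailable.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical example (not from the project): sum n floats.
 * With NEON, four independent vector accumulators are used. They are
 * ordinary C variables; the compiler decides which registers hold them. */
float sum_f32(const float *a, size_t n)
{
    assert(a != NULL || n == 0);

    float total = 0.0f;
    size_t i = 0;

#if defined(__ARM_NEON)
    float32x4_t acc0 = vdupq_n_f32(0.0f);
    float32x4_t acc1 = vdupq_n_f32(0.0f);
    float32x4_t acc2 = vdupq_n_f32(0.0f);
    float32x4_t acc3 = vdupq_n_f32(0.0f);

    /* Process 16 floats (4 vectors of 4 lanes) per iteration. */
    for (; i + 16 <= n; i += 16) {
        acc0 = vaddq_f32(acc0, vld1q_f32(a + i));
        acc1 = vaddq_f32(acc1, vld1q_f32(a + i + 4));
        acc2 = vaddq_f32(acc2, vld1q_f32(a + i + 8));
        acc3 = vaddq_f32(acc3, vld1q_f32(a + i + 12));
    }

    /* Combine the accumulators, then reduce the 4 lanes to one value
     * using intrinsics that exist on both ARMv7 and AArch64. */
    float32x4_t acc = vaddq_f32(vaddq_f32(acc0, acc1),
                                vaddq_f32(acc2, acc3));
    float32x2_t half = vadd_f32(vget_low_f32(acc), vget_high_f32(acc));
    half = vpadd_f32(half, half);
    total = vget_lane_f32(half, 0);
#endif

    /* Scalar tail, and the whole loop when NEON is not available. */
    for (; i < n; i++) {
        total += a[i];
    }
    return total;
}
```

Whether the extra registers are actually used depends on the compiler, the optimization level, and how many values are live at once in a given loop; a small kernel like this one fits comfortably in the ARMv7 register file anyway. One way to check real code is to build it with an AArch64 toolchain and inspect the generated assembly (for example with `gcc -O2 -S`).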
Best,
Johnny