Maratyszcza/NNPACK

Support processors without L3 cache

thematrixincendies opened this issue · 6 comments

Hi,
I am trying to use nnpack on an Intel Atom Z530 and I get the hardware-not-supported status when initializing. After taking a look at init.c, I figured out that there is a requirement for an L3 cache. Is there any way to work around this requirement, or is it deeply necessary for nnpack?

You can mock L3 size (as is currently done on ARM), but performance wouldn't be as great.

Thanks for the answer, could you elaborate a bit more on that? I am currently cross-compiling for the Intel Atom on a Linux host, targeting 32-bit Linux. For that I added my lib paths to the cflags in configure.py and also added target-specific flags. I then configure with --enable-psimd and build the static lib via ninja, which is then added to my project. All of this works so far, except for the above.
I think by mock you mean filling the hw_info struct myself with fixed values, as is done in the function static void init_hwinfo(void)? Is that all?

Okay, I figured it out somehow. There seem to be some constraints on the cache sizes though, because I was getting Floating Point Exceptions when entering wrong cache values. I am now taking the L1 and L2 values from the CPU info and mocking L3. Any experience with what might be good values to keep the performance as high as possible?

Pretending L3_size = L2_size should work well
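
For anyone landing here later, a minimal sketch of that fallback in C. The struct and function names below are hypothetical stand-ins, not NNPACK's actual definitions; the real fields filled in by init_hwinfo in init.c may be laid out differently. It assumes L1 and L2 were already read from the CPU and only the missing L3 is mocked:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the cache fields of the hardware info struct.
 * The real layout in NNPACK's init.c may differ. All sizes are in bytes. */
struct mock_cache_sizes {
    size_t l1;
    size_t l2;
    size_t l3; /* 0 means "no L3 cache detected" */
};

/* Mock a missing L3 by pretending L3_size = L2_size.
 * Returns false when L1 or L2 is unknown: those should come from the CPU,
 * and this sketch does not try to guess them. */
static bool mock_missing_l3(struct mock_cache_sizes *c) {
    if (c->l1 == 0 || c->l2 == 0) {
        return false;
    }
    if (c->l3 == 0) {
        c->l3 = c->l2;
    }
    return true;
}
```

Rejecting zero L1/L2 values up front is a design choice of this sketch: it keeps zero sizes from reaching whatever blocking arithmetic is derived from them later.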

bhack commented

There are interesting performance results on a different Atom with https://github.com/IntelLabs/SkimCaffe

psiha commented

Shouldn't this be high priority given the prevalence of devices w/o an L3 cache in the mobile and embedded world? ;)
Likewise - would a better 'fix' (than pretending l3 = l2) be to change the outer loops to walk in increments of 1 (i.e. set output_*_block_max to 1 in case there is no L3 cache)?
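
To make the idea concrete, here is a rough sketch in C of what "walk in increments of 1" could look like. The helper names and the tile_bytes parameter are hypothetical and chosen for illustration only; this is not how NNPACK actually derives its output_*_block_max values:

```c
#include <stddef.h>

/* Hypothetical helper: number of tiles processed per outer-loop block.
 * Falls back to 1 when there is no L3 cache (l3_bytes == 0) or when a
 * single tile does not fit, so the result is never zero. */
static size_t outer_block_max(size_t l3_bytes, size_t tile_bytes) {
    if (l3_bytes == 0 || tile_bytes == 0) {
        return 1;
    }
    size_t tiles = l3_bytes / tile_bytes;
    return tiles == 0 ? 1 : tiles;
}

/* Counts the outer-loop iterations needed to cover `total` items when
 * walking in steps of `block_max`. A zero step is treated as 1 so the
 * loop always makes progress. */
static size_t count_outer_blocks(size_t total, size_t block_max) {
    if (block_max == 0) {
        block_max = 1;
    }
    size_t blocks = 0;
    for (size_t start = 0; start < total; start += block_max) {
        blocks++;
    }
    return blocks;
}
```

With no L3, outer_block_max returns 1 and the outer loop visits every item individually. Clamping the block size to at least 1 also keeps a zero from reaching any later division or loop step. Whether that relates to the Floating Point Exceptions mentioned earlier in the thread is only a guess.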