Support processors without L3 cache

Question

thematrixincendies opened this issue 8 years ago · 6 comments

thematrixincendies commented 8 years ago

Hi,
I am trying to use NNPACK on an Intel Atom Z530, and I get the hardware-not-supported status when initializing. After taking a look at init.c, I figured out that there is a requirement for an L3 cache. Is there any way to work around this requirement, or is it strictly necessary for NNPACK?

Answer 1 · 2016-11-09T20:40:54.000Z

You can mock the L3 size (as is currently done on ARM), but performance won't be as good.

Answer 2 · 2016-11-10T18:06:45.000Z

Thanks for the answer. Could you elaborate a bit more on that? I am currently cross-compiling on a Linux host for the Intel Atom, which is also a Linux target (but 32-bit). To do so, I added my lib paths to the cflags in configure.py, along with target-specific flags. I then configure with --enable-psimd and build the static lib via ninja, which I add to my project. All of this works so far, except for the issue above.
By "mock", I think you mean filling the hw_info struct myself with fixed values, as is done in the function static void init_hwinfo(void)? Is that all?

Answer 3 · 2016-11-11T22:12:01.000Z

Okay, I figured it out. There seem to be some constraints on the cache sizes, though, because I was getting floating point exceptions when entering incorrect cache values. I am now taking the L1 and L2 values from the CPU info and mocking L3. Do you have any experience with which values would keep performance as high as possible?

Answer 4 · 2016-11-13T19:25:00.000Z

Pretending L3_size = L2_size should work well
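
For readers who want to see the idea spelled out, here is a minimal sketch of that fallback. It is not NNPACK's code: `struct cache_sizes` and `mock_missing_l3` are hypothetical names invented for illustration, and the sizes in the usage note are example values, not measurements. The sketch also rejects a zero L1 or L2 size, on the assumption (not confirmed in this thread) that zero sizes are one way to trigger the floating point exceptions mentioned above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the cache fields of a hardware-info struct.
   This is NOT NNPACK's actual definition. Sizes are in bytes. */
struct cache_sizes {
    size_t l1;
    size_t l2;
    size_t l3; /* 0 means "no L3 detected" */
};

/* If no L3 was detected, pretend the L3 is as large as the L2.
   Returns false (and changes nothing) when L1 or L2 is zero, since code
   that derives block sizes from these values may divide by them. */
static bool mock_missing_l3(struct cache_sizes *c) {
    if (c == NULL || c->l1 == 0 || c->l2 == 0) {
        return false;
    }
    if (c->l3 == 0) {
        c->l3 = c->l2;
    }
    return true;
}
```

For example, calling `mock_missing_l3` on `{ 32 * 1024, 512 * 1024, 0 }` leaves L1 and L2 untouched and sets L3 to 512 KiB. A detected, non-zero L3 is never overwritten.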

Answer 5 · 2016-11-20T21:58:35.000Z

There are interesting performance results on a different Atom with https://github.com/IntelLabs/SkimCaffe

Answer 6 · 2018-10-08T17:02:00.000Z

Shouldn't this be high priority, given the prevalence of devices without an L3 cache in the mobile and embedded world? ;)
Likewise, would a better 'fix' (than pretending L3 = L2) be to change the outer loops to walk in increments of 1 (i.e. set output_*_block_max to 1 when there is no L3 cache)?
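
To illustrate what a block maximum of 1 would mean, here is a generic sketch of a blocked outer loop. It does not reproduce NNPACK's loops: `walk_blocks` and its parameters are hypothetical, and only the general blocking pattern is shown.

```c
#include <stddef.h>

/* Hypothetical helper: splits the range [0, total) into blocks of at most
   block_max elements and stores each block's size in sizes[] (capacity cap).
   Returns the number of blocks, or 0 if block_max is 0 or cap is too small. */
static size_t walk_blocks(size_t total, size_t block_max,
                          size_t *sizes, size_t cap) {
    if (block_max == 0) {
        return 0; /* a zero block size would never advance the loop */
    }
    size_t count = 0;
    for (size_t start = 0; start < total; start += block_max) {
        size_t remaining = total - start;
        size_t size = remaining < block_max ? remaining : block_max;
        if (count == cap) {
            return 0; /* not enough room to record every block */
        }
        sizes[count++] = size;
    }
    return count;
}
```

With `total = 5`, a `block_max` of 2 yields three blocks of sizes 2, 2 and 1, while a `block_max` of 1 yields five single-element blocks, so the outer loop advances one element at a time. Whether that beats pretending L3 = L2 would have to be measured on the target.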