nvdla/doc

Length of a Stripe Operation


The NVDLA unit description (http://nvdla.org/hw/v1/ias/unit_description.html) states an upper limit of 32 for the length of a Stripe Operation:
"The upper limit is 32 due to buffer size in the accumulator"
However, this seems to contradict the buffer size given in the "Convolution Accumulator" chapter. Let me explain why:

Every Atomic Operation produces 16 partial sums (see chapter "Atomic Operation"). So a maximum-length Stripe Operation yields 32 × 16 = 512 elements in total.
Each of these elements is stored as an INT48 (when the preceding stages use INT16) in the assembly SRAM group (see Table 49).
This amounts to 512 × 6 bytes = 3072 bytes = 3 KiB.

According to the chapter "Convolution Accumulator", the buffer size is 96 B × 128 = 12 KiB.
So, in theory, the length of a Stripe Operation could be 128 instead of 32.
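
To make the arithmetic easy to check, here is a small Python sketch of the calculation. The constants are the figures quoted in this issue; the names are hypothetical, and the sketch assumes the whole 96 B × 128 buffer is available to a single stripe, which may not match the actual hardware.

```python
# Figures quoted in this issue; all names are hypothetical.
PARTIAL_SUMS_PER_ATOMIC_OP = 16  # partial sums produced by one Atomic Operation
BYTES_PER_PARTIAL_SUM = 6        # INT48 = 48 bits = 6 bytes
BUFFER_BYTES = 96 * 128          # buffer size as quoted: 96 B x 128 entries

BYTES_PER_ATOMIC_OP = PARTIAL_SUMS_PER_ATOMIC_OP * BYTES_PER_PARTIAL_SUM


def stripe_bytes(stripe_length: int) -> int:
    """Bytes of partial sums produced by a stripe of the given length."""
    return stripe_length * BYTES_PER_ATOMIC_OP


def max_stripe_length(buffer_bytes: int) -> int:
    """Longest stripe whose partial sums fit in the buffer.

    Assumption: the entire buffer is usable by one stripe.
    """
    return buffer_bytes // BYTES_PER_ATOMIC_OP


print(stripe_bytes(32))                 # 3072 bytes = 3 KiB
print(max_stripe_length(BUFFER_BYTES))  # 128
```

Under these assumptions, a stripe of length 32 uses only a quarter of the quoted buffer.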

Is there a reason why the limit is 32 rather than 128, or are my calculations wrong?