msr-fiddle/dnn-partitioning

latency vs maxload

Closed this issue · 15 comments

Hello, may I ask what the difference between latency and maxload is?

In my understanding from the paper's notes, maxload is the time of the most heavily loaded device in the inference/training process.
From your paper, latency is the time to produce the final output.
But from these definitions, isn't maxload = latency?

thanks

Let's say you have a chain of 100 devices, each takes 10 ms. Then maxload = 10 ms, latency = 1000 ms. The latency objective in the paper is for a single-sample scenario, whereas the throughput objective is for pipelined inference or training.
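
In case it helps, here is a toy sketch of that arithmetic (just the example in this comment, not the repo's code or the paper's model):

```python
# Toy illustration of maxload vs. latency for a linear chain of devices.
per_device_ms = [10.0] * 100   # each of the 100 devices takes 10 ms per sample

# maxload: time of the most heavily loaded device, i.e. the pipeline's
# steady-state time per sample (the inverse of throughput)
maxload_ms = max(per_device_ms)   # 10.0 ms

# latency: time for a single sample to travel the whole chain
latency_ms = sum(per_device_ms)   # 1000.0 ms

print(maxload_ms, latency_ms)     # 10.0 1000.0
```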

For example, in Table 1 with operator-granularity graphs, pipelined inference, take row 1 with bert-l3 and IP (contiguous).
The maxload is 27.92, i.e. the time of the most heavily loaded device; is this 27.92 in seconds or in ms?
And the runtime of 1 s is the inference time of a single sample (for example, an image) across all devices, right?

27.92 is in ms. This is the max-load (time taken per sample, or the inverse throughput).
The 1 second is the runtime of the optimization algorithm, not of inference/training.
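(So, for instance, a max-load of 27.92 ms corresponds to a pipelined steady-state throughput of roughly 1000 / 27.92 ≈ 35.8 samples per second.)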

Is latency in ms as well?
And can we see the inference runtime anywhere in this paper?

Yes.
You mean latency? See Table 4.

The latency in Table 4 is the same as inference time in the classic sense, right?

I don't know what the "classic understanding" is. It is single-sample latency.
I need to go now; if you ask more questions, I will try to answer them in a few days.

Just to be clear:
you gave the example of a chain of 100 devices, each taking 10 ms, so that maxload = 10 ms and latency = 1000 ms.
But that is an unparallelized chain, right?
Because if we run 2 of those 100 devices (each 10 ms) in parallel, then maxload = 10 ms and latency = 990 ms, right?
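
Here is a toy sketch of what I mean (a hypothetical device graph, not the paper's model): maxload stays the busiest device, while latency becomes the longest path.

```python
# Hypothetical example: 98 stages in series, then two 10 ms stages in
# parallel, merging at a zero-cost sink (100 devices in total).
times = {f"s{i}": 10.0 for i in range(98)}
times.update({"p1": 10.0, "p2": 10.0, "sink": 0.0})

edges = [(f"s{i}", f"s{i+1}") for i in range(97)]
edges += [("s97", "p1"), ("s97", "p2"), ("p1", "sink"), ("p2", "sink")]

# Longest-path (critical-path) computation; nodes listed in topological order.
order = [f"s{i}" for i in range(98)] + ["p1", "p2", "sink"]
finish = {}
for v in order:
    preds = [finish[u] for (u, w) in edges if w == v]
    finish[v] = (max(preds) if preds else 0.0) + times[v]

print(max(times.values()))   # maxload = 10.0 ms (busiest device)
print(finish["sink"])        # latency = 990.0 ms (longest path)
```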

What batch size did you use for ResNet-50 inference (for operator graphs)? And can you say what type of FPGA you used to profile the per-node FPGA latency (it's important for comparing results)?
I get a latency of 5452 ms on 1 CPU and 1 FPGA, which is a lot,
because when I tested a batch of 16 images of size 3x256x256 on CPU only (1 core) in my environment, it gave me 5 seconds of inference time.

And how do you define the color classes of the nodes?

Update: I changed maxSizePerFPGA to 7549747200.0 and now I get 325 ms latency for ResNet-50. But the question remains: what is the batch size? I can't find this value in the paper or the code.
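
For reference, this is roughly how I patch that value before re-running the solver (the file name is just from my local setup; maxSizePerFPGA is the only key I touch):

```python
import json

# Hypothetical input file name from my setup; maxSizePerFPGA is the key I change.
path = "resnet50_operator_graph.json"

with open(path) as f:
    cfg = json.load(f)

cfg["maxSizePerFPGA"] = 7549747200.0   # raise the per-FPGA memory budget

with open(path, "w") as f:
    json.dump(cfg, f, indent=2)
```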

I have a problem when testing ResNet-50 with operator graphs.
I set maxSizePerFPGA to 16 GB,
and my latency is always 325 ms (with 1, 2, 3, 4, ... max FPGAs). I don't know why it is always the same value; by that logic it should go down.

The latency when using only CPUs is 6666 for ResNet-50, i.e. 6.6 s.
I guess you ran the inference experiments with batch size 16? Because if the batch size is 1, these results are a bit strange.

I'm sorry, we cannot publish details on operator graphs or FPGAs. Of course, all comparisons in the paper are fair (apples-to-apples).
(For layer graphs, the input files are converted from profiles from the PipeDream paper, so you could look there if you care about layer graphs.)
Overall, don't read too much into what is in the input files. They are there so that there is something to test the algorithm on. The algorithm is the true contribution.

Okay, no problem. I wanted this information to compare some results, because the inference times on, say, a Xeon and a regular i7 are really different. The same issue applies to batch size.