msr-fiddle/dnn-partitioning

latency vs maxload

Closed this issue · 15 comments

Hello, may I ask what the difference between latency and maxload is?

In my understanding from the paper's notes, maxload is the time of the most heavily loaded device in the inference/training process.
From your paper, latency is the time to produce the final output.
But from these definitions, isn't maxload = latency?

thanks

Let's say you have a chain of 100 devices, each takes 10 ms. Then maxload = 10 ms, latency = 1000 ms. The latency objective in the paper is for a single-sample scenario, whereas the throughput objective is for pipelined inference or training.
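
In case it helps, here is a toy sketch of that arithmetic (just the example in this comment, not the repo's code or the paper's model):

```python
# Toy illustration of maxload vs. latency for a linear chain of devices.
per_device_ms = [10.0] * 100   # each of the 100 devices takes 10 ms per sample

# maxload: time of the most heavily loaded device, i.e. the pipeline's
# steady-state time per sample (the inverse of throughput)
maxload_ms = max(per_device_ms)   # 10.0 ms

# latency: time for a single sample to travel the whole chain
latency_ms = sum(per_device_ms)   # 1000.0 ms

print(maxload_ms, latency_ms)     # 10.0 1000.0
```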

For example, in Table 1 with operator-granularity graphs, pipelined inference, take row 1 with bert-l3 and IP (contiguous).
The maxload is 27.92, i.e. the time of the most heavily loaded device; is this 27.92 in seconds or in ms?
And the runtime of 1 s is the inference time of a single sample (for example, an image) across all devices, right?

27.92 is in ms. This is the max-load (time taken per sample, or the inverse throughput).
The 1 second is the runtime of the optimization algorithm, not of inference/training.
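(So, for instance, a max-load of 27.92 ms corresponds to a pipelined steady-state throughput of roughly 1000 / 27.92 ≈ 35.8 samples per second.)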

Is latency in ms as well?
And can we see the inference runtime anywhere in this paper?

Yes.
You mean latency? See Table 4.

The latency in Table 4 is the same as inference time in the classic sense, right?

I don't know what the "classic understanding" is. It is single-sample latency.
I need to go now; if you ask more questions, I will try to answer them in a few days.

Just to be clear:
you gave the example of a chain of 100 devices, each taking 10 ms, so that maxload = 10 ms and latency = 1000 ms.
But that is an unparallelized chain, right?
Because if we run 2 of those 100 devices (each 10 ms) in parallel, then maxload = 10 ms and latency = 990 ms, right?
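
Here is a toy sketch of what I mean (a hypothetical device graph, not the paper's model): maxload stays the busiest device, while latency becomes the longest path.

```python
# Hypothetical example: 98 stages in series, then two 10 ms stages in
# parallel, merging at a zero-cost sink (100 devices in total).
times = {f"s{i}": 10.0 for i in range(98)}
times.update({"p1": 10.0, "p2": 10.0, "sink": 0.0})

edges = [(f"s{i}", f"s{i+1}") for i in range(97)]
edges += [("s97", "p1"), ("s97", "p2"), ("p1", "sink"), ("p2", "sink")]

# Longest-path (critical-path) computation; nodes listed in topological order.
order = [f"s{i}" for i in range(98)] + ["p1", "p2", "sink"]
finish = {}
for v in order:
    preds = [finish[u] for (u, w) in edges if w == v]
    finish[v] = (max(preds) if preds else 0.0) + times[v]

print(max(times.values()))   # maxload = 10.0 ms (busiest device)
print(finish["sink"])        # latency = 990.0 ms (longest path)
```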

What batch size did you use for ResNet-50 inference (for operator graphs)? And can you say what type of FPGA you used to profile the per-node FPGA latency (it's important for comparing results)?
I get a latency of 5452 ms on 1 CPU and 1 FPGA, which is a lot,
because when I tested a batch of 16 images of size 3x256x256 on CPU only (1 core) in my environment, it gave me 5 seconds of inference time.

And how do you define the color classes of the nodes?

Update: I changed maxSizePerFPGA to 7549747200.0 and now I get 325 ms latency for ResNet-50. But the question remains: what is the batch size? I can't find this value in the paper or the code.
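
For reference, this is roughly how I patch that value before re-running the solver (the file name is just from my local setup; maxSizePerFPGA is the only key I touch):

```python
import json

# Hypothetical input file name from my setup; maxSizePerFPGA is the key I change.
path = "resnet50_operator_graph.json"

with open(path) as f:
    cfg = json.load(f)

cfg["maxSizePerFPGA"] = 7549747200.0   # raise the per-FPGA memory budget

with open(path, "w") as f:
    json.dump(cfg, f, indent=2)
```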

I have a problem when testing ResNet-50 with operator graphs.
I set maxSizePerFPGA to 16 GB,
and my latency is always 325 ms (with 1, 2, 3, 4, ... max FPGAs). I don't know why it is always the same value; by that logic it should go down.

The latency when using only CPUs is 6666 for ResNet-50, i.e. 6.6 s.
I guess you ran the inference experiments with batch size 16? Because if the batch size is 1, these results are a bit strange.

I'm sorry, we cannot publish details on operator graphs or FPGAs. Of course, all comparisons in the paper are fair (apples-to-apples).
(For layer graphs, the input files are converted from profiles from the PipeDream paper, so you could look there if you care about layer graphs.)
Overall, don't read too much into what is in the input files. They are there so that there is something to test the algorithm on. The algorithm is the true contribution.

Okay, no problem. I wanted this information to compare some results, because the inference times on, say, a Xeon and a regular i7 are really different. The same issue applies to batch size.