pytorch/multipy

deploy/runtime: use a background thread to run GC when interpreters aren't executing the forward pass

d4l3k opened this issue · 1 comment

d4l3k commented

To optimize forward-pass latency, it would be good to time GC so that it runs in between model executions. This won't improve QPS, since the amortized GC cost stays the same, but it would lower the latency of each batch.

```python
import gc

gc.collect()
```

We should spin up a background thread that periodically iterates over all of the interpreter threads, locks each one in between executions, and runs the GC while it is idle. It may also be worth explicitly disabling automatic GC on the individual interpreter threads so that collections never fire during the forward pass. A sketch of the idea is below.
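A minimal single-process sketch of that scheme. Note this is an illustration, not the multipy runtime API: in real multipy each interpreter is a separate subinterpreter, so `gc.disable()`/`gc.collect()` would have to execute inside that interpreter; `InterpreterSlot`, `gc_worker`, and the sweep interval are all hypothetical names.

```python
import gc
import threading
import time


class InterpreterSlot:
    """Hypothetical stand-in for one interpreter thread.

    `lock` is held for the duration of each forward pass, so the GC
    thread can only acquire it while the interpreter is idle.
    """

    def __init__(self):
        self.lock = threading.Lock()
        # Assumed: automatic GC is disabled so collections never fire
        # mid-forward-pass. (In real multipy this would run inside the
        # subinterpreter; here all slots share one interpreter.)
        gc.disable()

    def forward(self, run_model):
        # The forward pass runs with the lock held.
        with self.lock:
            return run_model()


def gc_worker(slots, interval_s=1.0):
    """Background thread: periodically visit every interpreter slot,
    lock it between executions, and collect while it is idle."""
    while True:
        time.sleep(interval_s)
        for slot in slots:
            with slot.lock:
                gc.collect()


slots = [InterpreterSlot() for _ in range(4)]
threading.Thread(target=gc_worker, args=(slots,), daemon=True).start()
```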

Context:

https://fb.workplace.com/notes/538119557964077/

FYI, we actually call gc.freeze() after loading the inference model in our online system to reduce GC latency.
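For reference, a minimal sketch of that pattern (`load_model` is a hypothetical placeholder for the real model loader):

```python
import gc


def load_model():
    # Stand-in for loading the real inference model.
    return object()


model = load_model()

# Collect once to clear startup garbage, then freeze: gc.freeze()
# moves every currently tracked object into a permanent generation
# that later collections skip, shrinking each subsequent GC pass.
gc.collect()
gc.freeze()
```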