NVIDIA/tensorrt-laboratory

03-batching

mrmeswani opened this issue · 1 comment

I am trying to learn how to use batching with a TensorRT ResNet-50 (RN50) engine. I was looking at example 03-batching and it appears a file is missing: https://github.com/NVIDIA/yais/blob/master/examples/03_Batching/streaming-service.cc, which I believe implements the inference on the batches?

I already have an RN50 engine built following example 01, and would like to see how I can use your batching example to run inference in batches.

@mrmeswani - thanks for your interest.

The batching example is a proof of concept built on the 01_GRPC example; it implements a batched Echo service.

I've just pushed a new commit to master (dfd5545) that begins to address this; it fixes the broken link above.

I think there will be some larger changes needed for batched inference. In particular, I think we need to expose a ContextReset virtual function that can be used to reset the member variables of a derived Context.
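
For example, a minimal sketch of what that hook might look like (BaseContext, BatchedContext, and the member names below are illustrative placeholders, not the current tensorrt-laboratory API):

#include <future>
#include <vector>

class BaseContext
{
  protected:
    // Hook called by the lifecycle machinery before a context is reused,
    // giving derived contexts a chance to clear per-request/per-batch state.
    virtual void ContextReset() {}
};

class BatchedContext : public BaseContext
{
  protected:
    void ContextReset() override
    {
        // Drop any futures left over from the previous batch.
        m_PreProcessingFutures.clear();
    }

  private:
    std::vector<std::future<void>> m_PreProcessingFutures;
};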

For batched inference, I expect we will need to launch an async preprocessing lambda per request in OnRequestReceived, track those futures, and sync on them before launching the inference in ExecuteRPC. Something along these lines:

void OnRequestReceived(const RequestType &request) final override
{
    // Queue preprocessing for this request on the shared thread pool.
    auto future = GetResources()->GetPreProcessingThreads().enqueue([this, &request] {
        Preprocess(request);
    });
    // Track the future so ExecuteRPC can wait on it before running inference.
    m_PreProcessingFutures.push_back(std::move(future));
}
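
Then ExecuteRPC would sync on those futures before running the engine. A rough sketch, assuming m_PreProcessingFutures is a std::vector<std::future<void>> and Infer is a placeholder for the actual batched execution and response writing:

void ExecuteRPC(RequestType &request, ResponseType &response) final override
{
    // Block until every preprocessing task queued in OnRequestReceived
    // has completed, so the input buffers are ready for the engine.
    for (auto &future : m_PreProcessingFutures)
    {
        future.get();
    }
    m_PreProcessingFutures.clear();

    // Infer() stands in for the batched TensorRT execution and response
    // writing; it is not part of the current codebase.
    Infer(request, response);
}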