03-batching
mrmeswani opened this issue · 1 comment
I am trying to learn how to use batching with a TensorRT RN50 engine. I was looking at the 03_Batching example and it seems a file may be missing: https://github.com/NVIDIA/yais/blob/master/examples/03_Batching/streaming-service.cc, which is used to implement inference on the batches.
I already have an RN50 engine built following example 01, and I would like to see how I can use your batching example to run inference in batches.
@mrmeswani - thanks for your interest.
The batching example is a proof of concept that builds on the 01_GRPC example; it implements a batched Echo service.
I've just pushed a new commit to master (dfd5545) that begins to address the issue. That commit fixes the broken link above.
I think there will be some larger changes needed for batched inference. In particular, I think we need to expose a ContextReset virtual function that can be used to reset the member variables of a derived Context.
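For illustration, a minimal sketch of what such a hook might look like. Context stands in for the YAIS base class named above, while BatchingContext and m_PreProcessingFutures are hypothetical names invented for this example:

#include <future>
#include <vector>

class Context
{
  public:
    virtual ~Context() = default;

  protected:
    // Hook invoked by the base class between RPC executions so a derived
    // Context can return to a clean state before it is reused.
    virtual void ContextReset() {}
};

class BatchingContext : public Context  // hypothetical derived class
{
  protected:
    void ContextReset() override
    {
        // Drop any futures left over from the previous batch.
        m_PreProcessingFutures.clear();
    }

  private:
    std::vector<std::future<void>> m_PreProcessingFutures;
};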
For batched inference, I predict we will need to launch an async preprocessing lambda per request in OnRequestReceived. We will have to track those futures and sync on them prior to launching the inference in ExecuteRPC.
void OnRequestReceived(const RequestType &request) final override
{
    // Kick off preprocessing on the shared thread pool. The lambda captures
    // this context and the request by reference, so the request must outlive
    // the preprocessing work.
    auto future = GetResources()->GetPreProcessingThreads().enqueue([this, &request] {
        Preprocess(request);
    });
    // std::future is move-only, so move it into the tracking vector.
    m_PreProcessingFutures.push_back(std::move(future));
}
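To complete the picture, ExecuteRPC would then wait on those futures before running the batched inference. A rough sketch continuing the fragment above, under the same assumptions; the ExecuteRPC signature is taken from the existing examples, and RunInference is a hypothetical placeholder, not a YAIS API:

void ExecuteRPC(RequestType &request, ResponseType &response) final override
{
    // Block until every per-request preprocessing task has finished,
    // so the batch is fully assembled before inference starts.
    for (auto &future : m_PreProcessingFutures)
    {
        future.get();
    }
    RunInference();  // hypothetical: launch the batched TensorRT execution
}

After the RPC completes, the ContextReset hook described earlier would clear m_PreProcessingFutures so the Context can be safely reused for the next batch.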