How to do batch inference using gRPC?
megadev2k opened this issue · 6 comments
When I run inference on a single image, it works great. This is the working code:
# Imports assumed by this snippet (TF 1.x serving client); `sm` is tensorflow.saved_model
# and CFG is the repo's global config (e.g. from config import global_config; CFG = global_config.cfg).
import cv2
import grpc
import numpy as np
from tensorflow import saved_model as sm
from tensorflow.contrib.util import make_tensor_proto  # tf.make_tensor_proto in newer TF
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

def predict_with_image_file(self, image_path, server):
    channel = grpc.insecure_channel(server)
    stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

    # Preprocess: resize to the network input size and scale to [-1, 1]
    image = cv2.imread(image_path, cv2.IMREAD_COLOR)
    image = cv2.resize(image, (CFG.ARCH.INPUT_SIZE[0], CFG.ARCH.INPUT_SIZE[1]), interpolation=cv2.INTER_LINEAR)
    image = np.array(image, np.float32) / 127.5 - 1.0
    image_list = np.array([image], dtype=np.float32)  # batch of a single image

    request = predict_pb2.PredictRequest()
    request.model_spec.name = 'crnn'
    request.model_spec.signature_name = sm.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY
    request.model_spec.version.value = 1
    request.inputs['input_tensor'].CopyFrom(make_tensor_proto(image_list, dtype=None, shape=image_list.shape))

    try:
        result = stub.Predict(request, 10.0)  # 10 second timeout
        return result
    except Exception as err:
        print(err)
        return None
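For completeness, calling the method could look roughly like the sketch below; the instance name, image path, and server address are placeholders, and the output tensor names depend on the exported signature, so it just prints whatever comes back:

import tensorflow as tf

# `client` is whatever object owns predict_with_image_file; path and address are placeholders.
result = client.predict_with_image_file('data/test_images/test_01.jpg', 'localhost:8500')
if result is not None:
    # Inspect the returned tensors without assuming their names.
    for name, tensor_proto in result.outputs.items():
        print(name, tf.make_ndarray(tensor_proto).shape)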
However, when I try to run it on a batch of images, I get an error. This is the test code I ran:
def predict_with_image_file(self, image_path, server):
    channel = grpc.insecure_channel(server)
    stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

    image = cv2.imread(image_path, cv2.IMREAD_COLOR)
    image = cv2.resize(image, (CFG.ARCH.INPUT_SIZE[0], CFG.ARCH.INPUT_SIZE[1]), interpolation=cv2.INTER_LINEAR)
    image = np.array(image, np.float32) / 127.5 - 1.0

    # Build a batch of two (identical) images
    image_one = np.array([image], dtype=np.float32)
    image_two = np.array([image], dtype=np.float32)
    image_list = np.concatenate((image_one, image_two))

    request = predict_pb2.PredictRequest()
    request.model_spec.name = 'crnn'
    request.model_spec.signature_name = sm.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY
    request.model_spec.version.value = 1
    request.inputs['input_tensor'].CopyFrom(make_tensor_proto(image_list, dtype=None, shape=image_list.shape))

    try:
        result = stub.Predict(request, 10.0)
        return result
    except Exception as err:
        print(err)
        return None
The exception I get is:
<_Rendezvous of RPC that terminated with:
status = StatusCode.INVALID_ARGUMENT
details = "TensorArray shadow_net/sequence_rnn_module/stack_bidirectional_rnn/cell_0/bidirectional_rnn/bw/bw/dynamic_rnn/input_0_1918: Could not write to TensorArray index 0 because the value shape is [2,512] which is incompatible with the TensorArray's inferred element shape: [1,512] (consider setting infer_shape=False).
[[{{node shadow_net/sequence_rnn_module/stack_bidirectional_rnn/cell_0/bidirectional_rnn/bw/bw/TensorArrayUnstack/TensorArrayScatter/TensorArrayScatterV3}}]]"
debug_error_string = "{"created":"@1607046421.561403977","description":"Error received from peer","file":"src/core/lib/surface/call.cc","file_line":1095,"grpc_message":"TensorArray shadow_net/sequence_rnn_module/stack_bidirectional_rnn/cell_0/bidirectional_rnn/bw/bw/dynamic_rnn/input_0_1918: Could not write to TensorArray index 0 because the value shape is [2,512] which is incompatible with the TensorArray's inferred element shape: [1,512] (consider setting infer_shape=False).\n\t [[{{node shadow_net/sequence_rnn_module/stack_bidirectional_rnn/cell_0/bidirectional_rnn/bw/bw/TensorArrayUnstack/TensorArrayScatter/TensorArrayScatterV3}}]]","grpc_status":3}"
Thank you for your help. Great platform by the way. Training runs super fast, results are great, and saving into a SavedModel is also seamless. The only thing I can't get to work is batch inference.
What I also tried is updating line 55 in tfserve/export_saved_model.py
From:
image_tensor = tf.placeholder(
    dtype=tf.float32,
    shape=[1, image_size[1], image_size[0], 3],
    name='input_tensor')
To:
image_tensor = tf.placeholder(
    dtype=tf.float32,
    shape=[None, image_size[1], image_size[0], 3],
    name='input_tensor')
Then I exported the SavedModel again. The exception I now get is:
<_Rendezvous of RPC that terminated with:
status = StatusCode.FAILED_PRECONDITION
details = "len(sequence_length) != batch_size. len(sequence_length): 1 batch_size: 2
[[{{node CTCBeamSearchDecoder}}]]"
debug_error_string = "{"created":"@1607046784.385013349","description":"Error received from peer","file":"src/core/lib/surface/call.cc","file_line":1095,"grpc_message":"len(sequence_length) != batch_size. len(sequence_length): 1 batch_size: 2\n\t [[{{node CTCBeamSearchDecoder}}]]","grpc_status":9}"
Hmm, looks like I made it work, but my question is still below. I fixed the above error by modifying line 55 in tfserve/export_saved_model.py
From:
image_tensor = tf.placeholder(
    dtype=tf.float32,
    shape=[1, image_size[1], image_size[0], 3],
    name='input_tensor')
To:
image_tensor = tf.placeholder(
    dtype=tf.float32,
    shape=[None, image_size[1], image_size[0], 3],
    name='input_tensor')
And also by modifying line 76 in tfserve/export_saved_model.py
From:
decodes, _ = tf.nn.ctc_beam_search_decoder(
    inputs=inference_ret,
    sequence_length=CFG.ARCH.SEQ_LENGTH * np.ones(1),
    merge_repeated=False
)
To:
decodes, _ = tf.nn.ctc_beam_search_decoder(
    inputs=inference_ret,
    sequence_length=CFG.ARCH.SEQ_LENGTH * np.ones(2),
    merge_repeated=False
)
Then I exported the SavedModel again. Now sending a batch of 2 images seems to work. This is the code I ran:
def predict_with_image_file(self, image_path, server):
    channel = grpc.insecure_channel(server)
    stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

    image = cv2.imread(image_path, cv2.IMREAD_COLOR)
    image = cv2.resize(image, (CFG.ARCH.INPUT_SIZE[0], CFG.ARCH.INPUT_SIZE[1]), interpolation=cv2.INTER_LINEAR)
    image = np.array(image, np.float32) / 127.5 - 1.0

    # Build a batch of two (identical) images
    image_one = np.array([image], dtype=np.float32)
    image_two = np.array([image], dtype=np.float32)
    image_list = np.concatenate((image_one, image_two))

    request = predict_pb2.PredictRequest()
    request.model_spec.name = 'crnn'
    request.model_spec.signature_name = sm.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY
    request.model_spec.version.value = 4  # version 4 is the re-exported model with the changes above
    request.inputs['input_tensor'].CopyFrom(make_tensor_proto(image_list, dtype=None, shape=image_list.shape))

    try:
        result = stub.Predict(request, 10.0)
        return result
    except Exception as err:
        print(err)
        return None
However, now the next question is: in line 76 of tfserve/export_saved_model.py, the number of sequences has to be hardcoded. But I always have a random number of words predicted before I batch them and send them for inference to CRNN_Tensorflow. How can I avoid hardcoding sequence_length and let TensorFlow Model Serving take care of that?
I mean, the hack fix could be to set the sequence_length to some higher number, for example 100. Then when, say, a batch of 10 words comes in, I would append 90 more empty tensors. However, is there a better way to fix this issue? The number of words in each batch is dynamic; it is never a set number. Thank you for your help.
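To illustrate, the padding hack I have in mind would look roughly like this (a sketch only; PAD_BATCH_SIZE and pad_batch are hypothetical names, and zero images are used as filler):

import numpy as np

PAD_BATCH_SIZE = 100  # hypothetical fixed batch size baked into the exported SavedModel

def pad_batch(images):
    """Pad a list of preprocessed (H, W, 3) float32 images with zero tensors
    so the batch matches the fixed size expected by the exported model."""
    real_count = len(images)
    if real_count > PAD_BATCH_SIZE:
        raise ValueError('batch is larger than the exported fixed batch size')
    batch = np.zeros((PAD_BATCH_SIZE,) + images[0].shape, dtype=np.float32)
    batch[:real_count] = np.stack(images)
    return batch, real_count  # keep real_count so only the first N predictions are read back

After the response comes back, only the first real_count predictions would be kept and the rest discarded.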
@megadev2k The sequence length cannot be dynamic due to the model's structure.
Got it. Then I'll build a workaround: I can batch the requests up to the number of sequences, and if there are more than the hard limit, I'll run inference multiple times. Thank you for confirming @MaybeShewill-CV.
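The workaround would look roughly like the sketch below (MAX_BATCH stands for the hardcoded batch size from export_saved_model.py, and predict_batch is a hypothetical helper that wraps the gRPC request above for an already-built numpy batch):

import numpy as np

MAX_BATCH = 2  # must match the hardcoded batch size in the exported SavedModel

def predict_in_chunks(self, images, server):
    """Split an arbitrarily sized list of preprocessed images into fixed-size
    chunks and run one Predict call per chunk, zero-padding the last chunk."""
    results = []
    for start in range(0, len(images), MAX_BATCH):
        chunk = images[start:start + MAX_BATCH]
        real_count = len(chunk)
        if real_count < MAX_BATCH:
            # Zero-pad the final chunk up to the fixed batch size.
            padding = np.zeros((MAX_BATCH - real_count,) + chunk[0].shape, dtype=np.float32)
            batch = np.concatenate([np.stack(chunk), padding])
        else:
            batch = np.stack(chunk)
        response = self.predict_batch(batch, server)  # hypothetical wrapper around the gRPC call above
        if response is not None:
            results.append((response, real_count))  # only the first real_count outputs are meaningful
    return results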
@megadev2k Thanks for sharing. You're welcome:)