How to do batch inference using gRPC?
megadev2k opened this issue · 6 comments
When I run inference on a single image, it works great. This is the working code:
# Imports assumed by this snippet (TF 1.x serving client); `sm` is tensorflow.saved_model
# and CFG is the repo's global config (e.g. from config import global_config; CFG = global_config.cfg).
import cv2
import grpc
import numpy as np
from tensorflow import saved_model as sm
from tensorflow.contrib.util import make_tensor_proto  # tf.make_tensor_proto in newer TF
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

def predict_with_image_file(self, image_path, server):
    channel = grpc.insecure_channel(server)
    stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

    # Preprocess: resize to the network input size and scale to [-1, 1]
    image = cv2.imread(image_path, cv2.IMREAD_COLOR)
    image = cv2.resize(image, (CFG.ARCH.INPUT_SIZE[0], CFG.ARCH.INPUT_SIZE[1]), interpolation=cv2.INTER_LINEAR)
    image = np.array(image, np.float32) / 127.5 - 1.0
    image_list = np.array([image], dtype=np.float32)  # batch of a single image

    request = predict_pb2.PredictRequest()
    request.model_spec.name = 'crnn'
    request.model_spec.signature_name = sm.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY
    request.model_spec.version.value = 1
    request.inputs['input_tensor'].CopyFrom(make_tensor_proto(image_list, dtype=None, shape=image_list.shape))

    try:
        result = stub.Predict(request, 10.0)  # 10 second timeout
        return result
    except Exception as err:
        print(err)
        return None
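For completeness, calling the method could look roughly like the sketch below; the instance name, image path, and server address are placeholders, and the output tensor names depend on the exported signature, so it just prints whatever comes back:

import tensorflow as tf

# `client` is whatever object owns predict_with_image_file; path and address are placeholders.
result = client.predict_with_image_file('data/test_images/test_01.jpg', 'localhost:8500')
if result is not None:
    # Inspect the returned tensors without assuming their names.
    for name, tensor_proto in result.outputs.items():
        print(name, tf.make_ndarray(tensor_proto).shape)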
However, when I try to run it on a batch of images, I get an error. This is the test code I ran:
def predict_with_image_file(self, image_path, server):
    channel = grpc.insecure_channel(server)
    stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

    image = cv2.imread(image_path, cv2.IMREAD_COLOR)
    image = cv2.resize(image, (CFG.ARCH.INPUT_SIZE[0], CFG.ARCH.INPUT_SIZE[1]), interpolation=cv2.INTER_LINEAR)
    image = np.array(image, np.float32) / 127.5 - 1.0

    # Build a batch of two (identical) images
    image_one = np.array([image], dtype=np.float32)
    image_two = np.array([image], dtype=np.float32)
    image_list = np.concatenate((image_one, image_two))

    request = predict_pb2.PredictRequest()
    request.model_spec.name = 'crnn'
    request.model_spec.signature_name = sm.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY
    request.model_spec.version.value = 1
    request.inputs['input_tensor'].CopyFrom(make_tensor_proto(image_list, dtype=None, shape=image_list.shape))

    try:
        result = stub.Predict(request, 10.0)
        return result
    except Exception as err:
        print(err)
        return None
The exception I get is:
<_Rendezvous of RPC that terminated with:
status = StatusCode.INVALID_ARGUMENT
details = "TensorArray shadow_net/sequence_rnn_module/stack_bidirectional_rnn/cell_0/bidirectional_rnn/bw/bw/dynamic_rnn/input_0_1918: Could not write to TensorArray index 0 because the value shape is [2,512] which is incompatible with the TensorArray's inferred element shape: [1,512] (consider setting infer_shape=False).
[[{{node shadow_net/sequence_rnn_module/stack_bidirectional_rnn/cell_0/bidirectional_rnn/bw/bw/TensorArrayUnstack/TensorArrayScatter/TensorArrayScatterV3}}]]"
debug_error_string = "{"created":"@1607046421.561403977","description":"Error received from peer","file":"src/core/lib/surface/call.cc","file_line":1095,"grpc_message":"TensorArray shadow_net/sequence_rnn_module/stack_bidirectional_rnn/cell_0/bidirectional_rnn/bw/bw/dynamic_rnn/input_0_1918: Could not write to TensorArray index 0 because the value shape is [2,512] which is incompatible with the TensorArray's inferred element shape: [1,512] (consider setting infer_shape=False).\n\t [[{{node shadow_net/sequence_rnn_module/stack_bidirectional_rnn/cell_0/bidirectional_rnn/bw/bw/TensorArrayUnstack/TensorArrayScatter/TensorArrayScatterV3}}]]","grpc_status":3}"
Thank you for your help. Great platform by the way. Training runs super fast, results are great, and saving into a SavedModel is also seamless. The only thing I can't get to work is batch inference.
What I also tried is updating line 55 in tfserve/export_saved_model.py
From:
image_tensor = tf.placeholder(
    dtype=tf.float32,
    shape=[1, image_size[1], image_size[0], 3],
    name='input_tensor')
To:
image_tensor = tf.placeholder(
    dtype=tf.float32,
    shape=[None, image_size[1], image_size[0], 3],
    name='input_tensor')
Then I exported the SavedModel again. The exception I now get is:
<_Rendezvous of RPC that terminated with:
status = StatusCode.FAILED_PRECONDITION
details = "len(sequence_length) != batch_size. len(sequence_length): 1 batch_size: 2
[[{{node CTCBeamSearchDecoder}}]]"
debug_error_string = "{"created":"@1607046784.385013349","description":"Error received from peer","file":"src/core/lib/surface/call.cc","file_line":1095,"grpc_message":"len(sequence_length) != batch_size. len(sequence_length): 1 batch_size: 2\n\t [[{{node CTCBeamSearchDecoder}}]]","grpc_status":9}"
Hmm, looks like I made it work, but my question is still below. I fixed the above error by modifying line 55 in tfserve/export_saved_model.py
From:
image_tensor = tf.placeholder(
    dtype=tf.float32,
    shape=[1, image_size[1], image_size[0], 3],
    name='input_tensor')
To:
image_tensor = tf.placeholder(
    dtype=tf.float32,
    shape=[None, image_size[1], image_size[0], 3],
    name='input_tensor')
And also by modifying line 76 in tfserve/export_saved_model.py
From:
decodes, _ = tf.nn.ctc_beam_search_decoder(
    inputs=inference_ret,
    sequence_length=CFG.ARCH.SEQ_LENGTH * np.ones(1),
    merge_repeated=False
)
To:
decodes, _ = tf.nn.ctc_beam_search_decoder(
    inputs=inference_ret,
    sequence_length=CFG.ARCH.SEQ_LENGTH * np.ones(2),
    merge_repeated=False
)
Then I exported the SavedModel again. Now sending a batch of 2 images seems to work. This is the code I ran:
def predict_with_image_file(self, image_path, server):
    channel = grpc.insecure_channel(server)
    stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

    image = cv2.imread(image_path, cv2.IMREAD_COLOR)
    image = cv2.resize(image, (CFG.ARCH.INPUT_SIZE[0], CFG.ARCH.INPUT_SIZE[1]), interpolation=cv2.INTER_LINEAR)
    image = np.array(image, np.float32) / 127.5 - 1.0

    # Build a batch of two (identical) images
    image_one = np.array([image], dtype=np.float32)
    image_two = np.array([image], dtype=np.float32)
    image_list = np.concatenate((image_one, image_two))

    request = predict_pb2.PredictRequest()
    request.model_spec.name = 'crnn'
    request.model_spec.signature_name = sm.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY
    request.model_spec.version.value = 4  # version 4 is the re-exported model with the changes above
    request.inputs['input_tensor'].CopyFrom(make_tensor_proto(image_list, dtype=None, shape=image_list.shape))

    try:
        result = stub.Predict(request, 10.0)
        return result
    except Exception as err:
        print(err)
        return None
However, now the next question is: in line 76 of tfserve/export_saved_model.py, the number of sequences has to be hardcoded. But I always have a random number of words predicted before I batch them and send them for inference to CRNN_Tensorflow. How can I avoid hardcoding sequence_length and let TensorFlow Model Serving take care of that?
I mean, the hack fix could be to set the sequence_length to some higher number, for example 100. Then when, say, a batch of 10 words comes in, I would append 90 more empty tensors. However, is there a better way to fix this issue? The number of words in each batch is dynamic; it is never a set number. Thank you for your help.
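To illustrate, the padding hack I have in mind would look roughly like this (a sketch only; PAD_BATCH_SIZE and pad_batch are hypothetical names, and zero images are used as filler):

import numpy as np

PAD_BATCH_SIZE = 100  # hypothetical fixed batch size baked into the exported SavedModel

def pad_batch(images):
    """Pad a list of preprocessed (H, W, 3) float32 images with zero tensors
    so the batch matches the fixed size expected by the exported model."""
    real_count = len(images)
    if real_count > PAD_BATCH_SIZE:
        raise ValueError('batch is larger than the exported fixed batch size')
    batch = np.zeros((PAD_BATCH_SIZE,) + images[0].shape, dtype=np.float32)
    batch[:real_count] = np.stack(images)
    return batch, real_count  # keep real_count so only the first N predictions are read back

After the response comes back, only the first real_count predictions would be kept and the rest discarded.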
@megadev2k The sequence length cannot be dynamic due to the model's structure.
Got it. Then I'll build a workaround: I can batch the requests up to the number of sequences, and if there are more than the hard limit, I'll run inference multiple times. Thank you for confirming @MaybeShewill-CV.
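The workaround would look roughly like the sketch below (MAX_BATCH stands for the hardcoded batch size from export_saved_model.py, and predict_batch is a hypothetical helper that wraps the gRPC request above for an already-built numpy batch):

import numpy as np

MAX_BATCH = 2  # must match the hardcoded batch size in the exported SavedModel

def predict_in_chunks(self, images, server):
    """Split an arbitrarily sized list of preprocessed images into fixed-size
    chunks and run one Predict call per chunk, zero-padding the last chunk."""
    results = []
    for start in range(0, len(images), MAX_BATCH):
        chunk = images[start:start + MAX_BATCH]
        real_count = len(chunk)
        if real_count < MAX_BATCH:
            # Zero-pad the final chunk up to the fixed batch size.
            padding = np.zeros((MAX_BATCH - real_count,) + chunk[0].shape, dtype=np.float32)
            batch = np.concatenate([np.stack(chunk), padding])
        else:
            batch = np.stack(chunk)
        response = self.predict_batch(batch, server)  # hypothetical wrapper around the gRPC call above
        if response is not None:
            results.append((response, real_count))  # only the first real_count outputs are meaningful
    return results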
@megadev2k Thanks for sharing. You're welcome:)