intel/ros2_openvino_toolkit

ObjectSegmentationMaskrcnn not resizing input image

joelbudu opened this issue · 15 comments

@LewisLiuPub It seems the input images are no longer being resized in the ObjectSegmentationMaskrcnn model. Any input size different from the model's exact input size does not work.

@joelbudu could you share the steps to reproduce this issue, or upload one of the images you used so we can double-check?

Per my double check, the maskrcnn model's input is an image tensor with shape [1, 800, 1365, 3], which is quite large compared with other models' inputs (e.g. 513x513, 640x640, etc.).
I verified this feature with Realsense Camera and USB camera, both work well.

I'd appreciate it if you could share the detailed steps (with the test images) so we can try to reproduce this issue.

Do you mean it doesn't work when you set a new shape for the input tensor (by calling an API such as set_shape())? @joelbudu

@LewisLiuPub To clarify: I used a custom model trained at an input size of 640x360 and then tried to run inference on a larger image, such as 1280x720 pixels. When exporting the model to IR, I used the argument --input_shape [1,360,640,3] .
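For illustration, the mismatch described above is between the camera frame (1280x720) and the IR's fixed input (640x360): the wrapper has to resize each frame down to the model shape before inference. The sketch below is not toolkit code; it uses a plain nearest-neighbor resize in numpy, and the shapes are taken from the comment above.

```python
import numpy as np

def resize_nearest(img: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Nearest-neighbor resize of an HxWxC image to (out_h, out_w)."""
    h, w = img.shape[:2]
    rows = np.arange(out_h) * h // out_h   # source row for each output row
    cols = np.arange(out_w) * w // out_w   # source column for each output column
    return img[rows][:, cols]

frame = np.zeros((720, 1280, 3), dtype=np.uint8)  # camera frame, as in the report
model_h, model_w = 360, 640                       # fixed IR input shape (example)
blob = resize_nearest(frame, model_h, model_w)
assert blob.shape == (model_h, model_w, 3)
```

If this resize step is skipped (or commented out), any frame whose resolution differs from the IR input shape would fail or produce garbage results, which matches the symptom reported.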

Could it have anything to do with some of these commented lines per chance? https://github.com/intel/ros2_openvino_toolkit/blob/33485e9b3c8ff2286522925cccba57ac72bf3641/openvino_wrapper_lib/src/models/object_segmentation_maskrcnn_model.cpp#LL106C2-L106C2

@joelbudu Hi Joelbudu, could you share the documentation you followed to train and convert the model with the "--input_shape" argument? I want to reproduce your operational process, but I used "omz_downloader" and "omz_converter" rather than the --input_shape argument.

@joelbudu, in order to reproduce your issue, I tried the model converting command using "mo". I also got warnings, and the input shape was forced to 768x1365. It seems that the new version of the "mo" tool imposes this limitation.

There are further actions I am tracking:

  • Support Yolov8 segmentation in this project. This is already in progress.
  • If you think it is important for the maskrcnn model to support dynamic shapes, I can escalate this issue to the OpenVINO Toolkit team to get a solution, or the reasons for this limitation.

Let me know if these actions work for you. Thanks.


Command and logs I used for MO converting:
/home/lewis/develop/openvino/openvino_env/bin/python -- /home/lewis/develop/openvino/openvino_env/bin/mo --framework=tf --output_dir=/root --model_name=mask_rcnn_inception_resnet_v2_atrous_coco --input=image_tensor --reverse_input_channels --transformations_config=/home/lewis/develop/openvino/openvino_env/lib/python3.8/site-packages/openvino/tools/mo/front/tf/mask_rcnn_support.json --tensorflow_object_detection_api_pipeline_config=/opt/openvino_toolkit/models/public/mask_rcnn_inception_resnet_v2_atrous_coco/mask_rcnn_inception_resnet_v2_atrous_coco_2018_01_28/pipeline.config --input_model=/opt/openvino_toolkit/models/public/mask_rcnn_inception_resnet_v2_atrous_coco/mask_rcnn_inception_resnet_v2_atrous_coco_2018_01_28/frozen_inference_graph.pb '--layout=image_tensor(NHWC)' '--input_shape=[1, 800, 1365, 3]' --compress_to_fp16=True '--layout=image_tensor(NHWC)' '--input_shape=[1, 360, 640, 3]' --compress_to_fp16=False
[ WARNING ] Model Optimizer removes pre-processing block of the model which resizes image keeping aspect ratio. Inference Engine does not support dynamic image size so the Intermediate Representation file is generated with the input image size of a fixed size.
[ WARNING ] The model resizes the input image keeping aspect ratio with min dimension 800, max dimension 1365. The provided input height 360, width 640 is transformed to height 768, width 1365.
[ WARNING ] The Preprocessor block has been removed. Only nodes performing mean value subtraction and scaling (if applicable) are kept.
[ WARNING ] The graph output nodes have been replaced with a single layer of type "DetectionOutput". Refer to the operation set specification documentation for more information about the operation.
[ WARNING ] The predicted masks are produced by the "masks" layer for each bounding box generated with a "detection_output" operation.
Refer to operation specification in the documentation for the information about the DetectionOutput operation output data interpretation.
The model can be inferred using the dedicated demo "mask_rcnn_demo" from the OpenVINO Open Model Zoo.
[ WARNING ] Network has 2 inputs overall, but only 1 of them are suitable for input channels reversing.
Suitable for input channel reversing inputs are 4-dimensional with 3 channels (in case of dynamic dimensions C channel must be provided in a layout for this input)
All inputs: [['image_tensor', <PartialShape: [1,360,640,3]>], ['image_info', <PartialShape: [1,3]>]]
Suitable inputs [['image_tensor', <PartialShape: [1,360,640,3]>]]
Check for a new version of Intel(R) Distribution of OpenVINO(TM) toolkit here https://software.intel.com/content/www/us/en/develop/tools/openvino-toolkit/download.html?cid=other&source=prod&campid=ww_2023_bu_IOTG_OpenVINO-2022-3&content=upg_all&medium=organic or on https://github.com/openvinotoolkit/openvino
[ INFO ] The model was converted to IR v11, the latest model format that corresponds to the source DL framework input/output format. While IR v11 is backwards compatible with OpenVINO Inference Engine API v1.0, please use API v2.0 (as of 2022.1) to take advantage of the latest improvements in IR v11.
Find more information about API v2.0 and IR v11 at https://docs.openvino.ai/latest/openvino_2_0_transition_guide.html
[ SUCCESS ] Generated IR version 11 model.
[ SUCCESS ] XML file: /root/mask_rcnn_inception_resnet_v2_atrous_coco.xml
[ SUCCESS ] BIN file: /root/mask_rcnn_inception_resnet_v2_atrous_coco.bin
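The second warning in the log above can be reproduced arithmetically: the TF Object Detection keep-aspect-ratio resizer scales the image so the short side reaches the configured minimum dimension (800) but clamps the scale so the long side never exceeds the maximum (1365). A small sketch of that calculation (not Model Optimizer code, just the same math):

```python
def keep_aspect_ratio(h: int, w: int, min_dim: int = 800, max_dim: int = 1365):
    """Mimic the keep-aspect-ratio resizer reported by Model Optimizer:
    scale so min(h, w) reaches min_dim, clamped so max(h, w) <= max_dim."""
    scale = min(min_dim / min(h, w), max_dim / max(h, w))
    return round(h * scale), round(w * scale)

# The requested 360x640 input is transformed exactly as the warning says:
print(keep_aspect_ratio(360, 640))  # -> (768, 1365)
```

This is why the provided --input_shape of [1, 360, 640, 3] ends up as 768x1365 in the generated IR: the pre-processing block is removed and its effect is baked into a fixed input size.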

@joelbudu, PR #295 for supporting yolov8 detection and segmentation has been created. It passed unit tests, but documentation, CI tests and regression tests are still in progress.
If you are eager to use it, you can try it now.

Dear @LewisLiuPub @huangjiafengx, I think the easiest way for you to replicate the issue is to use the same model from omz_downloader and omz_converter with an image topic that has a resolution of 640x360. With that, there were few or no detections from the model. I don't think it's an issue of dynamic reshaping, but of how the images are resized before being fed into the model.

@LewisLiuPub Thanks for the update on the yolov8 detection and segmentation.

@joelbudu, I think we have reproduced the issue. Thanks for your help. Once I complete the yolov8 support, I'll dig into it as early as possible.

@joelbudu, here is the status of yolov8 and maskrcnn segmentation:

The related code has been uploaded to the development branch; unit tests and regression tests have passed. I am double-checking the documentation and will merge the code into branch ros2 (and then into branch master) once done.

You may try it with this command:

ros2 launch openvino_node pipeline_segmentation_instance.launch.py

By default, yolov8-seg model is used.
If the maskrcnn inception model is expected, you need to manually modify the YAML file (update the "model", "model_type" and "label" items):

model: /opt/openvino_toolkit/models/convert/public/mask_rcnn_inception_resnet_v2_atrous_coco/FP32/mask_rcnn_inception_resnet_v2_atrous_coco.xml
model_type: maskrcnn
label: /opt/openvino_toolkit/labels/object_segmentation/frozen_inference_graph.labels #for maskrcnn

As an optimization for the maskrcnn model, I added CV preprocessing that keeps the input image's aspect ratio.

I am wondering whether the two models meet your requirements. I'd appreciate it if you could give them a try and let me know your results and comments.

Thanks
weizhi

Dear @LewisLiuPub, it seems the issue still exists in the new implementation. I still don't get results for resolutions higher than my model's input size. Also, the yolov8 instance segmentation produces no output for me.

@joelbudu, thanks for sharing your results. The missing results are interesting, and I'll take another deep dive into this issue.
BTW, do you plan to try the yolov8-seg model? It offers better performance and accuracy than the maskrcnn model.

Yes @LewisLiuPub, I've tested the yolov8-seg model and it seems to be working fine in the ros2_openvino_toolkit. The issue is that I'm not seeing any overlays in the image_rviz output for either the yolov8 or the new maskrcnn implementation.

For mask_rcnn, we do plan to move away from it, but I thought it would be good to have the issue fixed in case it is required.
Thanks

Got you. Let me have a check.

PR #300 and PR #301 are designed to fix the bugs.