NVIDIA-ISAAC-ROS/isaac_ros_object_detection

Wrong detections using PeopleNet and Triton

Closed this issue · 2 comments

Hello, I have been testing the object detection pipeline on the AGX Orin using PeopleNet with input from my camera (and from a rosbag captured with the same camera). The camera resolution is 640x480. For this I have written my own config to be used with tao-converter:

name: "detectnet"
platform: "tensorrt_plan"
max_batch_size: 16
input [
  {
    name: "input_1"
    data_type: TYPE_FP32
    format: FORMAT_NCHW
    dims: [ 3, 480, 640 ]
  }
]
output [
  {
        name: "output_bbox/BiasAdd"
        data_type: TYPE_FP32
        dims: [ 12, 30, 40]
    },
    {
        name: "output_cov/Sigmoid"
        data_type: TYPE_FP32
        dims: [ 3, 30, 40]
    }
]

and use the following command to generate the model plan

setup_model.sh --height 480 --width 640 --config-file /workspaces/detectnet.pbtxt
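As a sanity check on the config above: the output dims follow from DetectNet_v2's stride-16 output grid and PeopleNet's 3 classes (person, bag, face). A small sketch (the stride and class count here are stated assumptions about PeopleNet, not taken from the issue):

```python
# DetectNet_v2 (PeopleNet) downsamples the input by a stride of 16,
# so the output grid is input_size / 16 per axis. With 3 classes,
# the coverage tensor has 3 channels and the bbox tensor has
# 4 coordinates per class = 12 channels.
STRIDE = 16
NUM_CLASSES = 3  # person, bag, face


def detectnet_output_dims(height, width, num_classes=NUM_CLASSES):
    grid_h, grid_w = height // STRIDE, width // STRIDE
    cov_dims = [num_classes, grid_h, grid_w]       # output_cov/Sigmoid
    bbox_dims = [4 * num_classes, grid_h, grid_w]  # output_bbox/BiasAdd
    return cov_dims, bbox_dims


print(detectnet_output_dims(480, 640))
# ([3, 30, 40], [12, 30, 40]) -- matches the dims in the pbtxt above
```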

I have also modified the launch file accordingly (changed the encoder input and network sizes):

import os

import launch
from launch_ros.actions import ComposableNodeContainer
from launch_ros.descriptions import ComposableNode


def generate_launch_description():
    """Generate launch description for testing relevant nodes."""
    launch_dir_path = os.path.dirname(os.path.realpath(__file__))
    config = launch_dir_path + '/../config/params.yaml'
    model_dir_path = '/tmp/models'

    # Read labels from text file
    labels_file_path = f'{model_dir_path}/detectnet/1/labels.txt'
    with open(labels_file_path, 'r') as fd:
        label_list = fd.read().strip().splitlines()

    encoder_node = ComposableNode(
        name='dnn_image_encoder',
        package='isaac_ros_dnn_image_encoder',
        plugin='nvidia::isaac_ros::dnn_inference::DnnImageEncoderNode',
        parameters=[{
            'input_image_width': 640,
            'input_image_height': 480,
            'network_image_width': 640,
            'network_image_height': 480,
            'image_mean': [0.0, 0.0, 0.0],
            'image_stddev': [1.0, 1.0, 1.0],
            'enable_padding': False
        }],
        remappings=[('encoded_tensor', 'tensor_pub'),
                    ('image', 'v4l/camera/image_raw')]
    )

    triton_node = ComposableNode(
        name='triton_node',
        package='isaac_ros_triton',
        plugin='nvidia::isaac_ros::dnn_inference::TritonNode',
        parameters=[{
            'model_name': 'detectnet',
            'model_repository_paths': [model_dir_path],
            'input_tensor_names': ['input_tensor'],
            'input_binding_names': ['input_1'],
            'input_tensor_formats': ['nitros_tensor_list_nchw_rgb_f32'],
            'output_tensor_names': ['output_cov', 'output_bbox'],
            'output_binding_names': ['output_cov/Sigmoid', 'output_bbox/BiasAdd'],
            'output_tensor_formats': ['nitros_tensor_list_nhwc_rgb_f32'],
            'log_level': 0
        }])

    detectnet_decoder_node = ComposableNode(
        name='detectnet_decoder_node',
        package='isaac_ros_detectnet',
        plugin='nvidia::isaac_ros::detectnet::DetectNetDecoderNode',
        parameters=[config,
                    {
                        'label_list': label_list
                    }]
    )

    container = ComposableNodeContainer(
        name='detectnet_container',
        namespace='detectnet_container',
        package='rclcpp_components',
        executable='component_container_mt',
        composable_node_descriptions=[
            encoder_node, triton_node, detectnet_decoder_node],
        output='screen'
    )

    return launch.LaunchDescription([container])

When I run the pipeline with just me in the frame, the main objects are detected correctly (one person and one face, each with a correct bounding box). However, I am also getting many wrong detections: roughly 10 bounding boxes that appear and disappear quickly near the two correct ones. I tested the same model and input video in DeepStream, and there the detection works correctly.

This could be a difference in the tuning parameters used to post-process the detections and filter out low-confidence boxes. According to the DeepStream docs, the default confidence_threshold is 0.2, while the Isaac ROS default is 0.6. Both use the same DBSCAN implementation, so aligning the parameters for PeopleNet should give you the same results.
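For reference, this is roughly what aligning the decoder with DeepStream's defaults could look like in params.yaml. Only confidence_threshold and its two default values come from the comment above; the DBSCAN parameter names and values are assumptions about the isaac_ros_detectnet decoder and should be checked against the package documentation:

```yaml
detectnet_decoder_node:
  ros__parameters:
    confidence_threshold: 0.2   # DeepStream default; Isaac ROS defaults to 0.6
    # Assumed DBSCAN parameter names/values -- verify against isaac_ros_detectnet docs
    dbscan_confidence_threshold: 0.5
    dbscan_eps: 0.7
    dbscan_min_boxes: 1
```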

I played around with the parameters and can confirm that this improved the results. I am closing this issue and will reopen it if I still have problems when using the same parameters as DeepStream.