How do I know which recognition area my target recognition box comes from?
hadestyz opened this issue · 12 comments
Dear author, I added preprocessing to divide the recognition area into two parts, but in the downstream processing I don't know how to distinguish which area a detected box comes from.
This is my preprocessing configuration:
[group-0]
src-ids=0;1;2;3;4;5
custom-input-transformation-function=CustomAsyncTransformation
process-on-roi=1
roi-params-src-0=0;0;1280;720;272;0;736;720
roi-params-src-1=0;0;1280;720;272;0;736;720
roi-params-src-2=0;0;1280;720;272;0;736;720
roi-params-src-3=0;0;1280;720;272;0;736;720
roi-params-src-4=0;0;1280;720;272;0;736;720
roi-params-src-5=0;0;1280;720;272;0;736;720
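For reference, with process-on-roi=1 each roi-params-src line is a flat list of left;top;width;height groups, one group per ROI. A small sketch of decoding such a line into the two recognition areas (plain Python, only to make the two ROIs explicit, not part of the pipeline):

def parse_roi_params(value):
    # "0;0;1280;720;272;0;736;720" -> [(0, 0, 1280, 720), (272, 0, 736, 720)]
    nums = [int(v) for v in value.split(';')]
    return [tuple(nums[i:i + 4]) for i in range(0, len(nums), 4)]

rois = parse_roi_params('0;0;1280;720;272;0;736;720')
# rois[0] is the full 1280x720 frame, rois[1] is a 736x720 area offset 272 px from the left.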
This is the probe function:
import pyds
from dsl import *    # provides DSL_PAD_PROBE_OK

def _tiler_sink_pad_buffer_probe(buffer, user_data):
    # Retrieve batch metadata from the gst_buffer
    try:
        batch_meta = pyds.gst_buffer_get_nvds_batch_meta(buffer)
        l_frame = batch_meta.frame_meta_list
    except Exception:
        return DSL_PAD_PROBE_OK
    while l_frame is not None:
        try:
            frame_meta = pyds.glist_get_nvds_frame_meta(l_frame.data)
        except StopIteration:
            break
        frame_number = frame_meta.frame_num
        source_id = frame_meta.source_id
        l_obj = frame_meta.obj_meta_list
        boxes = []
        while l_obj is not None:
            try:
                # Casting l_obj.data to pyds.NvDsObjectMeta
                obj_meta = pyds.glist_get_nvds_object_meta(l_obj.data)
            except StopIteration:
                break
            # Detector bbox stored as [x-min, x-max, y-min, y-max]
            coords = obj_meta.detector_bbox_info.org_bbox_coords
            boxes.append([int(coords.left), int(coords.left) + int(coords.width),
                          int(coords.top), int(coords.top) + int(coords.height)])
            try:
                l_obj = l_obj.next
            except StopIteration:
                break
        # Advance to the next frame in the batch
        try:
            l_frame = l_frame.next
        except StopIteration:
            break
    return DSL_PAD_PROBE_OK
Is there a way to distinguish which recognition area the box comes from?
@hadestyz unfortunately you can't... see https://forums.developer.nvidia.com/t/how-can-i-find-out-which-roi-inference-result-came-from-among-the-obj-information-of-obj-meta-list-from-the-preprocess-result/253314/14
We've been working around this limitation by using Duplicate Sources to create duplicate streams so that each stream has only one ROI. It definitely adds overhead, but it allows you to distinguish which ROI the object was detected in.
Wish I had better news for you.
I hope it can be achieved one day. Thank you.
@rjhowell44 Sorry, I would like to ask again: does the Duplicate Source you mentioned create different streams on the same pipeline, or are there still two pipelines? Are there any similar examples here for reference?
@hadestyz the Duplicate Source creates a duplicate stream for the same Pipeline. You can duplicate the same source (stream) as many times as you like. Then you can define a different ROI for each stream.
For your example above, you have 6 sources with 2 ROIs each. You could make them 12 sources/streams with 1 ROI each. Just know that when you add both the original Sources and the duplicate Sources to the Pipeline, they will be assigned stream-ids in the order you add them.
Keep in mind that you are doubling the number of frames that need processing by the preprocessor, inference engines, and tracker. So a lot of extra overhead.
We typically use a Demuxer for this case and then we don't add branches or Sinks to the duplicate streams. Just let the demuxer drop the buffers... add Branches/Sinks to just the original Streams for viewing, recording, streaming, etc.
Just FYI, as others may run into this problem, I will make time to add a diagram and description of this workaround to the Preproc docs in DSL.
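For illustration, a rough preprocess config for the 12-stream case might look like the following, assuming the original sources are added first (stream-ids 0-5) and their duplicates afterwards (stream-ids 6-11); verify the ids against the order the sources are actually added to the Pipeline:

[group-0]
src-ids=0;1;2;3;4;5;6;7;8;9;10;11
custom-input-transformation-function=CustomAsyncTransformation
process-on-roi=1
# streams 0-5 (original sources): first recognition area only
roi-params-src-0=0;0;1280;720
roi-params-src-1=0;0;1280;720
roi-params-src-2=0;0;1280;720
roi-params-src-3=0;0;1280;720
roi-params-src-4=0;0;1280;720
roi-params-src-5=0;0;1280;720
# streams 6-11 (duplicate sources): second recognition area only
roi-params-src-6=272;0;736;720
roi-params-src-7=272;0;736;720
roi-params-src-8=272;0;736;720
roi-params-src-9=272;0;736;720
roi-params-src-10=272;0;736;720
roi-params-src-11=272;0;736;720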
@hadestyz Sorry, I misspoke. There is no extra overhead for the Inference engines, as the number of ROIs, which equals the batch-size, remains the same: 6 sources with 2 ROIs each or 12 sources with 1 ROI each gives a batch-size of 12 either way.
@rjhowell44 Looking forward to the documentation, thank you very much.
@rjhowell44 For example, after duplicating the streams, each stream corresponds to a different recognition area. How should I ensure that these streams stay aligned and synchronized? How should I collect the data from multiple streams, unify their box data, process it centrally, and distribute it downstream? I don't seem to see any interfaces in DeepStream that provide merging.
@hadestyz my apologies for the slow response... very busy. I will try and put together a diagram with some comments tonight.
@rjhowell44 It seems like you’re really busy over there and haven’t had the time to update the relevant documentation. I hope everything is going smoothly for you.
@hadestyz again, sorry, very busy. I've gotten as far as creating a diagram for the docs. See below.
Here are some important points.
- The Duplicate Source simply Tees into the Original Source's stream
- Both the Original Source and the Duplicate Source are linked as input to the streammuxer (pad-0, pad-1).
- A gstreamer Tee does not copy/duplicate each buffer, but simply pushes duplicate pointers to the same buffer onto both of the streammuxer's input pads.
- The streammuxer will batch the two streams as if they are two sources, but they are really the same buffers.
- The preprocessor, inference engine and tracker will process both as separate streams.
- Now, instead of defining two preprocessor ROIs for the original source, define one for the Original Source, and one for the Duplicate Source.
- When you're processing the frame and object metadata, you can tell which ROI the object is in by which source (stream-id) it is in; see the sketch below.
- When viewing the output, only connect a branch to the Demuxer for stream-0. You can let the Demuxer drop the Duplicate stream.
I hope this helps. Please follow up with additional questions if you have them.
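A minimal sketch of that source-to-ROI lookup inside the probe, assuming the add order above (originals take stream-ids 0-5, duplicates 6-11); the mapping below is illustrative and must match the order the sources are actually added:

# Hypothetical mapping: originals added first (ids 0-5), duplicates after (ids 6-11).
SOURCE_ID_TO_ROI = {sid: 'full-frame-roi' for sid in range(0, 6)}      # 0;0;1280;720
SOURCE_ID_TO_ROI.update({sid: 'center-roi' for sid in range(6, 12)})   # 272;0;736;720

def roi_for_frame(frame_meta):
    # Each stream now has exactly one preprocessor ROI, so every object in
    # this frame was detected inside this single ROI.
    return SOURCE_ID_TO_ROI.get(frame_meta.source_id, 'unknown')

def original_source_id(source_id):
    # Map a duplicate stream back to the camera it was tee'd from.
    return source_id if source_id < 6 else source_id - 6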
@rjhowell44 Thank you for your answer and for the clear diagram; it helps me understand the process clearly.
If I have an original source-A and its duplicate source-B, and the data in the flow shown above still arrives frame by frame, it seems there is no built-in component or function in DeepStream for this, so would I have to accumulate the two frames myself and process them together?
But I could do some processing in the probe function, merging the data each time I have received the same number of frames from source-A and source-B. That requires the processing speed of source-A and source-B to be similar, so that too large a backlog does not build up on my own queue.
So can the streammux plugin in DeepStream ensure that source-A and source-B are passed to the inference engine as evenly as possible?
@hadestyz the Pipeline's built-in streammuxer batches the frames from the original and duplicate sources and adds the metadata for each frame... meaning the pointers to the buffers are batched together so they can be processed as a batch by the inference engine and tracker. If you add your custom pph to the source-pad (output) of the tracker, you will iterate through the batched metadata. It is the Demuxer that un-batches the buffers into separate streams.
This code (the custom pph) that you had above iterates over all frames in the batch. If all sources have the same frame-rate, then everything will be synchronized.
    while l_frame is not None:
        try:
            frame_meta = pyds.glist_get_nvds_frame_meta(l_frame.data)
        except StopIteration:
            break
        frame_number = frame_meta.frame_num
        source_id = frame_meta.source_id
        l_obj = frame_meta.obj_meta_list
        boxes = []
        while l_obj is not None:
            try:
                # Casting l_obj.data to pyds.NvDsObjectMeta
                obj_meta = pyds.glist_get_nvds_object_meta(l_obj.data)
            except StopIteration:
                break
            coords = obj_meta.detector_bbox_info.org_bbox_coords
            boxes.append([int(coords.left), int(coords.left) + int(coords.width),
                          int(coords.top), int(coords.top) + int(coords.height)])
            try:
                l_obj = l_obj.next
            except StopIteration:
                break
        # Advance to the next frame in the batch
        try:
            l_frame = l_frame.next
        except StopIteration:
            break
The only thing you have to worry about is overlapping ROIs; then everything gets a little trickier.
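To make the merging question above concrete, here is a rough sketch of grouping the boxes of one batch per stream-id and then processing each original/duplicate pair together. It is plain Python inside a custom pph added to the tracker's source pad, as suggested above; the helper name, the 6-original/6-duplicate id layout, and the merge step are assumptions, not DeepStream API:

def _tracker_src_pad_buffer_probe(buffer, user_data):
    # Walk the whole batch once, collecting boxes keyed by stream-id,
    # then merge each original stream with its duplicate.
    try:
        batch_meta = pyds.gst_buffer_get_nvds_batch_meta(buffer)
    except Exception:
        return DSL_PAD_PROBE_OK

    boxes_by_source = {}
    l_frame = batch_meta.frame_meta_list
    while l_frame is not None:
        frame_meta = pyds.glist_get_nvds_frame_meta(l_frame.data)
        boxes = []
        l_obj = frame_meta.obj_meta_list
        while l_obj is not None:
            obj_meta = pyds.glist_get_nvds_object_meta(l_obj.data)
            coords = obj_meta.detector_bbox_info.org_bbox_coords
            boxes.append((int(coords.left), int(coords.top),
                          int(coords.width), int(coords.height)))
            try:
                l_obj = l_obj.next
            except StopIteration:
                break
        boxes_by_source[frame_meta.source_id] = boxes
        try:
            l_frame = l_frame.next
        except StopIteration:
            break

    # Assumed layout: originals are ids 0-5, duplicates are ids 6-11, and all
    # sources run at the same frame-rate so id N and id N+6 carry the same frame.
    for original_id in range(6):
        merged = (boxes_by_source.get(original_id, []) +
                  boxes_by_source.get(original_id + 6, []))
        # centralized processing / downstream distribution of 'merged' goes here

    return DSL_PAD_PROBE_OK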