Non-360 View
sparro12 opened this issue · 10 comments
Our camera suite does not cover a ~30-degree region at the rear of the vehicle. Classical image-stitching methods would break down without overlapping regions between cameras. Since this method uses a learned network, I could see it being robust to this.
So, to clarify the question: how would you expect this method to respond to a non-360-degree view? Would the uncovered region simply show up as the added "occluded" class, since it is not within the view of any of the cameras?
By introducing the "occluded" class, we explicitly ask the network not to hallucinate, i.e., not to make predictions about areas it cannot possibly reason about.
Let's assume you pre-trained DeepLab Xception on our dataset with 4 cameras (360°). If you then created a classical IPM image using only 3 cameras (<360°), whatever the network predicted in the uncovered region could be considered "undefined behavior", since it was trained on full 360° IPM images. You could, however, apply the same occlusions to our inputs and labels during pre-processing, so that the network would also reliably predict the region not covered by your cameras as "occluded".
If you want to use the uNetXST approach, things get more complicated, since the transformation happens inside the network. In that case, you probably cannot train on, e.g., 4 cameras and then apply the model to 3 different cameras without further modifications.
So if the simulation is set up without a 360-degree view as well, since that is how the cameras are mounted on the car, we could train uNetXST with a non-360-degree view. This would work, correct, because our training data would also lack a 360-degree view? The area not covered by the cameras wouldn't be part of the occluded class, because there is simply no data there to classify, even during training.
Well, you still need to assign some class to the uncovered area in the bird's-eye-view label image. Whether that is the occluded class or a new class such as "unlabeled" is up to you. But yes, training such a setup should be possible in principle.
Note that this is, in principle, similar to the 2_F dataset, where there are also areas that are not covered by the camera.
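To illustrate the two labeling options for the uncovered area, here is a minimal NumPy sketch operating on a class-index BEV label. It assumes you already have a boolean coverage mask for your camera setup; the occluded class id `OCCLUDED_ID = 9` and the function name `mark_uncovered` are hypothetical placeholders, not values or functions from the Cam2BEV repository.

```python
import numpy as np

# Hypothetical id of the existing "occluded" class; the real id depends on
# the label conversion config you use.
OCCLUDED_ID = 9


def mark_uncovered(label_ids: np.ndarray, covered: np.ndarray,
                   num_classes: int, use_new_class: bool = False):
    """Assign every BEV pixel outside camera coverage to a single class.

    label_ids: (H, W) integer class-index image (e.g. from drone GT).
    covered:   (H, W) boolean mask, True where at least one camera sees.
    Returns (new_label_ids, new_num_classes); the input is not modified.
    """
    if label_ids.shape != covered.shape:
        raise ValueError("label and coverage mask must have the same shape")
    out = label_ids.copy()
    if use_new_class:
        # Option 2: append a brand-new "unlabeled" class after the existing ones.
        fill_id, n = num_classes, num_classes + 1
    else:
        # Option 1: reuse the existing "occluded" class.
        fill_id, n = OCCLUDED_ID, num_classes
    out[~covered] = fill_id
    return out, n
```

Reusing the occluded class keeps the network's output layer unchanged; a separate class lets you later distinguish "never visible by design" from "hidden behind an object", at the cost of one more output channel.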
Understood. So in that case, the unseen area was simply categorized as occluded, and we could augment the BEV label images to pre-label the unseen area as occluded before showing them to the network during training.
Yes, exactly. Note that you should also apply this "augmentation" to the input image, i.e., the homography image.
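A minimal sketch of applying the same mask consistently to an input/label pair in pre-processing. It assumes the homography image, the color-coded label and the coverage mask share the same resolution (resize the mask with nearest-neighbor interpolation otherwise). The color `OCCLUDED_RGB = (150, 150, 150)` and the name `apply_coverage` are assumptions for illustration; check the occluded color against your own label palette.

```python
import numpy as np

# Assumed RGB color of the "occluded" class in the BEV label images;
# verify against your own palette before using this.
OCCLUDED_RGB = (150, 150, 150)


def apply_coverage(ipm_img: np.ndarray, label_rgb: np.ndarray,
                   covered: np.ndarray):
    """Apply the same 'no camera here' mask to a network input and its label.

    ipm_img:   (H, W, 3) homography/IPM input image.
    label_rgb: (H, W, 3) color-coded BEV label.
    covered:   (H, W) bool, True where your real cameras can see.
    Returns masked copies; the inputs are left untouched.
    """
    if not (ipm_img.shape[:2] == label_rgb.shape[:2] == covered.shape):
        raise ValueError("input, label and mask must share H and W")
    ipm_out = ipm_img.copy()
    label_out = label_rgb.copy()
    ipm_out[~covered] = 0                # input: no camera evidence -> black
    label_out[~covered] = OCCLUDED_RGB   # label: uncovered -> occluded class
    return ipm_out, label_out
```

Masking both sides with the identical mask is the point: the network then learns that black input pixels in exactly that region correspond to "occluded" output, instead of seeing real image content there during training that it will never get at inference time.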
As I said earlier, if you want to use the uNetXST approach, this gets more complicated and may not be feasible without custom training data.
Ah, I see. Our plan was to use our own simulation data, write a script that semantically segments the images using known ground truth from a drone camera, and then use your script to cast rays and determine the occluded regions from our intrinsics/extrinsics.
If I understand correctly, though, this wouldn't work, because we need to generate a homography image using the IPM script, and the homography won't work unless there are overlapping regions. Does this sound accurate? If so, we will look into ways around the problem. Otherwise, we will allocate some time to testing this soon.
Creating a homography image using ipm.py should still work, even if the cameras have no overlapping regions. You can test this yourself by simply running IPM on only the front/rear images. The regions without camera evidence will be black by default, I think.
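Why overlap is not needed can be seen from how inverse warping works: every BEV pixel is looked up independently in each camera, and pixels that no camera maps onto simply keep their initial value. The following pure-NumPy sketch is not the implementation of ipm.py; it assumes each camera comes with a 3x3 homography mapping homogeneous BEV pixel coordinates to camera pixel coordinates, uses nearest-neighbor sampling, and lets the first camera that covers a pixel win. The name `warp_to_bev` is hypothetical.

```python
import numpy as np


def warp_to_bev(images, bev_to_cam_homographies, bev_shape):
    """Compose several camera images into one BEV image by inverse warping.

    images: list of (h, w, 3) uint8 camera images.
    bev_to_cam_homographies: list of 3x3 arrays mapping homogeneous BEV
        pixel coords (x, y, 1) to camera pixel coords (assumed convention).
    bev_shape: (H, W) of the output grid.
    Pixels that no camera maps onto stay 0 (black); no overlap is required.
    Returns (bev_image, filled_mask).
    """
    H, W = bev_shape
    out = np.zeros((H, W, 3), dtype=np.uint8)
    filled = np.zeros((H, W), dtype=bool)
    ys, xs = np.mgrid[0:H, 0:W]
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(H * W)])  # (3, H*W)
    out_flat = out.reshape(-1, 3)      # views into out / filled
    filled_flat = filled.reshape(-1)
    for img, Hm in zip(images, bev_to_cam_homographies):
        cam = Hm @ pts
        z = cam[2]
        # Assumed sign convention: points with z <= 0 are behind the camera.
        valid = z > 1e-9
        z_safe = np.where(valid, z, 1.0)   # avoid division by zero
        u = np.round(cam[0] / z_safe)
        v = np.round(cam[1] / z_safe)
        h, w = img.shape[:2]
        valid &= (u >= 0) & (u < w) & (v >= 0) & (v < h)
        valid &= ~filled_flat               # first camera to cover a pixel wins
        idx = np.flatnonzero(valid)
        out_flat[idx] = img[v[idx].astype(int), u[idx].astype(int)]
        filled_flat[idx] = True
    return out, filled
```

With a front and a rear camera whose footprints do not touch, the strip between them stays black, and the returned `filled` mask doubles as a coverage mask for that camera setup.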
occlusion.py would need to be extended so that the non-visible areas are removed from your GT drone images.
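One possible starting point for such an extension is a purely geometric coverage mask derived from the camera extrinsics. The sketch below does not reproduce occlusion.py; it is a hypothetical helper (`fov_coverage_mask`) under stated assumptions: the vehicle sits at the BEV image center with forward pointing up, each camera is described by its mounting position, yaw and horizontal field of view, and vertical FOV, range limits and occlusion by objects or the ego vehicle are all ignored.

```python
import numpy as np


def fov_coverage_mask(bev_shape, m_per_px, cameras):
    """Boolean BEV mask: True where at least one camera's horizontal FOV reaches.

    bev_shape: (H, W); vehicle at the image center, forward = up,
        vehicle left = image left (assumed convention).
    m_per_px: meters per BEV pixel.
    cameras: iterable of (x, y, yaw_deg, hfov_deg) in the vehicle frame,
        x forward, y left, yaw counter-clockwise from +x (assumed).
    Only the horizontal field of view is checked: no range limit,
    no vertical FOV and no occlusion by objects or the ego vehicle.
    """
    H, W = bev_shape
    rows, cols = np.mgrid[0:H, 0:W]
    # Pixel centers in vehicle coordinates (meters).
    x = (H / 2 - (rows + 0.5)) * m_per_px
    y = (W / 2 - (cols + 0.5)) * m_per_px
    covered = np.zeros((H, W), dtype=bool)
    for cx, cy, yaw_deg, hfov_deg in cameras:
        bearing = np.arctan2(y - cy, x - cx)
        # Wrap the angular difference to the camera axis into [-pi, pi).
        diff = (bearing - np.deg2rad(yaw_deg) + np.pi) % (2 * np.pi) - np.pi
        covered |= np.abs(diff) <= np.deg2rad(hfov_deg) / 2
    return covered
```

For a setup like the one described at the top of this issue, the pixels of the ~30-degree rear wedge would come out `False`, and that mask could then be used to overwrite the uncovered area in both the GT label and the homography input.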
Does that make sense?
Yes, at a high level this makes sense. We will look into this more deeply. Thank you again!