andrewjong/Deep-Learning-Paper-Surveys

[Moniker] Learning from Unlabelled Videos Using Contrastive Predictive Neural 3D Mapping (May 2020 ICLR)


andrewjong commented 4 years ago

0. Article Information and Links

Paper's project website:
Release date: YYYY/MM/DD
Number of citations (as of 2020/MM/DD):
Talk that discusses this paper from CVPR 2020 3D Scene Understanding Workshop: https://www.youtube.com/watch?v=1d-KsKjWUbo&t=38m26s

1. What do the authors try to accomplish?

2. What's great compared to previous research?

Focuses on 3D learning from VIDEO
Agent can move freely through the scene: any translation, any angle.
Novel "View-Contrastive" loss objectives that outperform RGB regression
Can inherently predict moving 3D objects.
Transfer learning from learned 3D features in simulation
First work that can discover objects in 3D from a single camera viewpoint

3. Where are the key elements of the technology and method?

Use a (black-box) egomotion estimator to stabilize scenes that contain both camera motion and moving objects.
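To make the stabilization idea concrete, here is a minimal sketch in plain Python. It assumes the egomotion estimator hands us a camera pose per frame (here a hypothetical yaw rotation plus translation built by `make_pose`); the paper treats the estimator as a black box, so every function name and the pose parameterization below are illustrative assumptions, not the authors' code. After mapping observations into a shared frame, a static point lands in the same place at both timesteps, while an independently moving point leaves a residual.

```python
import math

def make_pose(yaw, tx, ty, tz):
    """Hypothetical camera-to-world pose: rotate about the y axis by `yaw`, then translate."""
    c, s = math.cos(yaw), math.sin(yaw)
    rotation = [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]
    return rotation, [tx, ty, tz]

def camera_to_world(point, pose):
    """Stabilize: map a point from the camera frame into the shared world frame."""
    rotation, translation = pose
    return [sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
            for i in range(3)]

def world_to_camera(point, pose):
    """Inverse mapping, used here only to simulate what the camera observes."""
    rotation, translation = pose
    shifted = [point[i] - translation[i] for i in range(3)]
    # The inverse of a rotation matrix is its transpose.
    return [sum(rotation[j][i] * shifted[j] for j in range(3)) for i in range(3)]

def residual_motion(obs_t0, pose_t0, obs_t1, pose_t1):
    """Distance between two observations after both are stabilized."""
    return math.dist(camera_to_world(obs_t0, pose_t0),
                     camera_to_world(obs_t1, pose_t1))

# The camera moves and turns between the two frames.
pose0 = make_pose(0.0, 0.0, 0.0, 0.0)
pose1 = make_pose(0.3, 1.0, 0.0, 0.5)

# A static point: its camera-frame coordinates change, its stabilized ones do not.
static_world = [2.0, 0.0, 5.0]
static_residual = residual_motion(world_to_camera(static_world, pose0), pose0,
                                  world_to_camera(static_world, pose1), pose1)

# A point that moves 1 unit along x between the frames.
moving_residual = residual_motion(world_to_camera([0.0, 0.0, 4.0], pose0), pose0,
                                  world_to_camera([1.0, 0.0, 4.0], pose1), pose1)
```

With exact poses the static residual is zero up to floating-point error and the moving residual equals the object's true displacement; with an estimated pose both would carry the estimator's error.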

Train the GRNN (geometry-aware recurrent neural network) with view prediction.
This allows unsupervised detection of moving objects: once the scene is stabilized, anything that still moves must be an independently moving object.
This also allows predicting 3D optical flow within a scene. Randomly transform the 3D features and ask the model to recover the transformation. Once the forward flow is calculated, check that applying the reverse flow returns to the starting point, similar to the cycle-consistency idea in CycleGAN.
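The two flow ideas above (a synthetic transformation as the training target, and a forward-backward check) can be sketched as follows. This is a toy illustration under loud assumptions: flow fields are plain Python functions from a 3D point to a displacement, and the random transformation is a translation only. The function names are hypothetical and the paper's model operates on 3D feature tensors rather than single points.

```python
import math
import random

def synthetic_flow_target(rng, max_shift=2.0):
    """Sample a random 3D translation; it doubles as the ground-truth flow."""
    return tuple(rng.uniform(-max_shift, max_shift) for _ in range(3))

def apply_flow(point, flow):
    """Move a point by the displacement the flow field assigns to it."""
    return tuple(p + d for p, d in zip(point, flow(point)))

def cycle_error(point, forward_flow, backward_flow):
    """Go forward, then backward, and measure how far we end from the start."""
    moved = apply_flow(point, forward_flow)
    returned = apply_flow(moved, backward_flow)
    return math.dist(point, returned)

rng = random.Random(0)
shift = synthetic_flow_target(rng)

def forward(point):
    # Stand-in for a model that perfectly recovers the synthetic shift.
    return shift

def consistent_backward(point):
    # A reverse flow that exactly undoes the forward flow.
    return tuple(-s for s in shift)

def inconsistent_backward(point):
    # A reverse flow that predicts no motion at all.
    return (0.0, 0.0, 0.0)

start = (1.0, 2.0, 3.0)
good = cycle_error(start, forward, consistent_backward)
bad = cycle_error(start, forward, inconsistent_backward)
```

A consistent pair of flows gives a cycle error near zero, while the inconsistent pair is off by the full length of the shift, so the error can serve as a confidence signal or a training penalty.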

Replace the RGB regression loss with contrastive losses in 3D and 2D feature space.
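As a rough illustration of what a contrastive objective in feature space looks like, here is an InfoNCE-style loss in plain Python. This is a stand-in chosen for familiarity: the paper's exact loss formulation may differ, and `info_nce`, the temperature value, and the toy features are all assumptions. The idea is that a feature predicted for a location should score higher against the target feature at the same location (the positive) than against target features at other locations (the negatives).

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def info_nce(queries, keys, temperature=0.1):
    """Mean cross-entropy of matching queries[i] to keys[i] against all keys."""
    total = 0.0
    for i, query in enumerate(queries):
        logits = [dot(query, key) / temperature for key in keys]
        # Numerically stable log-sum-exp over positive and negative logits.
        peak = max(logits)
        log_norm = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_norm - logits[i]
    return total / len(queries)

# Toy unit-length features at three corresponding locations.
predicted = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
aligned_targets = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
shuffled_targets = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]

aligned_loss = info_nce(predicted, aligned_targets)
shuffled_loss = info_nce(predicted, shuffled_targets)
```

The loss is small when predictions match their own targets and large when correspondences are scrambled. Unlike RGB regression, it never asks the model to reproduce pixel values, only to tell the matching feature apart from the others.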

4. How do the authors measure success?

5. How did you verify that it works?

6. Things to discuss? (e.g. weaknesses, potential for future work, relation to other work)

7. Are there any papers to read next?

8. References