- BDD_action_gt: uses IMU and GPS data from the BDD dataset to generate single-action ground truth.
- multiple_action_labels: uses Amazon Mechanical Turk (MTurk) to label multiple actions and reasons for 12k selected BDD videos.
- data_info: contains names of train, test and validation datasets.
- mask-rcnn: Mask R-CNN model, forked from the Facebook AI group and modified to add action prediction.
- I3D: Inflated 3D ConvNet (I3D) model, adapted to PyTorch 1.0 and our newly annotated BDD multi-action dataset.
- maskrcnn-video: uses our customized I3D backbone on 640x360 image sequences to extract global features and RoI features with selectors, and is trained end to end.
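
As a rough illustration of the idea behind BDD_action_gt, the sketch below assigns one coarse action label to a clip from its GPS speed and course readings. It is not the repository's code: the label names, both thresholds, and the function names are hypothetical, and IMU signals are left out for brevity.

```python
from typing import Sequence

# Hypothetical thresholds; neither value comes from the repository.
STOP_SPEED_MPS = 0.5    # final speed below this is treated as "stop"
TURN_ANGLE_DEG = 20.0   # net heading change that counts as a turn


def heading_change(course_deg: Sequence[float]) -> float:
    """Net signed heading change in degrees, handling the 0/360 wrap."""
    total = 0.0
    for prev, cur in zip(course_deg, course_deg[1:]):
        # Map each step into [-180, 180) so 359 -> 1 counts as +2, not -358.
        total += (cur - prev + 180.0) % 360.0 - 180.0
    return total


def label_action(speed_mps: Sequence[float], course_deg: Sequence[float]) -> str:
    """Assign one coarse action label (assumed label set) to a clip."""
    if not speed_mps or not course_deg:
        raise ValueError("speed and course sequences must be non-empty")
    if speed_mps[-1] < STOP_SPEED_MPS:
        return "stop"
    change = heading_change(course_deg)
    if change >= TURN_ANGLE_DEG:
        return "turn_right"  # compass course increases clockwise
    if change <= -TURN_ANGLE_DEG:
        return "turn_left"
    return "forward"
```

For example, `label_action([5.0, 5.0, 5.0], [350.0, 5.0, 20.0])` accumulates a +30 degree heading change across the 0/360 seam and therefore returns `"turn_right"` under these assumed thresholds.
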
- BDD100K: A Diverse Driving Video Database with Scalable Annotation Tooling
- The ApolloScape Dataset for Autonomous Driving
- The Cityscapes Dataset for Semantic Urban Scene Understanding
- An Overview of Multi-Task Learning in Deep Neural Networks
- Multi-Task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics
- MultiNet: Real-time Joint Semantic Reasoning for Autonomous Driving
- UberNet: Training a Universal Convolutional Neural Network for Low-, Mid-, and High-Level Vision using Diverse Datasets and Limited Memory
- Cross-stitch Networks for Multi-task Learning
- Trajectory prediction summary
- DESIRE: Distant Future Prediction in Dynamic Scenes with Interacting Agents
- Fast and Furious: Real Time End-to-End 3D Detection, Tracking and Motion Forecasting with a Single Convolutional Net
- Predicting Deeper into the Future of Semantic Segmentation
- Predicting Future Instance Segmentation by Forecasting Convolutional Features
Summary: Video prediction papers with code
- Deep Multi-scale video prediction beyond mean square error
- Prediction Under Uncertainty with Error-Encoding Networks
- Deep Predictive Coding Networks for Video Prediction and Unsupervised Learning
- Peeking into the Future: Predicting Future Person Activities and Locations in Video
Summary: Semantic segmentation papers with code
- Fast-SCNN: Fast Semantic Segmentation Network
- DeepLabv3: Rethinking Atrous Convolution for Semantic Image Segmentation