- Overview
- Setup
- Dataset
- Released dataset format
- Python visualization scripts
- Generate labels for your own capture!
Labeling articulated objects in unconstrained settings has a wide variety of applications, including entertainment, neuroscience, psychology, ethology, and many fields of medicine. Large offline labeled datasets exist only for the most common articulated object categories (e.g., humans). With just a few annotations (covering 1-2% of the frames), we are able to produce 2D results comparable to state-of-the-art fully supervised methods and to obtain 3D reconstructions that are impossible with other existing approaches. Our Multi-view Bootstrapping in the Wild (MBW) approach demonstrates results on standard human datasets, as well as on tigers, cheetahs, fish, colobus monkeys, chimpanzees, and flamingos from videos captured casually in a zoo.
We release a challenging dataset consisting of image frames from tail-end distribution categories (such as fish, colobus monkeys, and chimpanzees) with their corresponding 2D, 3D, and bounding-box labels, generated with minimal human intervention. Prominent use cases of this dataset include not only sparse 2D and 3D landmark prediction, but also dense reconstruction tasks such as dense deformable shape reconstruction and novel view rendering (NeRF); the dataset can also be used for camera estimation in Simultaneous Localization and Mapping (SLAM) frameworks.
The data was collected with two smartphone cameras without any constraints: no guidance or instructions were given as to how the data should be collected. The intention was to mimic data captured casually by anyone holding a smartphone-grade camera. For this reason, the cameras were continuously moving in space, changing their extrinsics with respect to each other while capturing an in-the-wild dynamic scene. This makes the dataset suitable for benchmarking robust algorithms across various computer vision tasks.
- Clone the repository.
git clone git@github.com:mosamdabhi/MBW-Data.git
cd MBW-Data/Data
- Download the data zip file from here:
- Place the downloaded zip file in the `MBW-Data/Data` directory and uncompress the file.
- Create a conda environment to run the visualization scripts.
conda env create -f environment.yml && conda activate mbw
We capture 2-View videos from handheld smartphone cameras. In our case, we used an iPhone 11 Pro Max and an iPhone 12 Pro Max to capture the video sequences. We use Final Cut Pro to manually synchronize the 2-View video sequences using the audio signal and time stamps. Please note that all we require are 2-view synchronized image frames and manual annotations for 1-2% of the data. No camera calibration (intrinsics or extrinsics) is required to run MBW.
In total, there are 16154 instances in this dataset from 7 different object categories, coming from 2 camera views.
Example sequences from the capture of Fish, Chimpanzee, Colobus Monkey, and Tiger categories are shown below:
- We manually annotate 1-2% of the image frames per view. Our annotations consist of 2D landmark keypoints, with the landmark locations chosen to capture each object's articulation. The joint connections are visualized below.
(Joint connection diagrams: Fish, Chimpanzee, Colobus Monkey, Tiger, Clown fish, Seahorse, Turtle.)
- We visualize the predicted 2D keypoints from MBW for Fish, Chimpanzee, and Colobus Monkey below (a visualization script is provided in this repository):
The dataset is divided into two directories: `annot` and `images`. As the names suggest, the `annot` directory contains annotations and the `images` directory contains the 2-view synchronized image frames.
- The annotations are provided as a `.pkl` file. The pickle file contains the following keys (a minimal loading sketch is shown after this list):

| Key | Description |
| --- | --- |
| `W_GT` | Manual annotation. Non-NaN values for 1-2% of the data; NaN values for the rest. |
| `W_Predictions` | 2D landmark predictions (labels) generated from MBW. |
| `S_Pred` | 3D landmark predictions (labels) generated from MBW. The 3D reconstructions are up-to-scale. |
| `BBox` | Bounding box crops generated from MBW. |
| `confidence` | Flag specifying confidence for the MBW predictions. `True` indicates high confidence; `False` indicates low confidence. This flag is generated from the uncertainty equation (Eq. 2) in the paper. |
- The images are provided in the `images` directory as `.jpg` files.
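As a quick sanity check, here is a minimal sketch for loading one annotation file and inspecting its keys. The file path and the exact container layout (dict of arrays vs. list of per-frame dicts) are assumptions; adjust them to your local copy of the data.

```python
import pickle

import numpy as np

# Hypothetical path; adjust to where you unzipped the data
# (e.g. Data/<Category>/annot/...).
pkl_path = "Data/Fish/annot/view_0.pkl"

with open(pkl_path, "rb") as f:
    annot = pickle.load(f)

# The container may be a dict of arrays or a list of per-frame dicts;
# print what is actually there before relying on a specific layout.
if isinstance(annot, dict):
    for key, value in annot.items():
        print(key, np.asarray(value).shape)
    # Frames with manual labels have non-NaN entries in W_GT (1-2% of the data).
    if "W_GT" in annot:
        W_GT = np.asarray(annot["W_GT"], dtype=float)
        labeled = ~np.isnan(W_GT).reshape(len(W_GT), -1).all(axis=1)
        print(f"Manually labeled frames: {labeled.sum()} / {len(labeled)}")
else:
    print(type(annot), len(annot))
```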
The labels provided in the `.pkl` files can be visualized with the provided shell script. To run the visualization:
cd python
./label_visualizations.sh
- The above shell script accepts the following arguments, which you can change according to your needs:
| Argument | Input options | Description |
| --- | --- | --- |
| `dataset` | `Fish`, `Chimpanzee`, `Colobus_Monkey` | Choose the object whose labels you want to visualize. |
| `generate_labels_type` | `2D`, `3D`, `BBox` | Choose the type of label visualization. `2D` and `BBox` overlay the 2D landmark predictions and bounding box crops onto the images, while `3D` generates a 3D skeleton structure rendered from moving camera viewing angles. |
| `vis_labels_path` | `Visualizations` | Path to store label visualizations. |
| `label_color` | `blue`, `red` | Landmark (keypoint) colors for visualization. |
| `img_format` | `.jpg`, `.png` | Image storage format. |
| `visualize_only_confident` | Boolean flag | Only visualize "confident" labels, i.e. those verified as correct by MV-NRSfM, Eq. (2). |
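If you prefer to inspect a single frame directly from Python rather than through the shell script, a minimal overlay sketch is shown below. The image and annotation paths, the frame index, and the assumed `(num_frames, num_joints, 2)` layout of `W_Predictions` are hypothetical; adapt them to your local copy.

```python
import pickle

import matplotlib.pyplot as plt
import numpy as np
from PIL import Image

# Hypothetical paths and frame index; adjust to your local copy of the data.
frame_idx = 0
img = Image.open("Data/Fish/images/00000.jpg")
with open("Data/Fish/annot/view_0.pkl", "rb") as f:
    annot = pickle.load(f)

# Assumes W_Predictions is stored as (num_frames, num_joints, 2).
W = np.asarray(annot["W_Predictions"], dtype=float)

plt.imshow(img)
plt.scatter(W[frame_idx, :, 0], W[frame_idx, :, 1], s=12, c="red")
plt.axis("off")
plt.show()
```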
All MBW requires are N-view synchronized image frames and manual annotations for 1-2% of the data. To label your own capture, provide two or more synchronized views of image frames together with manual annotations for 1-2% of the frames, in the format shown above; MBW will then generate 2D, 3D, and bounding box labels for your data. A sketch of one way to pack such annotations is shown below.
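As an illustration of that format, here is a minimal sketch of packing sparse manual annotations into a `W_GT` array with NaNs for unlabeled frames and saving it as a pickle. The sizes, output path, and dict-of-arrays layout are assumptions for illustration, not the authoritative input specification for MBW.

```python
import pickle

import numpy as np

# Hypothetical sizes for your own capture.
num_frames, num_joints = 1000, 17

# NaN everywhere, then fill in the 1-2% of frames you labeled manually.
W_GT = np.full((num_frames, num_joints, 2), np.nan)
W_GT[0] = np.random.rand(num_joints, 2) * 500    # placeholder for real click coordinates
W_GT[500] = np.random.rand(num_joints, 2) * 500  # placeholder for real click coordinates

# Hypothetical output path, mirroring the released annot/ layout.
with open("my_capture/annot/view_0.pkl", "wb") as f:
    pickle.dump({"W_GT": W_GT}, f)
```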
The code to run MBW on your own data can be found here: https://github.com/mosamdabhi/MBW
If you use our code, dataset, or models in your research, please cite with:
@inproceedings{dabhi2022mbw,
title={MBW: Multi-view Bootstrapping in the Wild},
author={Dabhi, Mosam and Wang, Chaoyang and Clifford, Tim and Jeni, Laszlo and Fasel, Ian and Lucey, Simon},
booktitle={Thirty-sixth Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
year={2022},
ee = {https://openreview.net/forum?id=i1bFPSw42W0},
organization={NeurIPS}
}
@inproceedings{dabhi2021mvnrsfm,
title={High Fidelity 3D Reconstructions with Limited Physical Views},
author={Dabhi, Mosam and Wang, Chaoyang and Saluja, Kunal and Jeni, Laszlo and Fasel, Ian and Lucey, Simon},
booktitle={2021 International Conference on 3D Vision (3DV)},
year={2021},
ee = {https://ieeexplore.ieee.org/abstract/document/9665845},
organization={IEEE}
}