MoSca: A Modern 4D Reconstruction System for Monocular Videos
- 2024.Nov.28th: Code pre-release. This is a preview of MoSca. The code still needs to be cleaned up, and more functions will be added in the future (see TODOs), but the first author does not have enough time for this at the moment, since he has to search for his next position after PhD graduation in 2025.
- 2024.Nov.28th: Important note: the code now supports three point trackers (TAP): BootsTAPIR, CoTracker, and SpaTracker; and three depth estimators: DepthCrafter, Metric3D-v2, and UniDepth. I authorize the MIT license only for the code I wrote (mostly in `lib_moca` and `lib_mosca`). The code from the third-party foundation models and Gaussian Splatting (mostly in `lib_prior` and `lib_render`) is not covered by the MIT license; please refer to the original authors for the licenses of these third-party codes.
- Simply run the following command. The script assumes an Ubuntu environment with Anaconda installed; the CUDA version used is 11.8. You may have to tweak the script to fit your own environment.

```bash
bash install.sh
```
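After the install finishes, a quick sanity check can confirm the environment is usable. The snippet below is a hypothetical helper (not part of the repo) that only assumes PyTorch was installed with CUDA support:

```python
# sanity_check.py -- hypothetical helper: verify the environment
# that install.sh is expected to set up (PyTorch + CUDA 11.8).
import torch

print("torch:", torch.__version__)
print("built for CUDA:", torch.version.cuda)      # expect 11.8 per install.sh
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))
```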
- Download from here the checkpoints for the 2D foundation models that are not downloadable from Hugging Face. WARNING: by downloading these checkpoints, you must agree to and obey the original licenses from the original authors (RAFT, SpaTracker, and TAPNet). Unzip the weights into the following file structure:

```
ProjRoot/weights
├── raft_models
│   ├── raft-things.pth
│   └── ...
├── spaT_final.pth
└── tapnet
    └── bootstapir_checkpoint_v2.pt
```
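To confirm the unzip produced the expected layout, a minimal check like the following may help (a hypothetical helper, not shipped with the repo; the paths mirror the tree above):

```python
# check_weights.py -- hypothetical helper to confirm the unzipped layout.
from pathlib import Path

weights = Path("weights")
expected = [
    weights / "raft_models" / "raft-things.pth",
    weights / "spaT_final.pth",
    weights / "tapnet" / "bootstapir_checkpoint_v2.pt",
]
for f in expected:
    status = "ok" if f.is_file() else "MISSING"
    print(f"{status:7s} {f}")
```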
A few demo scenes are provided in the `data` directory. Each sequence stores a list of images under `demo/SeqName/images`, and the main program is designed to process such image lists into a 4D scene.
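To try the system on your own footage, you can mirror the demo layout. The sketch below is a hypothetical helper that only assumes the pipeline reads an ordered set of frames from `<workspace>/images`; the exact naming requirements may differ:

```python
# prepare_ws.py -- hypothetical sketch for preparing a custom workspace.
# ASSUMPTION: the pipeline only needs ordered frames under <workspace>/images;
# the zero-padded naming scheme here is illustrative, not a repo requirement.
import shutil
from pathlib import Path

src = Path("my_video_frames")          # your extracted frames
ws = Path("demo/my_scene/images")
ws.mkdir(parents=True, exist_ok=True)
for i, frame in enumerate(sorted(src.glob("*.jpg"))):
    shutil.copy(frame, ws / f"{i:05d}.jpg")
```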
```bash
# Infer off-the-shelf 2D models
python mosca_precompute.py --cfg ./profile/demo/demo_prep.yaml --ws ./demo/duck
# Fit the 4D scene
python mosca_reconstruct.py --cfg ./profile/demo/demo_fit.yaml --ws ./demo/duck
```
You should expect some output like this:
(video: `duck_480.mp4`)
More examples are in `example.sh`, and `demo.ipynb` also provides some examples of the system.
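If you want to batch several scenes, one option is a small driver like the sketch below, which simply chains the two commands above per workspace (a hypothetical script; `example.sh` is the authoritative reference):

```python
# run_scenes.py -- hypothetical batch driver for several demo scenes.
import subprocess

SCENES = ["./demo/duck"]  # add more workspace paths here
for ws in SCENES:
    subprocess.run(["python", "mosca_precompute.py",
                    "--cfg", "./profile/demo/demo_prep.yaml", "--ws", ws], check=True)
    subprocess.run(["python", "mosca_reconstruct.py",
                    "--cfg", "./profile/demo/demo_fit.yaml", "--ws", ws], check=True)
```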
We also ship a submodule, `MoCa` ("Moving Monocular Camera"): a standalone module that runs before MoSca and performs tracklet-based bundle adjustment (BA) to solve camera poses and depth alignment. To run this submodule, for example, simply:
```bash
# Infer off-the-shelf 2D models in a reduced mode
python mosca_precompute.py --cfg ./profile/demo/demo_prep.yaml --ws ./demo/duck --skip_dynamic_resample
# Quickly solve a small BA
python lite_moca_reconstruct.py --cfg ./profile/demo/demo_fit.yaml --ws ./demo/duck
```
You should expect some output like this:
(video: `static_scaffold_init.mp4`)
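For intuition, the kind of residual such a tracklet-based BA minimizes looks roughly like the sketch below. This is an illustration only, NOT `lib_moca`'s actual implementation: it back-projects a tracked 2D point using a per-frame depth scale (the depth-alignment variable), transfers the 3D point into another frame, and measures the reprojection error against the observed track.

```python
# ba_residual.py -- illustrative sketch (not lib_moca's code) of a
# tracklet reprojection residual with per-frame depth alignment.
import numpy as np

def reproject_residual(uv_i, depth_i, scale_i, K, T_wc_i, T_wc_j, uv_j):
    """uv_*: 2D tracklet positions; depth_i: raw monocular depth at uv_i;
    scale_i: per-frame depth alignment scale (optimized); K: 3x3 intrinsics;
    T_wc_*: 4x4 camera-to-world poses (optimized)."""
    # Back-project uv_i with the aligned depth into camera-i coordinates.
    xy1 = np.linalg.inv(K) @ np.array([uv_i[0], uv_i[1], 1.0])
    p_cam_i = scale_i * depth_i * xy1
    # Move to world coordinates, then into camera j.
    p_world = T_wc_i[:3, :3] @ p_cam_i + T_wc_i[:3, 3]
    T_cw_j = np.linalg.inv(T_wc_j)
    p_cam_j = T_cw_j[:3, :3] @ p_world + T_cw_j[:3, 3]
    # Project and compare against the observed track in frame j.
    proj = K @ p_cam_j
    return proj[:2] / proj[2] - np.asarray(uv_j)
```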
- We now provide instructions to reproduce our results for Tab. 1 (DyCheck), Tab. 2 (NVIDIA), and Tab. 3 (TUM and Sintel) of the new paper.
- (Option-A) Reproduce by running locally:
- Download the data from here. By downloading the data, you must agree to and obey the original licenses from the original authors (DyCheck, NVIDIA, TUM, and Sintel). Unzip into the following file structure:

```
ProjRoot/data/iphone
├── apple
├── ...
└── wheel
```
- Check the script `reproduce.sh`. For example, if you have one GPU, just run:

```bash
bash reproduce.sh
```

If you have multiple GPUs, you can run

```bash
bash reproduce.sh #GPU_ID #NUM_OF_TOTAL_DEVICES
```

in several terminals (or use the launcher sketch below).
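If you prefer not to open several terminals manually, a launcher like the following sketch (hypothetical, not part of the repo) spawns one `reproduce.sh` shard per GPU, assuming the two positional arguments are the GPU id and the total device count as above:

```python
# launch_reproduce.py -- hypothetical launcher that shards reproduce.sh
# across GPUs; equivalent to opening several terminals by hand.
import subprocess

NUM_DEVICES = 4  # total number of GPUs on this machine
procs = [
    subprocess.Popen(["bash", "reproduce.sh", str(gpu_id), str(NUM_DEVICES)])
    for gpu_id in range(NUM_DEVICES)
]
for p in procs:
    p.wait()
```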
- (Option-B) Reproduce by downloading the checkpoints produced by our runs from here. Unzip the downloaded subfolders under `data`, following the same structure as above.
- Finally, you can collect all the results with `collect_metrics.ipynb`, which forms reports stored in `data/metrics_collected` (see the sketch after this list).
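For reference, the aggregation that the notebook performs amounts to something like the sketch below. This is hypothetical: the actual metric file names and formats under `data` may differ from what is assumed here.

```python
# collect_sketch.py -- hypothetical aggregation mirroring collect_metrics.ipynb.
# ASSUMPTION: per-scene metrics are JSON dicts whose file names contain "metrics".
import json
from pathlib import Path

rows = []
for f in Path("data").rglob("*metrics*.json"):
    rows.append({"file": str(f), **json.loads(f.read_text())})

out = Path("data/metrics_collected")
out.mkdir(parents=True, exist_ok=True)
(out / "report.json").write_text(json.dumps(rows, indent=2))
```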
TODOs:
- Sometimes the system needs some parameter tuning; add more detailed instructions for the parameters.
- Support manual labeling of the FG-BG masks.
- Support other focal initialization methods.
- Find a good visualizer for MoCa and MoSca.
- Replace the old render backends with the new GSplat.
- Only RAFT is currently used for optical flow; evaluate other checkpoints and methods.
- Docker environment.
I authorize the MIT license only for the code I wrote (mostly in `lib_moca` and `lib_mosca`). The code from the third-party foundation models and Gaussian Splatting (mostly in `lib_prior` and `lib_render`) is not covered by the MIT license; please refer to the original authors for the licenses of these third-party codes.
If you use either MoCa or MoSca, please cite our technical paper:
```bibtex
@article{lei2024mosca,
  title={MoSca: Dynamic Gaussian Fusion from Casual Videos via 4D Motion Scaffolds},
  author={Lei, Jiahui and Weng, Yijia and Harley, Adam and Guibas, Leonidas and Daniilidis, Kostas},
  journal={arXiv preprint arXiv:2405.17421},
  year={2024}
}
```