BANMo: Building Animatable 3D Neural Models from Many Casual Videos

BANMo

Changelog

  • 04/11: Replace matching loss with feature rendering loss; Fix bugs in LBS; Stabilize optimization.
  • 03/20: Add mesh color option (canonical mapping vs radiance) during surface extraction. See the --ce_color flag.
  • 02/23: Improve NVS with fourier light code, improve uncertainty MLP, add long schedule, minor speed up.
  • 02/17: Add adaptation to a new video, optimization with known root poses, and pose code visualization.
  • 02/15: Add motion-retargeting, quantitative evaluation and synthetic data generation/eval.

Install

Build with conda

We provide two versions.

[A. torch1.10+cu113 (1.4x faster on V100)]
# clone repo
git clone git@github.com:facebookresearch/banmo.git --recursive
cd banmo
# install conda env
conda env create -f misc/banmo-cu113.yml
conda activate banmo-cu113
# install pytorch3d (takes minutes), kmeans-pytorch
pip install -e third_party/pytorch3d
pip install -e third_party/kmeans_pytorch
# install detectron2
python -m pip install detectron2 -f \
  https://dl.fbaipublicfiles.com/detectron2/wheels/cu113/torch1.10/index.html
[B. torch1.7+cu110]
# clone repo
git clone git@github.com:facebookresearch/banmo.git --recursive
cd banmo
# install conda env
conda env create -f misc/banmo.yml
conda activate banmo
# install kmeans-pytorch
pip install -e third_party/kmeans_pytorch
# install detectron2
python -m pip install detectron2 -f \
  https://dl.fbaipublicfiles.com/detectron2/wheels/cu110/torch1.7/index.html

Data

We provide two ways to obtain data. The easiest way is to download and unzip the pre-processed data as follows.

[Download pre-processed data]

We provide pre-processed data for cat and human. Download the pre-processed rgb/mask/flow/densepose images as follows:

# (~8G for each)
bash misc/processed/download.sh cat-pikachiu
bash misc/processed/download.sh human-cap
[Download raw videos]

Download raw videos to the ./raw/ folder:

bash misc/vid/download.sh cat-pikachiu
bash misc/vid/download.sh human-cap
bash misc/vid/download.sh dog-tetres
bash misc/vid/download.sh cat-coco

To use your own videos, or pre-process raw videos into banmo format, please follow the instructions here.

PoseNet weights

Download pre-trained PoseNet weights for human and quadrupeds

mkdir -p mesh_material/posenet && cd "$_"
wget $(cat ../../misc/posenet.txt); cd ../../

Demo

This example shows how to reconstruct a cat from 11 videos and a human from 10 videos. For more examples, see here.

Hardware/time for running the demo

The short schedule takes 4 hours on 2 V100 GPUs (+SSD storage). To reach higher quality, the full schedule takes 12 hours. We provide a script that uses gradient accumulation to support experiments on fewer GPUs or GPUs with less memory.
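
As a rough illustration of why gradient accumulation can stand in for a larger batch, the sketch below uses a toy 1-D least-squares model (not BANMo's training code). The function names and data are hypothetical. It checks that averaging the gradients of several equal-sized micro-batches reproduces the gradient of the full batch.

```python
# Toy 1-D least-squares model: loss_i = (w * x_i - y_i)^2, averaged over a batch.
def grad_mean_loss(w, xs, ys):
    """Gradient of the mean squared error with respect to w."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def accumulated_grad(w, xs, ys, accu_steps):
    """Average the gradients of `accu_steps` equal-sized micro-batches."""
    assert len(xs) % accu_steps == 0, "micro-batches must have equal size"
    micro = len(xs) // accu_steps
    total = 0.0
    for step in range(accu_steps):
        lo, hi = step * micro, (step + 1) * micro
        # Scale each micro-batch gradient by 1/accu_steps before accumulating.
        total += grad_mean_loss(w, xs[lo:hi], ys[lo:hi]) / accu_steps
    return total


# Hypothetical data, only used to compare the two gradients.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.0, 4.1, 5.9, 8.2, 9.8, 12.1, 14.0, 15.9]
full = grad_mean_loss(0.5, xs, ys)
accu = accumulated_grad(0.5, xs, ys, accu_steps=4)
print(abs(full - accu) < 1e-9)
```

The trade-off is that each optimizer step needs several forward/backward passes, so less memory is used per pass at the cost of more passes per step.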

Setting good hyper-parameters for videos of various lengths

When optimizing videos with different lengths, we found it useful to scale the batch size with the number of frames. A rule of thumb is to set "num gpus" x "batch size" x "accu steps" ~= num frames. This means more video frames need more GPU memory but the same optimization time.
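
The rule of thumb can be turned into a small helper. This is a minimal sketch: the function name `suggest_accu_steps` is hypothetical, and rounding up is only one way to read "~=". It returns the smallest number of accumulation steps for which the product covers the frame count.

```python
import math


def suggest_accu_steps(num_frames, num_gpus, batch_size):
    """Smallest accu_steps with num_gpus * batch_size * accu_steps >= num_frames."""
    if min(num_frames, num_gpus, batch_size) <= 0:
        raise ValueError("all arguments must be positive")
    return max(1, math.ceil(num_frames / (num_gpus * batch_size)))


# Example: 1000 frames with a batch size of 256 on 2 GPUs, then on 1 GPU.
print(suggest_accu_steps(1000, num_gpus=2, batch_size=256))
print(suggest_accu_steps(1000, num_gpus=1, batch_size=256))
```

With fewer GPUs the helper raises the accumulation steps instead of the batch size, which keeps memory use per GPU unchanged.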

Try pre-optimized models

We provide pre-optimized models and scripts to run novel view synthesis and mesh extraction (results saved at tmp/*all.mp4).

# download pre-optimized models
mkdir -p tmp && cd "$_"
wget https://www.dropbox.com/s/qzwuqxp0mzdot6c/cat-pikachiu.npy
wget https://www.dropbox.com/s/dnob0r8zzjbn28a/cat-pikachiu.pth
wget https://www.dropbox.com/s/p74aaeusprbve1z/opts.log # flags used at opt time
cd ../

# set the sequence name used by the commands below
seqname=cat-pikachiu

# render novel views
bash scripts/render_nvs.sh 0 $seqname tmp/cat-pikachiu.pth 5 0
# argv[1]: gpu id
# argv[2]: sequence name
# argv[3]: path to the weights
# argv[4]: video id used for pose traj
# argv[5]: video id used for root traj

# Extract articulated meshes and render
bash scripts/render_mgpu.sh 0 $seqname tmp/cat-pikachiu.pth \
        "0 5" 64
# argv[1]: gpu id
# argv[2]: sequence name
# argv[3]: weights path
# argv[4]: video id separated by space
# argv[5]: resolution of running marching cubes (use 256 to get higher-res mesh)
  

1. Optimization

[cat-pikachiu]
seqname=cat-pikachiu
# To speed up data loading, we store images as lines of pixels.
# This only needs to be run once per sequence; the stored data are reused.
python preprocess/img2lines.py --seqname $seqname

# Optimization
bash scripts/template.sh 0,1 $seqname 10001 "no" "no"
# argv[1]: gpu ids separated by comma
# argv[2]: sequence name
# argv[3]: port for distributed training
# argv[4]: use_human, pass "" for human cse, "no" for quadruped cse
# argv[5]: use_symm, pass "" to force x-symmetric shape

# Extract articulated meshes and render
bash scripts/render_mgpu.sh 0 $seqname logdir/$seqname-e120-b256-ft3/params_latest.pth \
        "0 1 2 3 4 5 6 7 8 9 10" 256
# argv[1]: gpu id
# argv[2]: sequence name
# argv[3]: weights path
# argv[4]: video id separated by space
# argv[5]: resolution of running marching cubes (256 by default)
[human-cap]
seqname=adult7
python preprocess/img2lines.py --seqname $seqname
bash scripts/template.sh 0,1 $seqname 10001 "" ""
bash scripts/render_mgpu.sh 0 $seqname logdir/$seqname-e120-b256-ft3/params_latest.pth \
        "0 1 2 3 4 5 6 7 8 9" 256

2. Visualization tools

[Tensorboard]
# You may need to set up ssh tunneling to view the tensorboard monitor locally.
screen -dmS "tensorboard" bash -c "tensorboard --logdir=logdir --bind_all"
[Root pose, rest mesh, bones]

To draw root pose trajectories (+rest shape) over epochs

# logdir
logdir=logdir/$seqname-e120-b256-init/
# first_idx and last_idx specify which frames are drawn
python scripts/visualize/render_root.py --testdir $logdir --first_idx 0 --last_idx 120

Find the output at $logdir/mesh-cam.gif. During optimization, the rest mesh and bones at each epoch are saved at $logdir/*rest.obj.

[Correspondence/pose code]

To visualize 2d-2d and 2d-3d matchings of the latest epoch weights

# 2d matches between frame 0 and 100 via 2d->feature matching->3d->geometric warp->2d
bash scripts/render_match.sh $logdir/params_latest.pth "0 100" "--render_size 128"

Outputs are saved under tmp/:
  • tmp/match_%03d.jpg: 2d-2d matches
  • tmp/match_line_pred.obj: 2d-3d feature matches of frame 0
  • tmp/match_line_exp.obj: 2d-3d geometric warps of frame 0
  • tmp/match_plane.obj: near plane of frame 0
  • tmp/code.mp4: pose code visualization
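
For a quick look at the .obj outputs without opening a mesh viewer, the sketch below counts records by type. It assumes only that the files use the standard Wavefront OBJ text format; the helper name `summarize_obj` and the sample text are hypothetical, and the actual contents of the match files may differ.

```python
from collections import Counter


def summarize_obj(text):
    """Count OBJ records by type (e.g. 'v' vertices, 'l' line elements, 'f' faces)."""
    counts = Counter()
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        counts[line.split()[0]] += 1
    return counts


# Hypothetical sample: two points joined by one line element.
sample = """# two points joined by one line element
v 0.0 0.0 0.0
v 1.0 0.0 0.5
l 1 2
"""
summary = summarize_obj(sample)
print(dict(summary))
```

To inspect a real output, pass the file contents instead, for example summarize_obj(open("tmp/match_line_pred.obj").read()).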

[Render novel views]

Render novel views at the canonical camera coordinate

bash scripts/render_nvs.sh 0 $seqname logdir/$seqname-e120-b256-ft3/params_latest.pth 5 0
# argv[1]: gpu id
# argv[2]: sequence name
# argv[3]: path to the weights
# argv[4]: video id used for pose traj
# argv[5]: video id used for root traj

Results will be saved at logdir/$seqname-e120-b256-ft3/nvs*.mp4.

[Render canonical view over iterations]

Render depth and color of the canonical view over optimization iterations

bash scripts/nvs_iter.sh 0 logdir/$seqname-e120-b256-init/
# argv[1]: gpu id
# argv[2]: path to the logdir

Results will be saved at logdir/$seqname-e120-b256-init/vis-iter*.mp4.

Common install issues

  • Q: pyrender reports ImportError: Library "GLU" not found.
    • A: run sudo apt install freeglut3-dev
  • Q: ffmpeg reports libopenh264.so.5 not found.
    • A: install the system ffmpeg with sudo apt-get install ffmpeg, then remove ~/anaconda/envs/banmo/bin/ffmpeg

Note on arguments

  • use --use_human for human reconstruction, otherwise it assumes quadruped animals
  • use --full_mesh to disable visibility check at mesh extraction time
  • use --noce_color at mesh extraction time to assign radiance instead of canonical mapping as vertex colors.
  • use --queryfw at mesh extraction time to extract forward articulated meshes, which only needs to run marching cubes once.
  • use --use_cc to keep only the largest connected component of the rest mesh when setting the object bounds and near-far plane (turned on by default). Turn it off with --nouse_cc for disconnected objects such as hands.
  • use --debug to print out the rough time each component takes.
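
To make the idea behind --use_cc concrete, here is a minimal pure-Python sketch of keeping the largest connected component of a triangle mesh. It is not BANMo's implementation: the function name `largest_component_faces` and the union-find approach are assumptions chosen for illustration. It groups faces that share vertices and returns the group with the most faces.

```python
from collections import Counter


def largest_component_faces(faces):
    """Return the faces belonging to the largest vertex-connected component."""
    parent = {}

    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Vertices of the same triangle belong to the same component.
    for a, b, c in faces:
        union(a, b)
        union(b, c)

    # Component size is measured as the number of faces it contains.
    sizes = Counter(find(face[0]) for face in faces)
    if not sizes:
        return []
    biggest = sizes.most_common(1)[0][0]
    return [face for face in faces if find(face[0]) == biggest]


# Hypothetical mesh: two triangles sharing an edge, plus one isolated triangle.
faces = [(0, 1, 2), (1, 2, 3), (10, 11, 12)]
kept = largest_component_faces(faces)
print(kept)
```

For an object made of several disconnected parts, such as two hands, this filter would drop all but one part, which is why the flag can be turned off.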

Acknowledgement

Volume rendering code is borrowed from Nerf_pl. Flow estimation code is adapted from VCN-robust. Other external repos:

License
