
ViSER: Video-Specific Surface Embeddings for Articulated 3D Shape Reconstruction. NeurIPS 2021.

Primary language: Python. License: Apache License 2.0 (Apache-2.0).

ViSER

For better and more robust reconstruction of quadruped animals and humans, please check out BANMo.

Changelog

  • 07/31/22: Use a larger Laplacian smoothness loss by default.
  • 05/22/22: Fix a bug in flow rendering that caused self-intersection.
  • 05/16/22: Fix a flip bug in flow pre-computation.
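The 07/31/22 entry concerns the weight of a Laplacian smoothness term on the reconstructed mesh. As a rough illustration of what such a term measures, here is a generic uniform-Laplacian formulation in NumPy: each vertex is compared with the centroid of its one-ring neighbors, and the loss is the mean squared offset. This is an assumed, textbook formulation for illustration only, not necessarily the exact loss or weighting used in the ViSER training code; the function name `uniform_laplacian_loss` is hypothetical.

```python
import numpy as np


def uniform_laplacian_loss(verts: np.ndarray, faces: np.ndarray) -> float:
    """Mean squared norm of the uniform Laplacian of a triangle mesh.

    Generic formulation (assumption), not taken from the ViSER code.
    verts: (V, 3) float array of vertex positions.
    faces: (F, 3) int array of vertex indices per triangle.
    """
    num_verts = verts.shape[0]
    # Collect the three edges of every triangle.
    edges = np.concatenate([faces[:, [0, 1]], faces[:, [1, 2]], faces[:, [2, 0]]], axis=0)
    # Make edges undirected and drop duplicates shared by adjacent triangles.
    edges = np.unique(np.sort(edges, axis=1), axis=0)

    # Accumulate neighbor position sums and vertex degrees in both directions.
    neighbor_sum = np.zeros_like(verts, dtype=np.float64)
    degree = np.zeros(num_verts, dtype=np.float64)
    np.add.at(neighbor_sum, edges[:, 0], verts[edges[:, 1]])
    np.add.at(neighbor_sum, edges[:, 1], verts[edges[:, 0]])
    np.add.at(degree, edges[:, 0], 1.0)
    np.add.at(degree, edges[:, 1], 1.0)

    # Laplacian: vertex position minus the centroid of its neighbors.
    centroid = neighbor_sum / np.maximum(degree, 1.0)[:, None]
    lap = verts - centroid
    lap[degree == 0] = 0.0  # isolated vertices contribute nothing
    return float(np.mean(np.sum(lap ** 2, axis=1)))
```

The term is translation invariant and grows quadratically with scale, so "a larger smoothness loss" in practice means a larger multiplier on a term like this relative to the other losses.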

Installation with conda

conda env create -f viser.yml
conda activate viser-release
# install softras
cd third_party/softras; python setup.py install; cd -;
# install manifold remeshing
git clone --recursive https://github.com/hjwdzh/Manifold; cd Manifold; mkdir build; cd build; cmake .. -DCMAKE_BUILD_TYPE=Release; make -j8; cd ../../

Data preparation

Create folders to store intermediate data and training logs:

mkdir log; mkdir tmp; 

Download the pre-processed data (RGB images, masks, and optical flow) following the link here and unzip it under ./database/DAVIS/. The dataset is organized as follows:

DAVIS/
    Annotations/
        Full-Resolution/
            sequence-name/
                {%05d}.png
    JPEGImages/
        Full-Resolution/
            sequence-name/
                {%05d}.jpg
    FlowBW/ and FlowFW/
        Full-Resolution/
            sequence-name/ and optionally sequence-name_{%02d}/ (frame interval)
                flo-{%05d}.pfm
                occ-{%05d}.pfm
                visflo-{%05d}.jpg
                warp-{%05d}.jpg
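The layout above maps directly onto per-frame file paths. The sketch below builds those paths and reads/writes `.pfm` files, assuming the flow and occlusion maps use the standard Portable Float Map format (a "PF"/"Pf" header, a "width height" line, a scale line whose sign gives the byte order, then float32 rows stored bottom-to-top). The channel layout inside `flo-*.pfm` is not documented here, so the reader simply returns the raw array. `davis_paths`, `read_pfm`, and `write_pfm` are hypothetical helpers, not functions from the repository; pass `flow_dir` as the exact forward/backward folder name found on disk.

```python
import os

import numpy as np


def davis_paths(root, seqname, frame, flow_dir="FlowBW", interval=None):
    """Hypothetical helper: paths for one frame under the DAVIS layout above."""
    res = "Full-Resolution"
    # Optional frame-interval suffix applies only to the flow folders.
    flow_seq = seqname if interval is None else "%s_%02d" % (seqname, interval)
    flow_base = os.path.join(root, flow_dir, res, flow_seq)
    return {
        "image": os.path.join(root, "JPEGImages", res, seqname, "%05d.jpg" % frame),
        "mask": os.path.join(root, "Annotations", res, seqname, "%05d.png" % frame),
        "flow": os.path.join(flow_base, "flo-%05d.pfm" % frame),
        "occ": os.path.join(flow_base, "occ-%05d.pfm" % frame),
    }


def read_pfm(path):
    """Read a standard PFM file into an (H, W, 3) or (H, W) float32 array."""
    with open(path, "rb") as f:
        header = f.readline().decode("ascii").strip()
        if header == "PF":
            channels = 3
        elif header == "Pf":
            channels = 1
        else:
            raise ValueError("not a PFM file: %s" % path)
        width, height = map(int, f.readline().decode("ascii").split())
        scale = float(f.readline().decode("ascii").strip())
        # A negative scale marks little-endian data.
        endian = "<" if scale < 0 else ">"
        data = np.frombuffer(f.read(), dtype=endian + "f4", count=width * height * channels)
    shape = (height, width, channels) if channels == 3 else (height, width)
    # PFM stores rows bottom-to-top; flip to the usual top-to-bottom order.
    return np.flipud(data.reshape(shape)).astype(np.float32)


def write_pfm(path, array):
    """Write an (H, W, 3) or (H, W) array as a little-endian PFM file."""
    array = np.asarray(array, dtype=np.float32)
    if array.ndim == 3 and array.shape[2] == 3:
        header = "PF"
    elif array.ndim == 2:
        header = "Pf"
    else:
        raise ValueError("PFM supports (H, W) or (H, W, 3) arrays")
    height, width = array.shape[:2]
    with open(path, "wb") as f:
        f.write(("%s\n%d %d\n-1.0\n" % (header, width, height)).encode("ascii"))
        f.write(np.flipud(array).astype("<f4").tobytes())
```

For example, `read_pfm(davis_paths("database/DAVIS", "breakdance-flare", 0)["flow"])` would load the backward flow of the first frame, if the files follow this format.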

To run preprocessing scripts on other videos, see here.

Example: breakdance-flare

Run

bash scripts/breakdance-flare.sh

To monitor optimization, run

tensorboard --logdir log/

To render the optimized breakdance-flare result, run

bash scripts/render_result.sh breakdance-flare log/breakdance-flare-1003-ft2/pred_net_20.pth 36


To optimize dance-twirl, check out scripts/dance-twirl.sh.

Example: elephants

Run

bash scripts/elephants.sh

To monitor optimization, run

tensorboard --logdir log/

To render the optimized shapes, run

bash scripts/render_elephants.sh log/elephant-walk-1003-6/pred_net_10.pth 36

Example outputs:

elephant-walk-all.mp4
elephant0009-all.mp4
elephant0058-all.mp4

Evaluation

Download sample results

wget https://www.dropbox.com/s/4bne43yxp89aleu/breakdance-results.zip
unzip breakdance-results.zip

Run evaluation

python eval_pck.py  --testdir log/rbreakdance-flare-viser/ --seqname breakdance-flare --type mesh

This should report a PCK of 70.52% (Table 1 of the paper, break-1).
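PCK (Percentage of Correct Keypoints) counts a predicted keypoint as correct when it lies within a distance threshold of its annotated location. The sketch below shows that generic definition in NumPy. How eval_pck.py obtains the predicted 2D keypoints from the meshes and cameras, and how it normalizes the threshold (for example, relative to object or image size), is not described here, so this function takes an absolute pixel threshold as an assumption; `pck` is a hypothetical name.

```python
import numpy as np


def pck(pred, gt, threshold, visible=None):
    """Percentage of Correct Keypoints (generic definition, assumed).

    pred, gt: (N, 2) arrays of 2D keypoint locations in pixels.
    threshold: distance in pixels below which a prediction counts as correct.
    visible: optional (N,) boolean mask selecting annotated keypoints.
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    dist = np.linalg.norm(pred - gt, axis=-1)  # Euclidean error per keypoint
    if visible is None:
        visible = np.ones(dist.shape, dtype=bool)
    visible = np.asarray(visible, dtype=bool)
    if not visible.any():
        raise ValueError("no visible keypoints")
    correct = dist[visible] < threshold
    return float(100.0 * correct.mean())
```

For instance, with errors of 0, 10, and 5 pixels and a 6-pixel threshold, two of three keypoints are correct, giving a PCK of about 66.7%.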

To evaluate on other sequences, change $seqname to one of {breakdance, dance-twirl, parkour}, etc. The annotated keypoints are stored in database/joint_annotations. The results to be evaluated should be stored in $testdir and contain meshes and camera parameters in the following format:

# $seqname-pred%d.ply # mesh (V,F)
# $seqname-cam%d.txt  # camera
# [R_3x3|T_3x1]       # V'=RV+T should be in the view space
# [fx,fy,px,py]       # in pixels
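To check results from another method against this format, it can help to write the files yourself. The sketch below writes a mesh as ASCII PLY, saves and loads a camera, and projects vertices with V' = RV + T followed by a pinhole model using [fx, fy, px, py]. The exact text layout that eval_pck.py parses is not spelled out above, so storing the camera as a 4x4 matrix (three rows of [R|T] followed by one row of [fx, fy, px, py]) and the ASCII PLY encoding are assumptions; please check eval_pck.py before relying on them. All function names are hypothetical.

```python
import numpy as np


def write_ascii_ply(path, verts, faces):
    """Write a triangle mesh (V, F) as an ASCII PLY file."""
    verts = np.asarray(verts, dtype=np.float64)
    faces = np.asarray(faces, dtype=np.int64)
    with open(path, "w") as f:
        f.write("ply\nformat ascii 1.0\n")
        f.write("element vertex %d\n" % len(verts))
        f.write("property float x\nproperty float y\nproperty float z\n")
        f.write("element face %d\n" % len(faces))
        f.write("property list uchar int vertex_indices\nend_header\n")
        for x, y, z in verts:
            f.write("%.9g %.9g %.9g\n" % (x, y, z))
        for a, b, c in faces:
            f.write("3 %d %d %d\n" % (a, b, c))


def save_camera(path, R, T, intrinsics):
    """Assumed layout: rows 0-2 hold [R|T], row 3 holds [fx, fy, px, py]."""
    cam = np.zeros((4, 4))
    cam[:3, :3] = R
    cam[:3, 3] = np.ravel(T)
    cam[3, :] = intrinsics
    np.savetxt(path, cam)


def load_camera(path):
    """Inverse of save_camera under the same assumed layout."""
    cam = np.loadtxt(path)
    return cam[:3, :3], cam[:3, 3], cam[3, :]


def project(verts, R, T, intrinsics):
    """Map mesh vertices to view space (V' = RV + T), then to pixel coordinates."""
    fx, fy, px, py = intrinsics
    view = np.asarray(verts, dtype=np.float64) @ np.asarray(R).T + np.asarray(T)
    u = fx * view[:, 0] / view[:, 2] + px
    v = fy * view[:, 1] / view[:, 2] + py
    return np.stack([u, v], axis=1)
```

Per frame i, the files would be named "%s-pred%d.ply" % (seqname, i) and "%s-cam%d.txt" % (seqname, i), matching the patterns above.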

Additional Notes

Multi-GPU training

By default, training uses 1 GPU. The codebase also supports single-node multi-GPU training with PyTorch distributed data-parallel. To select devices, modify dev and ngpu in scripts/xxx.sh.

Potential bugs
  • When batch_size is set to 3, the rendered flow may collapse to constant values.

Acknowledgement

The code borrows the skeleton of CMR.


Citation

To cite our papers:
@inproceedings{yang2021viser,
  title={ViSER: Video-Specific Surface Embeddings for Articulated 3D Shape Reconstruction},
  author={Yang, Gengshan
      and Sun, Deqing
      and Jampani, Varun
      and Vlasic, Daniel
      and Cole, Forrester
      and Liu, Ce
      and Ramanan, Deva},
  booktitle={NeurIPS},
  year={2021}
}

@inproceedings{yang2021lasr,
  title={LASR: Learning Articulated Shape Reconstruction from a Monocular Video},
  author={Yang, Gengshan
      and Sun, Deqing
      and Jampani, Varun
      and Vlasic, Daniel
      and Cole, Forrester
      and Chang, Huiwen
      and Ramanan, Deva
      and Freeman, William T
      and Liu, Ce},
  booktitle={CVPR},
  year={2021}
}

TODO

  • code cleanup