Official PyTorch implementation of CoDeF: Content Deformation Fields for Temporally Consistent Video Processing

CoDeF: Content Deformation Fields for Temporally Consistent Video Processing

Hao Ouyang*, Qiuyu Wang*, Yuxi Xiao*, Qingyan Bai, Juntao Zhang, Kecheng Zheng, Xiaowei Zhou, Qifeng Chen†, Yujun Shen† (*equal contribution, †corresponding author)

Requirements

The codebase has been tested with:

  • Ubuntu 20.04
  • Python 3.10
  • PyTorch 2.0.0
  • PyTorch Lightning 2.0.2
  • 1 NVIDIA GPU (RTX A6000 48GB) with CUDA 11.7. Other GPUs are also suitable; 10GB of GPU memory is sufficient to run the code.
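
As a quick sanity check, the installed package versions can be compared against the tested ones listed above. This is a minimal sketch, not part of the repository: the helper name `report_versions` is hypothetical, and the PyPI distribution names `torch` and `pytorch-lightning` are assumed to be the ones you installed from. Versions other than the tested ones may still work.

```python
from importlib import metadata

# Versions this README lists as tested; the keys are assumed PyPI distribution names.
TESTED = {"torch": "2.0.0", "pytorch-lightning": "2.0.2"}


def report_versions(tested=TESTED):
    """Return {package: (installed_version_or_None, tested_version)}.

    Uses package metadata only, so nothing heavy is imported.
    """
    report = {}
    for name, want in tested.items():
        try:
            have = metadata.version(name)
        except metadata.PackageNotFoundError:
            have = None  # package is not installed in this environment
        report[name] = (have, want)
    return report


if __name__ == "__main__":
    for name, (have, want) in report_versions().items():
        print(f"{name}: installed={have}, tested={want}")
```

A mismatch is only a hint to look closer, since the list above records what was tested rather than a strict requirement.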

To use the video visualizer, install ffmpeg with:

sudo apt-get install ffmpeg

Install the additional Python libraries with:

pip install -r requirements.txt

Our code also depends on tiny-cuda-nn. See the tiny-cuda-nn repository for instructions on installing its PyTorch extension.

Data

Our data

Download our data from this URL, unzip the file and put it in the current directory. Some additional data can be downloaded from here.

Customize your own data

Details to be released.

Organize the files as follows:

CoDeF
│
└─── all_sequences
    │
    └─── NAME1
           └─ NAME1
           └─ NAME1_masks_0 (optional)
           └─ NAME1_masks_1 (optional)
           └─ NAME1_flow (optional)
           └─ NAME1_flow_confidence (optional)
    │
    └─── NAME2
           └─ NAME2
           └─ NAME2_masks_0 (optional)
           └─ NAME2_masks_1 (optional)
           └─ NAME2_flow (optional)
           └─ NAME2_flow_confidence (optional)
    │
    └─── ...
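
The layout above can be checked before training with a few lines of Python. This is a minimal sketch, assuming the folder names follow the tree exactly; the helper `describe_sequence` is hypothetical and not part of the repository.

```python
from pathlib import Path

# Optional sibling folders shown in the layout above.
OPTIONAL_SUFFIXES = ("_masks_0", "_masks_1", "_flow", "_flow_confidence")


def describe_sequence(root, name):
    """Report which folders of the documented layout exist for one sequence.

    `root` is the CoDeF directory; `name` is the sequence name (NAME1, NAME2, ...).
    """
    seq_dir = Path(root) / "all_sequences" / name
    # The inner folder named after the sequence holds the input frames.
    status = {name: (seq_dir / name).is_dir()}
    for suffix in OPTIONAL_SUFFIXES:
        status[name + suffix] = (seq_dir / (name + suffix)).is_dir()
    return status


if __name__ == "__main__":
    for folder, present in describe_sequence(".", "beauty_0").items():
        print(f"{folder}: {'found' if present else 'missing'}")
```

Only the inner `NAME` folder is required; a missing optional folder simply means masks or flows are not available for that sequence.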

Pretrained checkpoints

You can download the pre-trained checkpoints trained with the current codebase as follows:

Sequence Name   Config                           Download
beauty_0        configs/beauty_0/base.yaml       Google drive link
beauty_1        configs/beauty_1/base.yaml       Google drive link
white_smoke     configs/white_smoke/base.yaml    Google drive link
lemon_hit       configs/lemon_hit/base.yaml      Google drive link
scene_0         configs/scene_0/base.yaml        Google drive link

Organize the checkpoint files as follows:

CoDeF
│
└─── ckpts/all_sequences
    │
    └─── NAME1
        │
        └─── EXP_NAME (base)
            │
            └─── NAME1.ckpt
    │
    └─── NAME2
        │
        └─── EXP_NAME (base)
            │
            └─── NAME2.ckpt
    │
    └─── ...
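
To make the checkpoint layout concrete, the expected location of a downloaded checkpoint can be computed as follows. This is a sketch under the assumption that the tree above is followed literally, with `base` as the experiment name; `checkpoint_path` is a hypothetical helper, not a function from the codebase.

```python
from pathlib import Path


def checkpoint_path(root, name, exp_name="base"):
    """Build the checkpoint location implied by the layout above:
    <root>/ckpts/all_sequences/<NAME>/<EXP_NAME>/<NAME>.ckpt
    """
    return Path(root) / "ckpts" / "all_sequences" / name / exp_name / f"{name}.ckpt"


if __name__ == "__main__":
    path = checkpoint_path(".", "beauty_0")
    print(path, "(found)" if path.is_file() else "(missing)")
```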

Train a new model

./scripts/train_multi.sh

where:

  • GPU: The GPU to train on;
  • NAME: Name of the video sequence;
  • EXP_NAME: Name of the experiment;
  • ROOT_DIRECTORY: Directory of the input video sequence;
  • MODEL_SAVE_PATH: Path to save the checkpoints;
  • LOG_SAVE_PATH: Path to save the logs;
  • MASK_DIRECTORY: Directory of the preprocessed masks (optional);
  • FLOW_DIRECTORY: Directory of the preprocessed optical flows (optional).

See the configuration files in configs/ for the available options; you can also add your own model config.
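
The variables listed above relate to the directory layouts described earlier. The sketch below shows one plausible set of values for a sequence. All path values are assumptions inferred from those layouts, in particular `LOG_SAVE_PATH` and the choice of `_masks_0` for the masks, and the helper `training_variables` is hypothetical. Check scripts/train_multi.sh for the values actually used.

```python
def training_variables(name, exp_name="base", gpu=0):
    """Return illustrative values for the variables described above."""
    seq_dir = f"all_sequences/{name}"
    return {
        "GPU": gpu,
        "NAME": name,
        "EXP_NAME": exp_name,
        # Inner folder that holds the input frames (from the data layout).
        "ROOT_DIRECTORY": f"{seq_dir}/{name}",
        # Assumption: checkpoints are grouped per sequence, as in the checkpoint layout.
        "MODEL_SAVE_PATH": f"ckpts/all_sequences/{name}",
        # Assumption: logs mirror the checkpoint layout; the README does not specify this.
        "LOG_SAVE_PATH": f"logs/all_sequences/{name}",
        # Optional inputs; the folder names follow the data layout.
        "MASK_DIRECTORY": f"{seq_dir}/{name}_masks_0",
        "FLOW_DIRECTORY": f"{seq_dir}/{name}_flow",
    }


if __name__ == "__main__":
    for key, value in training_variables("beauty_0").items():
        print(f"{key}={value}")
```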

Test reconstruction

./scripts/test_multi.sh

After running the script, the reconstructed videos can be found in results/all_sequences/{NAME}/{EXP_NAME}, along with the canonical image.

Test video translation

After obtaining the canonical image from the reconstruction test above, use your preferred text prompts to transfer it using ControlNet. Once you have the transferred canonical image, place it in all_sequences/${NAME}/${EXP_NAME}_control (i.e. CANONICAL_DIR in scripts/test_canonical.sh).
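
Placing the transferred image can be scripted. This is a minimal sketch, assuming the target directory named above and that the image should keep its original filename. The helper `place_canonical` is hypothetical, and the filename the test script expects is not stated here, so verify it against scripts/test_canonical.sh.

```python
import shutil
from pathlib import Path


def place_canonical(image_path, root, name, exp_name="base"):
    """Copy a transferred canonical image into
    <root>/all_sequences/<NAME>/<EXP_NAME>_control and return the new path.
    """
    control_dir = Path(root) / "all_sequences" / name / f"{exp_name}_control"
    control_dir.mkdir(parents=True, exist_ok=True)  # create the folder if needed
    # copy2 keeps the original filename when the destination is a directory.
    return Path(shutil.copy2(image_path, control_dir))
```

For example, `place_canonical("edited.png", ".", "beauty_0")` would copy the file to all_sequences/beauty_0/base_control/edited.png, where `edited.png` is a placeholder name.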

Then run:

./scripts/test_canonical.sh

The transferred results can be seen in results/all_sequences/{NAME}/{EXP_NAME}_transformed.

Note: Set the canonical_wh option in the configuration file with care. It determines the field of view of the canonical image and is usually set a little larger than img_wh.
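
One way to pick a starting value is to scale img_wh by a small margin. The sketch below is illustrative only: the helper `suggest_canonical_wh` and the default margin of 1.25 are assumptions, not values recommended by the authors, and the right margin depends on how much the content moves across the video.

```python
import math


def suggest_canonical_wh(img_wh, margin=1.25):
    """Scale (width, height) by `margin` and round up to whole pixels.

    `margin` is a hypothetical knob; 1.0 keeps the canonical image
    the same size as the input frames.
    """
    if margin < 1.0:
        raise ValueError("margin should be >= 1.0 so the canonical image is not smaller than the frames")
    width, height = img_wh
    return (math.ceil(width * margin), math.ceil(height * margin))


if __name__ == "__main__":
    print(suggest_canonical_wh((540, 540)))
```

The resulting pair would then be entered by hand as canonical_wh in the sequence's configuration file.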

BibTeX

@article{ouyang2023codef,
      title={CoDeF: Content Deformation Fields for Temporally Consistent Video Processing}, 
      author={Hao Ouyang and Qiuyu Wang and Yuxi Xiao and Qingyan Bai and Juntao Zhang and Kecheng Zheng and Xiaowei Zhou and Qifeng Chen and Yujun Shen},
      journal={arXiv preprint arXiv:2308.07926},
      year={2023}
}