[CVPR 2024 Highlight] The official repo for “GPS-Gaussian: Generalizable Pixel-wise 3D Gaussian Splatting for Real-time Human Novel View Synthesis”

Primary language: Python. License: MIT.

GPS-Gaussian: Generalizable Pixel-wise 3D Gaussian Splatting for Real-time Human Novel View Synthesis

Shunyuan Zheng†,1, Boyao Zhou2, Ruizhi Shao2, Boning Liu2, Shengping Zhang*,1,3, Liqiang Nie1, Yebin Liu2

1Harbin Institute of Technology   2Tsinghua University   3Peng Cheng Laboratory
*Corresponding author   †Work done during an internship at Tsinghua University

Introduction

We propose GPS-Gaussian, a generalizable pixel-wise 3D Gaussian representation for synthesizing novel views of any unseen characters instantly without any fine-tuning or optimization.

[Demo video: multi_person_live.mp4]

Installation

To set up the environment for GPS-Gaussian, run the following commands:

conda env create --file environment.yml
conda activate gps_gaussian

Then compile diff-gaussian-rasterization from the 3DGS repository:

git clone https://github.com/graphdeco-inria/gaussian-splatting --recursive
cd gaussian-splatting/
pip install -e submodules/diff-gaussian-rasterization
cd ..

(Optional) RAFT-Stereo provides a faster CUDA implementation of the correlation sampler, which speeds up the model without affecting accuracy:

git clone https://github.com/princeton-vl/RAFT-Stereo.git
cd RAFT-Stereo/sampler && python setup.py install && cd ../..

If you compiled this CUDA implementation, set corr_implementation='reg_cuda' in config/stereo_human_config.py; otherwise, keep corr_implementation='reg'.
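
One way to decide which value to use is to check whether the compiled sampler can be imported. The sketch below assumes the RAFT-Stereo extension installs a module named corr_sampler (an assumption; check the sampler's setup.py to confirm). The helper pick_corr_implementation is hypothetical and not part of this repo; it only prints the suggested value and does not edit the config file.

```python
import importlib.util


def pick_corr_implementation(module_name: str = "corr_sampler") -> str:
    """Return 'reg_cuda' if the compiled sampler module is importable, else 'reg'.

    module_name is an assumed name for the RAFT-Stereo CUDA extension.
    """
    found = importlib.util.find_spec(module_name) is not None
    return "reg_cuda" if found else "reg"


if __name__ == "__main__":
    # Copy the printed value into config/stereo_human_config.py by hand.
    print(f"corr_implementation='{pick_corr_implementation()}'")
```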

Run on synthetic human dataset

Dataset Preparation

  • We provide a rendered THuman2.0 dataset for training GPS-Gaussian in a 16-camera setting. Download render_data from Baidu Netdisk or OneDrive and unzip it. Because we recommend rectifying the source images and computing the disparity offline, the downloaded data plus the saved files require around 50GB of free storage space.
  • To train a more robust model, we recommend collecting more human scans for training (e.g. Twindom, Render People, 2K2K). Then render the training data to match the target scenario, including the number of cameras and the radius of the scene. We provide rendering code to generate training data from human scans; see the data documentation for more details.
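
Before downloading, it may help to confirm that the target drive has the roughly 50GB mentioned above. This is a minimal sketch using only the Python standard library; the helper name has_enough_space and the default path are illustrative, not part of this repo.

```python
import shutil

REQUIRED_GB = 50  # approximate figure quoted in the dataset notes


def has_enough_space(path: str = ".", required_gb: float = REQUIRED_GB) -> bool:
    """Return True if the filesystem holding `path` has at least `required_gb` GiB free."""
    free_gb = shutil.disk_usage(path).free / 1024**3
    return free_gb >= required_gb


if __name__ == "__main__":
    # Point `path` at the directory where render_data will be unzipped.
    print("enough space" if has_enough_space(".") else "not enough space")
```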

Training

Note: The first time training is run, we perform stereo rectification and compute the disparity offline; the processed data is saved to render_data/rectified_local. This process takes several hours but greatly speeds up subsequent training. To skip this pre-processing, set use_processed_data=False in stage1.yaml and stage2.yaml.

  • Stage1: pretrain the depth prediction model. Set data_root in stage1.yaml to the path of the unzipped render_data folder.
python train_stage1.py
  • Stage2: train the full model. Set data_root in stage2.yaml to the path of the unzipped render_data folder, and set stage1_ckpt in stage2.yaml to the path of the pretrained stage1 model.
python train_stage2.py
  • We provide the pretrained model GPS-GS_stage2_final.pth on Baidu Netdisk and OneDrive for fast evaluation and testing.

Testing

  • Real-world data: download the test data real_data from Baidu Netdisk or OneDrive. Then run the following code to synthesize a fixed novel view between source views 0 and 1. The position of the novel viewpoint between the source views is set by a ratio ranging from 0 to 1.
python test_real_data.py \
--test_data_root 'PATH/TO/REAL_DATA' \
--ckpt_path 'PATH/TO/GPS-GS_stage2_final.pth' \
--src_view 0 1 \
--ratio=0.5
  • Freeview rendering: run the following code to interpolate free viewpoints between source views. Set novel_view_nums to the desired number of novel viewpoints.
python test_view_interp.py \
--test_data_root 'PATH/TO/RENDER_DATA/val' \
--ckpt_path 'PATH/TO/GPS-GS_stage2_final.pth' \
--novel_view_nums 5
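
To illustrate what ratio and novel_view_nums control, the sketch below places novel camera centers on the segment between two source camera centers. It is only an illustration: the function names, the example coordinates, the even spacing, and the plain linear interpolation are all assumptions, and the test scripts may interpolate the full camera pose differently (for example, by also interpolating rotation).

```python
from typing import List, Sequence


def interpolate_center(c0: Sequence[float], c1: Sequence[float], ratio: float) -> List[float]:
    """Linearly blend two camera centers; ratio=0 gives c0, ratio=1 gives c1."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must lie in [0, 1]")
    return [(1.0 - ratio) * a + ratio * b for a, b in zip(c0, c1)]


def evenly_spaced_ratios(novel_view_nums: int) -> List[float]:
    """Ratios strictly between the two source views, evenly spaced (assumed spacing)."""
    return [(i + 1) / (novel_view_nums + 1) for i in range(novel_view_nums)]


if __name__ == "__main__":
    src0, src1 = [0.0, 0.0, 2.0], [2.0, 0.0, 0.0]  # made-up camera centers
    print(interpolate_center(src0, src1, 0.5))     # midpoint, as with --ratio=0.5
    for r in evenly_spaced_ratios(5):              # as with --novel_view_nums 5
        print(r, interpolate_center(src0, src1, r))
```

With ratio=0.5 the novel center is the midpoint of the two source centers; values closer to 0 or 1 move it toward the first or second source view.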

Citation

If you find this code useful for your research, please consider citing:

@inproceedings{zheng2024gpsgaussian,
  title={GPS-Gaussian: Generalizable Pixel-wise 3D Gaussian Splatting for Real-time Human Novel View Synthesis},
  author={Zheng, Shunyuan and Zhou, Boyao and Shao, Ruizhi and Liu, Boning and Zhang, Shengping and Nie, Liqiang and Liu, Yebin},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2024}
}