GPS-Gaussian: Generalizable Pixel-wise 3D Gaussian Splatting for Real-time Human Novel View Synthesis
Shunyuan Zheng†,1, Boyao Zhou2, Ruizhi Shao2, Boning Liu2, Shengping Zhang*,1,3, Liqiang Nie1, Yebin Liu2
1Harbin Institute of Technology 2Tsinghua University 3Peng Cheng Laboratory
*Corresponding author †Work done during an internship at Tsinghua University
We propose GPS-Gaussian, a generalizable pixel-wise 3D Gaussian representation for synthesizing novel views of any unseen characters instantly without any fine-tuning or optimization.
To set up the environment for GPS-Gaussian, run the following commands:
conda env create --file environment.yml
conda activate gps_gaussian
Then, compile diff-gaussian-rasterization from the 3DGS repository:
git clone https://github.com/graphdeco-inria/gaussian-splatting --recursive
cd gaussian-splatting/
pip install -e submodules/diff-gaussian-rasterization
cd ..
(Optional) RAFT-Stereo provides a faster CUDA implementation of the correlation sampler, which speeds up the model without affecting its performance:
git clone https://github.com/princeton-vl/RAFT-Stereo.git
cd RAFT-Stereo/sampler && python setup.py install && cd ../..
If you compiled this CUDA implementation, set corr_implementation='reg_cuda' in config/stereo_human_config.py; otherwise, set corr_implementation='reg'.
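Which of the two values to use depends only on whether the compiled sampler can be found by Python. A minimal sketch, assuming the extension built by RAFT-Stereo's sampler/setup.py is importable under the module name corr_sampler; the helper pick_corr_implementation is hypothetical and not part of the GPS-Gaussian code base:

```python
import importlib.util


def pick_corr_implementation() -> str:
    """Suggest a value for corr_implementation in config/stereo_human_config.py.

    Assumption: the extension built by RAFT-Stereo/sampler/setup.py is
    importable under the module name `corr_sampler`.
    """
    # find_spec returns None when a top-level module cannot be located.
    if importlib.util.find_spec("corr_sampler") is not None:
        return "reg_cuda"  # compiled CUDA correlation sampler was found
    return "reg"  # fall back to the default implementation


if __name__ == "__main__":
    print(f"corr_implementation='{pick_corr_implementation()}'")
```

Note that find_spec only checks that the module can be located; it does not load it, so a broken build would still be reported as 'reg_cuda'.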
- We provide the rendered THuman2.0 dataset for GPS-Gaussian training in a 16-camera setting. Download render_data from Baidu Netdisk or OneDrive and unzip it. Because we recommend rectifying the source images and computing the disparity offline, the processed files together with the downloaded data require around 50 GB of free storage space.
- To train a more robust model, we recommend collecting more human scans for training (e.g., Twindom, Render People, 2K2K). Then, render the training data to match the target scenario, including the number of cameras and the radius of the scene. We provide the rendering code to generate training data from human scans; see the data documentation for more details.
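Matching the target scenario mainly means choosing how many cameras surround the subject and at what radius. The sketch below places cameras evenly on a horizontal ring, all looking at the subject. It is an illustration under stated assumptions (y-up world frame, OpenCV-style world-to-camera rotation with rows right/down/forward), not the repository's rendering code; camera_ring and the example values are hypothetical:

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def _normalize(v: Vec3) -> Vec3:
    n = math.sqrt(sum(c * c for c in v))
    return (v[0] / n, v[1] / n, v[2] / n)


def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def camera_ring(num_cams: int, radius: float, height: float = 0.0,
                target: Vec3 = (0.0, 0.0, 0.0)) -> List[Tuple[Vec3, Tuple[Vec3, Vec3, Vec3]]]:
    """Place num_cams cameras evenly on a horizontal circle around `target`.

    Assumptions: y-up world frame; every camera looks at `target`; the
    returned rotation rows (right, down, forward) form an OpenCV-style
    world-to-camera rotation matrix.
    """
    up = (0.0, 1.0, 0.0)
    cams = []
    for i in range(num_cams):
        angle = 2.0 * math.pi * i / num_cams
        center = (target[0] + radius * math.sin(angle),
                  target[1] + height,
                  target[2] + radius * math.cos(angle))
        forward = _normalize((target[0] - center[0],
                              target[1] - center[1],
                              target[2] - center[2]))
        # Undefined if forward is parallel to the up axis (camera straight above target).
        right = _normalize(_cross(forward, up))
        # forward x right gives "down", completing a right-handed (right, down, forward) frame.
        down = _cross(forward, right)
        cams.append((center, (right, down, forward)))
    return cams


if __name__ == "__main__":
    # Hypothetical setting: 16 cameras, 2 m radius, 0.5 m above the target.
    for center, rot in camera_ring(16, radius=2.0, height=0.5)[:2]:
        print(center, rot)
```

A ring is only the simplest layout; the actual capture rig of a target scenario may use different heights or an uneven spacing.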
Note: During the first training run, we perform stereo rectification and compute the disparity offline; the processed data are saved to render_data/rectified_local. This process takes several hours but substantially speeds up subsequent training. To skip this pre-processing, set use_processed_data=False in stage1.yaml and stage2.yaml.
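After rectification, corresponding pixels in the two views differ only by a horizontal shift, so geometry can be stored as disparity. For a rectified pair with focal length f (in pixels) and baseline B, a point at depth Z has disparity d = f·B/Z. The sketch below only illustrates this textbook relation; the function names and numbers are hypothetical and not taken from the repository's pre-processing code:

```python
def disparity_from_depth(depth: float, focal_px: float, baseline: float) -> float:
    """Horizontal disparity (pixels) of a point at `depth` in a rectified pair.

    Textbook relation d = f * B / Z; `depth` and `baseline` share one unit.
    """
    if depth <= 0:
        raise ValueError("depth must be positive")
    return focal_px * baseline / depth


def depth_from_disparity(disparity: float, focal_px: float, baseline: float) -> float:
    """Inverse relation Z = f * B / d."""
    if disparity <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline / disparity


if __name__ == "__main__":
    # Hypothetical values: 1000 px focal length, 0.25 m baseline, point 2 m away.
    d = disparity_from_depth(2.0, focal_px=1000.0, baseline=0.25)
    print(d)  # 125.0
    print(depth_from_disparity(d, focal_px=1000.0, baseline=0.25))  # 2.0
```

Because disparity is inversely proportional to depth, nearby body parts produce large shifts and distant background small ones.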
- Stage 1: pretrain the depth prediction model. Set data_root in stage1.yaml to the path of the unzipped folder render_data.
python train_stage1.py
- Stage 2: train the full model. Set data_root in stage2.yaml to the path of the unzipped folder render_data, and set stage1_ckpt in stage2.yaml to the path of the pretrained Stage 1 model.
python train_stage2.py
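Since Stage 2 starts from the Stage 1 model, both data_root and stage1_ckpt have to point at real paths. A small pre-flight check can catch typos in the two YAML files before a long run. This is a hypothetical helper, not part of the repository; it assumes you pass it the same values you wrote into stage1.yaml and stage2.yaml:

```python
import sys
from pathlib import Path
from typing import List, Optional


def check_training_paths(data_root: str, stage1_ckpt: Optional[str] = None) -> List[str]:
    """Return a list of problems with the paths set in stage1.yaml / stage2.yaml.

    Hypothetical pre-flight helper: pass data_root for Stage 1, and also
    stage1_ckpt for Stage 2 (which starts from the Stage 1 model).
    """
    problems = []
    if not Path(data_root).is_dir():
        problems.append(f"data_root is not a directory: {data_root}")
    if stage1_ckpt is not None and not Path(stage1_ckpt).is_file():
        problems.append(f"stage1_ckpt is not a file: {stage1_ckpt}")
    return problems


if __name__ == "__main__":
    # Hypothetical usage: python check_paths.py PATH/TO/render_data [PATH/TO/STAGE1_CKPT]
    if len(sys.argv) < 2:
        print("usage: check_paths.py DATA_ROOT [STAGE1_CKPT]")
        sys.exit(2)
    issues = check_training_paths(*sys.argv[1:3])
    for msg in issues:
        print(msg)
    sys.exit(1 if issues else 0)
```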
- We provide the pretrained model GPS-GS_stage2_final.pth on Baidu Netdisk and OneDrive for quick evaluation and testing.
- Real-world data: download the test data real_data from Baidu Netdisk or OneDrive. Then, run the following command to synthesize a fixed novel view between source views 0 and 1 (src_view). The position of the novel viewpoint between the two source views is controlled by ratio, which ranges from 0 to 1.
python test_real_data.py \
--test_data_root 'PATH/TO/REAL_DATA' \
--ckpt_path 'PATH/TO/GPS-GS_stage2_final.pth' \
--src_view 0 1 \
--ratio=0.5
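One common way to place a virtual camera at a fractional position between two calibrated views is to interpolate the camera centers linearly and the rotations spherically (slerp). The sketch below assumes rotations are given as unit quaternions (w, x, y, z); the README does not say exactly how test_real_data.py blends the two source cameras, so treat this as an illustration of what ratio means rather than the repository's implementation. slerp and interpolate_camera are hypothetical names:

```python
import math
from typing import Tuple

Quat = Tuple[float, float, float, float]  # unit quaternion (w, x, y, z)
Vec3 = Tuple[float, float, float]


def slerp(q0: Quat, q1: Quat, t: float) -> Quat:
    """Spherical linear interpolation between two unit quaternions."""
    dot = sum(a * b for a, b in zip(q0, q1))
    if dot < 0.0:  # q and -q encode the same rotation; take the shorter arc
        q1 = (-q1[0], -q1[1], -q1[2], -q1[3])
        dot = -dot
    if dot > 0.9995:  # nearly identical rotations: normalized lerp is numerically safer
        q = [a + t * (b - a) for a, b in zip(q0, q1)]
        n = math.sqrt(sum(c * c for c in q))
        return (q[0] / n, q[1] / n, q[2] / n, q[3] / n)
    theta = math.acos(dot)
    s0 = math.sin((1.0 - t) * theta) / math.sin(theta)
    s1 = math.sin(t * theta) / math.sin(theta)
    return tuple(s0 * a + s1 * b for a, b in zip(q0, q1))


def interpolate_camera(q0: Quat, c0: Vec3, q1: Quat, c1: Vec3,
                       ratio: float) -> Tuple[Quat, Vec3]:
    """Virtual camera between two source views: slerp rotation, lerp camera center."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must lie in [0, 1]")
    center = tuple(a + ratio * (b - a) for a, b in zip(c0, c1))
    return slerp(q0, q1, ratio), center


if __name__ == "__main__":
    # Hypothetical source cameras: identity, and a 90-degree turn about the y axis.
    h = math.sqrt(0.5)
    rot, center = interpolate_camera((1.0, 0.0, 0.0, 0.0), (0.0, 0.0, 0.0),
                                     (h, 0.0, h, 0.0), (2.0, 0.0, 0.0), ratio=0.5)
    print(rot, center)  # about (0.924, 0.0, 0.383, 0.0), i.e. a 45-degree turn, and (1.0, 0.0, 0.0)
```

With ratio=0 the virtual camera coincides with the first source view and with ratio=1 with the second, which matches the 0 to 1 range described above.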
- Freeview rendering: run the following command to render free-viewpoint images interpolated between the source views; set novel_view_nums to the desired number of novel viewpoints.
python test_view_interp.py \
--test_data_root 'PATH/TO/RENDER_DATA/val' \
--ckpt_path 'PATH/TO/GPS-GS_stage2_final.pth' \
--novel_view_nums 5
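A natural reading of novel_view_nums is the number of evenly spaced interpolation positions between the two source views. The sketch below turns that count into a list of ratios in the same 0 to 1 convention used for real-world data. It assumes uniform spacing with the source views themselves (ratio 0 and 1) excluded; the spacing actually used by test_view_interp.py may differ, and interpolation_ratios is a hypothetical name:

```python
from typing import List


def interpolation_ratios(novel_view_nums: int) -> List[float]:
    """Evenly spaced ratios strictly between two source views.

    Assumption: novel viewpoints are distributed uniformly and the source
    views (ratio 0 and 1) are excluded.
    """
    if novel_view_nums < 1:
        raise ValueError("novel_view_nums must be at least 1")
    return [i / (novel_view_nums + 1) for i in range(1, novel_view_nums + 1)]


if __name__ == "__main__":
    print([round(r, 3) for r in interpolation_ratios(5)])  # [0.167, 0.333, 0.5, 0.667, 0.833]
```

Each ratio can then be used in the same way as the ratio argument of test_real_data.py to place one virtual camera.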
If you find this code useful for your research, please consider citing:
@inproceedings{zheng2024gpsgaussian,
title={GPS-Gaussian: Generalizable Pixel-wise 3D Gaussian Splatting for Real-time Human Novel View Synthesis},
author={Zheng, Shunyuan and Zhou, Boyao and Shao, Ruizhi and Liu, Boning and Zhang, Shengping and Nie, Liqiang and Liu, Yebin},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2024}
}