Official PyTorch implementation of the preprint "Remote Sensing Novel View Synthesis with Implicit Multiplane Representations" (arXiv:2205.08908).
Novel view synthesis of remote sensing scenes is of great significance for scene visualization, human-computer interaction, and various downstream applications. Despite recent advances in computer graphics and photogrammetry, generating novel views remains challenging, particularly for remote sensing images, because of their high complexity, view sparsity, and limited view-perspective variation. In this paper, we propose a novel remote sensing view synthesis method that leverages recent advances in implicit neural representations. Considering the overhead and far-depth imaging of remote sensing images, we represent the 3D space by combining an implicit multiplane images (MPI) representation with deep neural networks. The 3D scene is reconstructed under a self-supervised optimization paradigm through a differentiable multiplane renderer with multi-view input constraints. Images from any novel view can then be freely rendered from the reconstructed model. As a by-product, the depth map corresponding to the given viewpoint is generated along with the rendering output. We refer to our method as Implicit Multiplane Images (ImMPI). To further improve view synthesis under sparse-view inputs, we explore learning-based initialization of remote sensing 3D scenes and propose a neural-network-based prior extractor to accelerate the optimization process. In addition, we propose a new dataset for remote sensing novel view synthesis with multi-view real-world Google Earth images. Extensive experiments demonstrate the superiority of ImMPI over previous state-of-the-art methods in terms of reconstruction accuracy, visual fidelity, and time efficiency. Ablation experiments also support the effectiveness of our design. Visualizations of some samples are shown here.
git clone git@github.com:wyc-Chang/ImMPI.git
cd ImMPI
pip install -r requirement.txt
- Download the LEVIR-NVS dataset, which contains 16 scenes. The whole dataset is organized as follows:
root_directory
├── view_split.txt
├── scene_000
├── scene_001
│   ├── Images
│   │   ├── 000.png
│   │   ├── 001.png
│   │   └── ...
│   ├── Cams
│   │   ├── 000.txt
│   │   ├── 001.txt
│   │   └── ...
│   ├── Depths
│   │   ├── 000.tiff
│   │   ├── 001.tiff
│   │   └── ...
│   └── Track
│       ├── 000.txt
│       ├── 001.txt
│       └── ...
└── ...
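Before training or rendering, it can help to confirm that a downloaded copy matches the layout above. The following is a minimal sketch, not part of this repository: the helper name check_dataset_layout is hypothetical, and it assumes every scene directory is named scene_* and contains the four subdirectories listed in the tree.

```python
from pathlib import Path

# Subdirectories each scene is expected to contain, per the tree above.
EXPECTED_SUBDIRS = ("Images", "Cams", "Depths", "Track")


def check_dataset_layout(root):
    """Return a list of human-readable problems found under `root`.

    Hypothetical helper: an empty list means the layout looks as expected.
    """
    root = Path(root)
    problems = []
    if not (root / "view_split.txt").is_file():
        problems.append("missing view_split.txt")
    scenes = sorted(p for p in root.glob("scene_*") if p.is_dir())
    if not scenes:
        problems.append("no scene_* directories found")
    for scene in scenes:
        for sub in EXPECTED_SUBDIRS:
            if not (scene / sub).is_dir():
                problems.append(f"{scene.name}: missing {sub}/")
    return problems
```

Calling check_dataset_layout on the dataset root and printing the returned list gives a quick overview of anything that was not unpacked correctly.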
- Each camera file in the Cams directory stores the camera parameters: the extrinsic matrix, the intrinsic matrix, and the depth range:
extrinsic
E00 E01 E02 E03
E10 E11 E12 E13
E20 E21 E22 E23
E30 E31 E32 E33
intrinsic
K00 K01 K02
K10 K11 K12
K20 K21 K22
DEPTH_MIN DEPTH_MAX
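A camera file in this format can be read with a few lines of plain Python. The sketch below is illustrative and not taken from the repository: the function name parse_camera_text is hypothetical, and it assumes the keywords extrinsic and intrinsic sit on their own lines, values are whitespace-separated floats, and blank lines can be ignored.

```python
def parse_camera_text(text):
    """Parse the camera format shown above (hypothetical helper).

    Returns (extrinsic 4x4, intrinsic 3x3, depth_min, depth_max),
    with the matrices as nested lists of floats.
    """
    # Keep only non-empty lines so optional blank lines do not matter.
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]

    def read_matrix(start, rows):
        return [[float(v) for v in ln.split()] for ln in lines[start:start + rows]]

    e_idx = lines.index("extrinsic")
    k_idx = lines.index("intrinsic")
    extrinsic = read_matrix(e_idx + 1, 4)
    intrinsic = read_matrix(k_idx + 1, 3)
    # The depth range is assumed to follow directly after the 3x3 intrinsic.
    depth_min, depth_max = (float(v) for v in lines[k_idx + 4].split()[:2])
    return extrinsic, intrinsic, depth_min, depth_max
```

To use it on a file, pass the file contents, for example parse_camera_text(open(path).read()); the nested lists can be converted with numpy.array if array math is needed.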
- The Track directory stores a camera track for rendering a video, containing 100 novel viewpoints.
- view_split.txt declares the split of Train-View and Test-View for all scenes:
ViewNum # train view number
ID0 ID1 ID2 ... # index of Train-View ID for all scenes in dataset
ViewNum # train view number
ID0 ID1 ID2 ... # index of Train-View ID for all scenes in dataset
...
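The split file can be parsed as alternating count and ID lines. The following sketch rests on assumptions the format description does not spell out: text after # is a comment, IDs are integers, and each count equals the number of IDs on the next line. The function name parse_view_split is hypothetical.

```python
def parse_view_split(text):
    """Parse alternating 'count' / 'ID list' lines (hypothetical helper).

    Returns a list of ID lists, one per (count, IDs) pair, in file order.
    """
    rows = []
    for line in text.splitlines():
        # Drop trailing '#' comments and skip blank lines.
        line = line.split("#", 1)[0].strip()
        if line:
            rows.append(line.split())
    if len(rows) % 2 != 0:
        raise ValueError("expected an even number of data lines")

    groups = []
    for count_row, id_row in zip(rows[0::2], rows[1::2]):
        count = int(count_row[0])
        ids = [int(v) for v in id_row]
        if count != len(ids):
            raise ValueError(f"declared {count} views but found {len(ids)} IDs")
        groups.append(ids)
    return groups
```

How the returned groups map to Train-View and Test-View depends on the actual file, so check the order against view_split.txt before relying on it.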
- The Depths directory stores the depth maps of the corresponding views. Each .tiff file can be read by:
depth = np.array(Image.open(filename)).astype(np.float32)
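For completeness, here is the same read wrapped in a self-contained function with the imports it needs (NumPy and Pillow). The helper valid_depth_mask is a hypothetical addition: it assumes that valid depth values fall inside the DEPTH_MIN/DEPTH_MAX range from the camera file, which the dataset description does not state explicitly.

```python
import numpy as np
from PIL import Image


def read_depth(filename):
    """Load a single-channel depth map as a float32 array of shape (H, W)."""
    with Image.open(filename) as img:
        return np.array(img).astype(np.float32)


def valid_depth_mask(depth, depth_min, depth_max):
    """Boolean mask of pixels inside [depth_min, depth_max] (assumed range)."""
    return (depth >= depth_min) & (depth <= depth_max)
```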
- Download the pretrained weights from Google Drive, and unzip into the repo directory. The checkpoints directory should be organized as follows:
checkpoints
├── ASI_prior.ckpt
└── optimization
    ├── scene_000_000
    │   └── optimizedImMPI.ckpt
    ├── scene_001_000
    │   └── optimizedImMPI.ckpt
    └── ...
- Optimized models for each scene are stored in the optimization directory.
- Render views from the optimized ImMPI of each scene. Run the following command to render images from the implicit multiplane representation. Make sure the variable dataset_dirpath in renderImages.py is set to the correct value. The results are written to the output/syntheticImage/Levir_NVS directory.
python example_render_image.py
- Render a video along the supplied camera track. Run the following command to generate a video for scene_000 in output/track_video:
python example_render_video.py --config=configs/renderTrack/scene_000.txt
Our algorithm consists of 1) Cross-scene Initialization and 2) Per-scene Optimization. The former is a learning-based method trained on a remote sensing multi-view stereo dataset. We reorganize the WHU MVS/Stereo dataset for prior extractor training. Train the model by running:
python train.py --config=configs/pretrain/whu_mvs.txt
The config file below shows how the parameters are defined.
# dataset parameters
dataset = whu_mvs
train_dataset_dirpath = dataset_dirpath
train_list_filepath = ./datasets/datalist/whu_mvs/train.txt
validate_dataset_dirpath = dataset_dirpath
validate_list_filepath = ./datasets/datalist/whu_mvs/val.txt
# training parameters
epochs = 500
learning_rate = 0.0001
lr_ds_epoch_idx = 100,200,300,400:2
batch_size = 1
num_workers = 8
logdir = "./checkpoints/ASI_training"
#resume = False
loadckpt = None
# log writer and random seed parameters
summary_scalars_freq = 100
summary_images_freq = 500
save_ckpt_freq = 50
validate_freq = 20
seed = 28
# model parameters
depth_sample_num = 32
feature_generator_model_type = "resnet18"
neighbor_view_num = 19
# loss weights
loss_rgb_weight = 2.0
loss_ssim_weight = 1.0
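The lr_ds_epoch_idx value packs a list of epochs and a factor into one string. The sketch below shows one plausible reading, which this README does not confirm: the learning rate is divided by the factor after the colon each time training reaches one of the listed epochs. The function names parse_lr_schedule and lr_at_epoch are hypothetical and are not taken from train.py.

```python
def parse_lr_schedule(spec):
    """Split a string such as '100,200,300,400:2' into (milestones, factor)."""
    milestones_str, factor_str = spec.split(":")
    milestones = [int(e) for e in milestones_str.split(",")]
    return milestones, float(factor_str)


def lr_at_epoch(base_lr, epoch, milestones, factor):
    """Learning rate at `epoch`, assuming one division by `factor` per milestone reached."""
    reached = sum(1 for m in milestones if epoch >= m)
    return base_lr / (factor ** reached)


milestones, factor = parse_lr_schedule("100,200,300,400:2")
for epoch in (0, 100, 250, 499):
    print(epoch, lr_at_epoch(0.0001, epoch, milestones, factor))
```

Under this reading, the learning rate of 0.0001 in the config would be halved four times over the 500 epochs.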
We supply a pretrained Across Scene Initialization (ASI) model in checkpoints/ASI_prior.ckpt. With a pretrained ASI model, scene_000 is optimized by running the following command:
python optimize.py --config=configs/optimization/levir_nvs/scene_000.txt
- Visualization of rendered novel view images for scenes in Levir-NVS. We supply the camera track for each scene in the dataset.
- Visualization of ImMPI layers. We visualize the image, the depth map, and the RGB and sigma values of the MPI layers corresponding to a novel view. The figure shows that ground objects of different heights in the same scene, such as buildings and trees, appear in different MPI layers. As the depth increases (moving away from the camera), the scene content gradually emerges from the roofs down to the ground.
- Optimization process of scene #Observation. The rendered RGB image and depth map for one of the Test-Views are shown below. "w/o CSI" refers to optimization without Cross-scene Initialization.
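The layer visualization above shows per-plane RGB and sigma values. As a rough illustration of how such layers can be turned into a pixel color and a depth value, the sketch below applies standard front-to-back alpha compositing to a single pixel. It is not taken from this repository: the function name composite_pixel, the conversion alpha = 1 - exp(-sigma * delta), and the very large thickness assigned to the last plane are assumptions borrowed from common volume-rendering practice.

```python
import math


def composite_pixel(colors, sigmas, depths):
    """Composite one pixel across MPI planes ordered from near to far.

    colors: list of (r, g, b) per plane
    sigmas: list of non-negative densities per plane
    depths: list of increasing plane depths
    Returns (rgb, expected_depth, accumulated_weight).
    """
    n = len(depths)
    transmittance = 1.0  # fraction of light not yet absorbed by nearer planes
    rgb = [0.0, 0.0, 0.0]
    depth = 0.0
    weight_sum = 0.0
    for i in range(n):
        # Spacing to the next plane; the last plane is treated as very thick.
        delta = depths[i + 1] - depths[i] if i + 1 < n else 1e10
        alpha = 1.0 - math.exp(-sigmas[i] * delta)
        weight = transmittance * alpha
        rgb = [acc + weight * c for acc, c in zip(rgb, colors[i])]
        depth += weight * depths[i]
        weight_sum += weight
        transmittance *= 1.0 - alpha
    return rgb, depth, weight_sum
```

In this sketch a plane with sigma equal to zero contributes nothing, and an opaque plane hides everything behind it, which is consistent with roofs appearing in nearer layers and the ground in farther ones.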
If you use this code for your research, please cite our paper:
@article{wu2022immpi,
title={Remote Sensing Novel View Synthesis with Implicit Multiplane Representations},
author={Yongchang Wu},
year={2022},
journal={arXiv preprint arXiv:2205.08908},
}