Question about incompatible shapes (0, 3) and (100, 3) at II. Joint Optimization in Training, Validation, and Testing
Osavalon opened this issue · 19 comments
Hi, thank you for the inspiring work and your open-source code! When I run the following script, I get a ValueError at step II. Joint Optimization in Training, Validation, and Testing:
I. Shape Pre-Training and II. Joint Optimization (training and validation)
'''
scene='hotdog_2163'
gpus='2'
model='nerfactor'
overwrite='True'
proj_root='/lyy/nerfactor'
repo_dir="$proj_root/nerfactor"
viewer_prefix='' # or just use ''
# I. Shape Pre-Training
data_root="$proj_root/data/selected/$scene"
if [[ "$scene" == scan* ]]; then
# DTU scenes
imh='256'
else
imh='512'
fi
if [[ "$scene" == pinecone || "$scene" == vasedeck || "$scene" == scan* ]]; then
# Real scenes: NeRF & DTU
near='0.1'; far='2'
else
near='2'; far='6'
fi
if [[ "$scene" == pinecone || "$scene" == vasedeck || "$scene" == scan* ]]; then
# Real scenes: NeRF & DTU
use_nerf_alpha='True'
else
use_nerf_alpha='False'
fi
surf_root="$proj_root/output/surf/$scene"
shape_outdir="$proj_root/output/train/${scene}_shape"
REPO_DIR="$repo_dir" "$repo_dir/nerfactor/trainvali_run.sh" "$gpus" --config='shape.ini' --config_override="data_root=$data_root,imh=$imh,near=$near,far=$far,use_nerf_alpha=$use_nerf_alpha,data_nerf_root=$surf_root,outroot=$shape_outdir,viewer_prefix=$viewer_prefix,overwrite=$overwrite"
# II. Joint Optimization (training and validation)
shape_ckpt="$shape_outdir/lr1e-2/checkpoints/ckpt-2"
brdf_ckpt="$proj_root/output/train/merl/lr1e-2/checkpoints/ckpt-50"
if [[ "$scene" == pinecone || "$scene" == vasedeck || "$scene" == scan* ]]; then
# Real scenes: NeRF & DTU
xyz_jitter_std=0.001
else
xyz_jitter_std=0.01
fi
test_envmap_dir="$proj_root/data/envmaps/for-render_h16/test"
shape_mode='finetune'
outroot="$proj_root/output/train/${scene}_$model"
REPO_DIR="$repo_dir" "$repo_dir/nerfactor/trainvali_run.sh" "$gpus" --config="$model.ini" --config_override="data_root=$data_root,imh=$imh,near=$near,far=$far,use_nerf_alpha=$use_nerf_alpha,data_nerf_root=$surf_root,shape_model_ckpt=$shape_ckpt,brdf_model_ckpt=$brdf_ckpt,xyz_jitter_std=$xyz_jitter_std,test_envmap_dir=$test_envmap_dir,shape_mode=$shape_mode,outroot=$outroot,viewer_prefix=$viewer_prefix,overwrite=$overwrite"
# III. Simultaneous Relighting and View Synthesis (testing)
ckpt="$outroot/lr5e-3/checkpoints/ckpt-10"
if [[ "$scene" == pinecone || "$scene" == vasedeck || "$scene" == scan* ]]; then
# Real scenes: NeRF & DTU
color_correct_albedo='false'
else
color_correct_albedo='true'
fi
REPO_DIR="$repo_dir" "$repo_dir/nerfactor/test_run.sh" "$gpus" --ckpt="$ckpt" --color_correct_albedo="$color_correct_albedo"
'''
[trainvali] For results, see:
/lyy/nerfactor/output/train/hotdog_2163_nerfactor/lr5e-3
[datasets/nerf_shape] Number of 'train' views: 100
[datasets/nerf_shape] Number of 'vali' views: 8
[models/base] Trainable layers registered:
['net_normal_mlp_layer0', 'net_normal_mlp_layer1', 'net_normal_mlp_layer2', 'net_normal_mlp_layer3', 'net_normal_out_layer0', 'net_lvis_mlp_layer0', 'net_lvis_mlp_layer1', 'net_lvis_mlp_layer2', 'net_lvis_mlp_layer3', 'net_lvis_out_layer0']
[models/base] Trainable layers registered:
['net_brdf_mlp_layer0', 'net_brdf_mlp_layer1', 'net_brdf_mlp_layer2', 'net_brdf_mlp_layer3', 'net_brdf_out_layer0']
Traceback (most recent call last):
File "/lyy/nerfactor/nerfactor/nerfactor/trainvali.py", line 341, in
app.run(main)
File "/home/ly/anaconda3/envs/nerfactor/lib/python3.6/site-packages/absl/app.py", line 308, in run
_run_main(main, args)
File "/home/ly/anaconda3/envs/nerfactor/lib/python3.6/site-packages/absl/app.py", line 254, in _run_main
sys.exit(main(argv))
File "/lyy/nerfactor/nerfactor/nerfactor/trainvali.py", line 106, in main
model = Model(config, debug=FLAGS.debug)
File "/lyy/nerfactor/nerfactor/nerfactor/models/nerfactor.py", line 68, in init
ioutil.restore_model(self.brdf_model, brdf_ckpt)
File "/lyy/nerfactor/nerfactor/nerfactor/util/io.py", line 48, in restore_model
ckpt.restore(ckpt_path).expect_partial()
File "/home/ly/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/training/tracking/util.py", line 2009, in restore
status = self._saver.restore(save_path=save_path)
File "/home/ly/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/training/tracking/util.py", line 1304, in restore
checkpoint=checkpoint, proto_id=0).restore(self._graph_view.root)
File "/home/ly/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/training/tracking/base.py", line 209, in restore
restore_ops = trackable._restore_from_checkpoint_position(self) # pylint: disable=protected-access
File "/home/ly/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/training/tracking/base.py", line 907, in _restore_from_checkpoint_position
tensor_saveables, python_saveables))
File "/home/ly/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/training/tracking/util.py", line 289, in restore_saveables
validated_saveables).restore(self.save_path_tensor)
File "/home/ly/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/training/saving/functional_saver.py", line 281, in restore
restore_ops.update(saver.restore(file_prefix))
File "/home/ly/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/training/saving/functional_saver.py", line 103, in restore
restored_tensors, restored_shapes=None)
File "/home/ly/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/distribute/values.py", line 647, in restore
for v in self._mirrored_variable.values))
File "/home/ly/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/distribute/values.py", line 647, in
for v in self._mirrored_variable.values))
File "/home/ly/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/distribute/values.py", line 392, in _assign_on_device
return variable.assign(tensor)
File "/home/ly/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/ops/resource_variable_ops.py", line 846, in assign
self._shape.assert_is_compatible_with(value_tensor.shape)
File "/home/ly/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/framework/tensor_shape.py", line 1117, in assert_is_compatible_with
raise ValueError("Shapes %s and %s are incompatible" % (self, other))
ValueError: Shapes (0, 3) and (100, 3) are incompatible
The shape checkpoints were generated by step I. Shape Pre-Training, and the BRDF checkpoints were downloaded from your page.
Does this mean I need to pre-train the BRDF model myself?
Very much looking forward to your help!
I met the same problem. Is there any solution now?
hi! I met the same problem. Is there any solution now?
I just trained the neural BRDF by myself and it solved the problem.
Can I add you on WeChat? Thanks! Is the fix to download the BRDF dataset and train the merl_512 model myself?
Just follow the instructions provided by the author. It is pretty simple.
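For reference, Step 1 of the Preparation section boils down to roughly the following. This is a sketch only: the config name 'brdf.ini' and the brdf_merl_npz path are my guesses; check the README for the exact values.
'''
# Sketch of BRDF prior pre-training; the config name and data_root are assumptions.
gpus='2'
model='brdf'
overwrite='True'
proj_root='/lyy/nerfactor'
repo_dir="$proj_root/nerfactor"
viewer_prefix=''
# Point this at the TF dataset converted from the MERL binaries (brdf_merl_npz).
data_root="$proj_root/data/brdf_merl_npz/ims512_envmaph16_spp1"
outroot="$proj_root/output/train/merl"
REPO_DIR="$repo_dir" "$repo_dir/nerfactor/trainvali_run.sh" "$gpus" --config="$model.ini" --config_override="data_root=$data_root,outroot=$outroot,viewer_prefix=$viewer_prefix,overwrite=$overwrite"
'''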
Follow Step 1 of the Preparation section here: https://github.com/google/nerfactor/tree/main/nerfactor
My email: wangmingyang4@myhexin.com
Can we communicate privately? Thanks!
Send me an email at haian@zju.edu.cn if you run into any problems, but I prefer to discuss this here since it may help others, too.
@Haian-Jin Thanks for reporting a solution! I wonder if you found the pretrained BRDF MLP problematic? Any info useful for me in debugging this is appreciated.
I don't know why it happens. I sent the checkpoint I trained myself to @wangmingyang4, and he said he still ran into the same problem. This is pretty strange.
Hi, I'm running into the same problem too.
@Haian-Jin @wangmingyang4 @xiumingzhang
I think I found the problem.
It is caused by https://github.com/google/nerfactor/blob/main/nerfactor/models/brdf.py#L44. If you do not have the MERL dataset locally and do not modify 'data_root' in merl_512/lr1e-2.ini, this error occurs...
One more thing: I couldn't find where to get 'brdf_merl_npz'... It seems to be different from the official MERL dataset... What am I missing?
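A quick way to check whether this is your problem (the .ini location inside the downloaded folder is an assumption; adjust it to wherever you unpacked the pretrained checkpoint):
'''
# Show the data_root baked into the downloaded merl_512 checkpoint's config.
grep 'data_root' /lyy/nerfactor/output/train/merl/lr1e-2/lr1e-2.ini
# If that path does not exist on your machine, the BRDF model is presumably built
# with an empty latent-code table -- the (0, 3) variable -- so restoring the
# (100, 3) tensor from the checkpoint fails with the ValueError above.
'''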
Haha, this is what I was missing: https://github.com/google/nerfactor/tree/main/data_gen#converting-the-merl-binary-brdfs-into-a-tensorflow-dataset
I'm sorry for the late reply. @Woolseyyy @Haian-Jin @xiumingzhang
Thanks everyone!
Here is a summary of how to use the merl_512 pretrained model provided by the author.
- Use trainvali_run.sh to start training the BRDF model. Once the lr1e-2.ini file is generated, the training can be cancelled.
- Copy the generated lr1e-2.ini file into the downloaded merl_512 folder, replacing the original file.
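A minimal sketch of those two steps (all paths are placeholders for illustration; adapt them to your own layout):
'''
# 1. Launch BRDF training with your local brdf_merl_npz as data_root, just until a
#    fresh lr1e-2.ini is written, then cancel it (Ctrl-C).
# 2. Replace the .ini shipped with the downloaded merl_512 checkpoint with the freshly
#    generated one, so its data_root points at a path that exists on your machine.
cp /lyy/nerfactor/output/train/merl_retrain/lr1e-2/lr1e-2.ini \
   /lyy/nerfactor/output/train/merl/lr1e-2/lr1e-2.ini
'''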
@wangmingyang4 Thanks for reporting a potential solution. But hmmm, why does this hack work? I'm trying to understand why this works, and how I can make changes to eliminate the need for such a hack.
the reason is presented here: #24 (comment)
I think changing the corresponding code and adding a MERL name-list file would help.
@xiumingzhang
Can I set the imh to the original image size for training?
I solved this problem after setting the envmaps path.
@xiumingzhang I have run into a new problem when I run "##2. Compute geometry buffers for all views by querying the trained NeRF":
Views (train): 0%| | 0/97 [00:00<?, ?it/s]2022-11-21 23:17:13.473628: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
/lyy/nerfactor/nerfactor_b/nerfactor/util/geom.py:58: RuntimeWarning: invalid value encountered in true_divide
arr_norm = (arr - arr.min()) / (arr.max() - arr.min())
I1121 23:22:47.950219 140528241899328 animation.py:1118] Animation.save using <class 'matplotlib.animation.FFMpegWriter'>
I1121 23:22:47.951245 140528241899328 animation.py:326] figure size in inches has been adjusted from 7.104166666666667 x 5.333333333333333 to 7.1 x 5.32
I1121 23:22:47.951492 140528241899328 animation.py:346] MovieWriter._run: running command: ffmpeg -f rawvideo -vcodec rawvideo -s 710x532 -pix_fmt rgba -r 12 -loglevel error -i pipe: -vcodec h264 -pix_fmt yuv420p -y /lyy/nerfactor/nerfactor_b/output/surf/pinecone512rays1024/train_000/lvis.mp4
Views (train): 1%|▎ | 1/97 [06:24<10:15:51, 384.91s/it]/lyy/nerfactor/nerfactor_b/nerfactor/util/geom.py:58: RuntimeWarning: invalid value encountered in true_divide
arr_norm = (arr - arr.min()) / (arr.max() - arr.min())
I1121 23:28:57.483244 140528241899328 animation.py:1118] Animation.save using <class 'matplotlib.animation.FFMpegWriter'>
I1121 23:28:57.484086 140528241899328 animation.py:326] figure size in inches has been adjusted from 7.104166666666667 x 5.333333333333333 to 7.1 x 5.32
I1121 23:28:57.484405 140528241899328 animation.py:346] MovieWriter._run: running command: ffmpeg -f rawvideo -vcodec rawvideo -s 710x532 -pix_fmt rgba -r 12 -loglevel error -i pipe: -vcodec h264 -pix_fmt yuv420p -y /lyy/nerfactor/nerfactor_b/output/surf/pinecone512rays1024/train_001/lvis.mp4
Views (train): 2%|▋ | 2/97 [12:25<9:46:54, 370.68s/it]/lyy/nerfactor/nerfactor_b/nerfactor/util/geom.py:58: RuntimeWarning: invalid value encountered in true_divide
arr_norm = (arr - arr.min()) / (arr.max() - arr.min())
I'd appreciate some help. My script is:
##1. Train a vanilla NeRF, optionally using multiple GPUs:
scene='pinecone'
gpus='3'
proj_root='/lyy/nerfactor/nerfactor_b'
repo_dir="$proj_root/nerfactor"
viewer_prefix=''
data_root="/lyy/nerfactor/data/nerf_real_360_proc/$scene"
near='0.1'
far='2'
lr='5e-4'
imh='512'
n_rays_per_step='1024'
outroot="$proj_root/output/train/${scene}_nerf${imh}rays${n_rays_per_step}n${near}f${far}"
REPO_DIR="$proj_root" "$proj_root/nerfactor/trainvali_run.sh" "$gpus" --config='nerf.ini' --config_override="n_rays_per_step=$n_rays_per_step,data_root=$data_root,imh=$imh,near=$near,far=$far,lr=$lr,outroot=$outroot,viewer_prefix=$viewer_prefix"
# Optionally, render the test trajectory with the trained NeRF (only one GPU can be used)
gpus='3'
scene='pinecone'
imh='512'
near='0.1'
far='2'
proj_root='/lyy/nerfactor/nerfactor_b'
n_rays_per_step='1024'
outroot="$proj_root/output/train/${scene}_nerf${imh}rays${n_rays_per_step}n${near}f${far}"
lr='5e-4'
ckpt="$outroot/lr$lr/checkpoints/ckpt-2"
REPO_DIR="$proj_root" "$proj_root/nerfactor/nerf_test_run.sh" "$gpus" --ckpt="$ckpt"
## Check the quality of this NeRF geometry by inspecting the visualization HTML for the alpha and normal maps. You might
## need to re-run this with another learning rate if the estimated NeRF geometry is too off.
##2. Compute geometry buffers for all views by querying the trained NeRF: (single GPU)
scene='pinecone'
gpus='3'
proj_root='/lyy/nerfactor/nerfactor_b'
repo_dir="$proj_root/nerfactor"
viewer_prefix=''
data_root="/lyy/nerfactor/data/nerf_real_360_proc/$scene"
imh='512'
lr='5e-4'
near='0.1'
far='2'
n_rays_per_step='1024'
trained_nerf="$proj_root/output/train/${scene}_nerf${imh}rays${n_rays_per_step}n${near}f${far}/lr${lr}"
occu_thres='0.5'
if [[ "$scene" == pinecone* || "$scene" == scan* ]]; then
# pinecone and DTU scenes
scene_bbox='-0.3,0.3,-0.3,0.3,-0.3,0.3'
elif [[ "$scene" == vasedeck* ]]; then
scene_bbox='-0.2,0.2,-0.4,0.4,-0.5,0.5'
else
# We don't need to bound the synthetic scenes
scene_bbox=''
fi
out_root="$proj_root/output/surf/$scene${imh}rays${n_rays_per_step}"
## Bump this up until the GPU runs out of memory, for faster computation
mlp_chunk='375000'
REPO_DIR="$proj_root" "$proj_root/nerfactor/geometry_from_nerf_run.sh" "$gpus" --data_root="$data_root" --trained_nerf="$trained_nerf" --out_root="$out_root" --imh="$imh" --scene_bbox="$scene_bbox" --occu_thres="$occu_thres" --mlp_chunk="$mlp_chunk"
@wangmingyang4 Yes, I think so.
@Osavalon Looks like your arr is all zeros, so arr.max() - arr.min() is zero in that normalization. You can try looking into how that happened. Sounds like this is another issue, so let me close this one. Please feel free to reopen it (or create a new one if this is a separate issue).