Pretraining AtlasNet Fails in Chamfer Library
freshche opened this issue · 3 comments
Hi,
I followed the instructions to retrain AtlasNet with your new dataset (ShapeNet rendering + SUN360 backgrounds), but it fails in the Chamfer library.
I set everything up as outlined in the README. My CUDA version is 9.0.
Here is the log.
=======================================================
main_pretrain.py (pretraining with AtlasNet reimplementation)
setting configurations...
H : 224
W : 224
aug_transl : None
avg_frame : False
batch_size : 32
batch_size_pmo : -1
category : 02691156
code : None
cpu : False
device : cuda:0
eval : False
from_epoch : 0
gpu : 0
group : 0
imagenet_enc : True
init_idx : 27
load : None
log_tb : False
log_visdom : False
lr_decay : 1.0
lr_pmo : 0.001
lr_pretrain : 0.0001
lr_step : 100
name : 02691156_pretrain_seed0
noise : None
num_meshgrid : 5
num_points : 100
num_points_all : 2500
num_prim : 25
num_workers : 8
pointcloud_path : data/customShapeNet
pretrained_dec : pretrained/ae_atlasnet_25.pth
rendering_path : data/rendering
scale : None
seed : 0
seq_path : data/sequences
sfm : False
size : 224x224
sphere : False
sphere_densify : 3
sun360_path : data/background
to_epoch : 500
to_it : 100
video : False
vis_port : 8097
vis_server : http://localhost
loading training data...
number of samples: 3235
loading test data...
number of samples: 809
building AtlasNet...
loading pretrained encoder...
loading pretrained decoder (pretrained/ae_atlasnet_25.pth)...
======= TRAINING START =======
error in nnd updateOutput: invalid device function
Traceback (most recent call last):
File "main_pretrain.py", line 26, in <module>
trainer.train_epoch(opt,ep)
File "/task_runtime/photometric-mesh-optim/model_pretrain.py", line 71, in train_epoch
loss = self.compute_loss(opt,var,ep=ep)
File "/task_runtime/photometric-mesh-optim/model_pretrain.py", line 59, in compute_loss
dist1,dist2 = atlasnet.ChamferDistance().apply(opt,var.points_GT,var.points_pred)
File "/task_runtime/photometric-mesh-optim/atlasnet.py", line 211, in forward
chamfer.nnd_forward_cuda(p1,p2,dist1,dist2,idx1,idx2)
File "/usr/local/lib/python3.6/dist-packages/torch/utils/ffi/__init__.py", line 197, in safe_call
result = torch._C._safe_call(*args, **kwargs)
torch.FatalError: aborting at /mnt/ilcompf6d1/user/chelin/adobe-scenemeshing/atlasnet-reimp/chamfer/src/my_lib_cuda.c:26
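A note on the `invalid device function` message above: a common cause of this CUDA error is a binary compiled for a GPU architecture that the local GPU cannot run. Nothing in this log confirms that this is what happened here, so the sketch below is only a way to check the hypothesis. The helper names `parse_arch_tag` and `can_run` are made up for illustration, and the list of architectures a given `.so` was built for has to come from its build flags, which are not shown in this issue.

```python
def parse_arch_tag(tag):
    """Split a tag such as 'sm_61' or 'compute_70' into (kind, major, minor)."""
    kind, digits = tag.split("_")
    return kind, int(digits[:-1]), int(digits[-1])


def can_run(device_cc, built_for):
    """Return True if a binary built for the tags in `built_for` can run on a
    GPU with compute capability `device_cc`, given as (major, minor).

    'sm_XY' stands for native code: usable on the same major version when the
    device minor version is at least Y.
    'compute_XY' stands for embedded PTX: the driver can JIT-compile it for
    any device with compute capability >= X.Y.
    """
    dev_major, dev_minor = device_cc
    for tag in built_for:
        kind, major, minor = parse_arch_tag(tag)
        if kind == "sm" and major == dev_major and minor <= dev_minor:
            return True
        if kind == "compute" and (major, minor) <= (dev_major, dev_minor):
            return True
    return False


if __name__ == "__main__":
    # Optional: print the local GPU's compute capability if PyTorch is installed.
    try:
        import torch
    except ImportError:
        torch = None
    if torch is not None and torch.cuda.is_available():
        print("device compute capability:", torch.cuda.get_device_capability(0))
```

For example, `can_run((7, 0), ["sm_61"])` is `False`, which is the kind of mismatch that would surface as `invalid device function` at the first kernel launch.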
For now, I can get this running by building Chamfer from the AtlasNet repository and modifying the ChamferDistance class in atlasnet.py accordingly.
It would be nice to have the source files for building chamfer.so included, to avoid issues caused by machine or version dependencies.
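When swapping in a different Chamfer build like this, a slow CPU reference can help confirm that the replacement returns the same values. The sketch below is a brute-force version in pure Python. It assumes the extension returns squared Euclidean nearest-neighbor distances plus the matching indices in both directions, mirroring the `dist1, dist2, idx1, idx2` arguments in the traceback. That convention and the function names here are assumptions, not taken from this repo's source.

```python
def nearest_sq_dists(src, dst):
    """For every point in `src`, find the squared distance to, and the index
    of, its nearest neighbor in `dst`. Brute force, O(len(src) * len(dst))."""
    dists, idxs = [], []
    for p in src:
        best_d, best_i = float("inf"), -1
        for i, q in enumerate(dst):
            d = sum((a - b) ** 2 for a, b in zip(p, q))
            if d < best_d:
                best_d, best_i = d, i
        dists.append(best_d)
        idxs.append(best_i)
    return dists, idxs


def chamfer_reference(p1, p2):
    """Two-way nearest-neighbor distances between point sets p1 and p2."""
    dist1, idx1 = nearest_sq_dists(p1, p2)  # p1 -> p2
    dist2, idx2 = nearest_sq_dists(p2, p1)  # p2 -> p1
    return dist1, dist2, idx1, idx2


if __name__ == "__main__":
    p1 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    p2 = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
    print(chamfer_reference(p1, p2))
```

Running this on a few small point sets and comparing against the GPU output (after moving it to the CPU) is enough to catch a wrong build or a changed argument order. It is far too slow for training.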
Thanks for reporting the issue. I'll leave this issue open for now.
The source files are now included in the repo.