chenhsuanlin/photometric-mesh-optim

Failing inside the meshrender library, GPU-related

bmauchly opened this issue · 4 comments

I guess the meshrender library source is not part of the project.

I'm running in a Colab notebook, with versions rolled back to PyTorch 0.4 and CUDA 9.0.
I'm getting this message:
torch.FatalError: abort... at /home/chenhsuan/adobe-scenemeshing/meshrender-forwardonly/src/gpu.c:33

command:
python3 main.py --load={model} --code=5e-2 --scale=2e-2 --lr-pmo=3e-3 --noise=0.1 --video --eval

full output:

/root/photometric-mesh-optim

main.py (photometric mesh optimization)

setting configurations...
H : 224
W : 224
aug_transl : None
avg_frame : False
batch_size : 32
batch_size_pmo : -1
category : None
code : 0.05
cpu : False
device : cuda:0
eval : True
from_epoch : 0
gpu : 0
group : 0
imagenet_enc : False
init_idx : 27
load : /root/photometric-mesh-optim/pretrained/02958343_atl25.npz
log_tb : False
log_visdom : False
lr_decay : 1.0
lr_pmo : 0.003
lr_pretrain : 0.0001
lr_step : 100
name : debug_seed0
noise : 0.1
num_meshgrid : 5
num_points : 100
num_points_all : 2500
num_prim : 25
num_workers : 8
pointcloud_path : data/customShapeNet
pretrained_dec : None
rendering_path : data/rendering
scale : 0.02
seed : 0
seq_path : data/sequences
sfm : False
size : 224x224
sphere : False
sphere_densify : 3
sun360_path : data/background
to_epoch : 500
to_it : 100
video : True
vis_port : 8097
vis_server : http://localhost

reading list of sequences...
number of sequences: 1
building AtlasNet...
loading checkpoint /root/photometric-mesh-optim/pretrained/02958343_atl25.npz...
======= OPTIMIZATION START =======
loading sequence...
reading RGB .npy file...
reading ground-truth camera .npz file...
noise -- scale: -0.1378, rot: [-0.0205,0.0343,-0.0529], trans: [0.0852,0.0939,-0.1320]
error: invalid device function
Traceback (most recent call last):
File "main.py", line 31, in
pmo.optimize(opt)
File "/root/photometric-mesh-optim/model.py", line 185, in optimize
loss = self.compute_loss(opt,var)
File "/root/photometric-mesh-optim/model.py", line 75, in compute_loss
loss.photom += self.compute_photometric_loss_batch(opt,var,idx_a,idx_b)
File "/root/photometric-mesh-optim/model.py", line 96, in compute_photometric_loss_batch
index_a,_,_,_,vertices_a = render.rasterize_3D_mesh(opt,var.vertices_clone,self.faces,cam_extr_a,cam_intr)
File "/root/photometric-mesh-optim/render.py", line 17, in rasterize_3D_mesh
index_map,baryc_map,mask_map,inv_depth_map = Rasterize().apply(opt,B,cam_intr,face_vertices,batch_face_index)
File "/root/photometric-mesh-optim/render.py", line 76, in forward
meshrender.forward_cuda(cam_intr,face_vertices_trans,batch_face_index,index_map,baryc_map,inv_depth_map,lock_map)
File "/usr/local/lib/python3.6/dist-packages/torch/utils/ffi/init.py", line 197, in safe_call
result = torch._C._safe_call(*args, **kwargs)
torch.FatalError: abort... at /home/chenhsuan/adobe-scenemeshing/meshrender-forwardonly/src/gpu.c:33

The compiled meshrender library is part of the project, but its source currently is not. It's built for compute capability 6.1 because of some atomic operations it requires, so it may not run on older GPUs. I'm working on optimizing the CUDA code so that it can be compiled and run with lower compute capabilities; once that's done, I'll also release this part of the source code.
Regarding the specific error reported here, however, I don't have a good idea of what the cause might be, since it's happening on a return statement. I'll leave this issue open for now.
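
In the meantime, an easy way to see whether your GPU meets that requirement is to query its compute capability from PyTorch (a generic check, not specific to this repo):

# query the compute capability of the first GPU; the current meshrender build needs >= 6.1
import torch
major, minor = torch.cuda.get_device_capability(0)   # e.g. (3, 7) on a Tesla K80
print("compute capability: %d.%d" % (major, minor))
if (major, minor) < (6, 1):
    print("below 6.1 -- the prebuilt meshrender library will likely fail with 'invalid device function'")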

I've just released the newly compiled meshrender library as well as the C source code. If your issue still persists with this version, please let me know!
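
As a quick sanity check that the new build is the one being picked up, you can confirm that the extension imports and exposes the CUDA entry point seen in the traceback above (the exact import path may differ depending on where the library sits in your checkout):

# check that the rebuilt meshrender extension loads and exposes forward_cuda
import torch        # import torch first so its symbols are available to the extension
import meshrender   # import path is illustrative; adjust to where the library lives in the repo
print(hasattr(meshrender, "forward_cuda"))   # should print True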

Thank you, with the new library it now runs through the entire eval without error. I didn't need to build it (and yes, this is on an older GPU).

The new version of the meshrender library fixes this.