Hangz-nju-cuhk/Rotate-and-Render

Run demo error

Opened this issue · 8 comments

Hi,
Thanks for your great project.
My environment is torch 1.1.0 and neural-renderer-pytorch 1.1.3.
I checked that PyTorch can use the GPU: torch.cuda.is_available() returns True.
I have only one GPU, so to run the demo I changed experiments/v100_test.sh to:
python -u test_multipose.py \
    --names rs_model \
    --dataset example \
    --list_start 0 \
    --list_end 10 \
    --dataset_mode allface \
    --gpu_ids 0 \
    --netG rotatespade \
    --norm_G spectralsyncbatch \
    --model rotatespade \
    --label_nc 5 \
    --nThreads 1 \
    --heatmap_size 2.5 \
    --chunk_size 1 \
    --no_gaussian_landmark \
    --multi_gpu \
    --device_count 1 \
    --render_thread 1 \
    --label_mask \
    --align \
    --erode_kernel 21 \
    --yaw_poses 0 30
I also changed test_multipose.py#L102 to opt.gpu_ids = [0], but I still get the following error:

----------------- Options ---------------
align: True [default: False]
aspect_ratio: 1.0
cache_filelist_read: False
cache_filelist_write: False
checkpoints_dir: ./checkpoints
chunk_size: [1] [default: None]
contain_dontcare_label: False
crop_size: 256
dataset: example [default: ms1m,casia]
dataset_mode: allface
device_count: 1 [default: 8]
display_winsize: 256
erode_kernel: 21
gpu_ids: 0
heatmap_size: 2.5 [default: 3]
how_many: inf
init_type: xavier
init_variance: 0.02
isTrain: False [default: None]
label_mask: True [default: False]
label_nc: 5
landmark_align: False
list_end: 10 [default: inf]
list_num: 0
list_start: 0
load_from_opt_file: False
load_size: 256
max_dataset_size: 9223372036854775807
model: rotatespade [default: rotate]
multi_gpu: True [default: False]
nThreads: 1
name: mesh2face
names: rs_model [default: rs_ijba3]
nef: 16
netG: rotatespade [default: rotate]
ngf: 64
no_flip: True
no_gaussian_landmark: True [default: False]
no_instance: True
no_pairing_check: False
norm_D: spectralinstance
norm_E: spectralinstance
norm_G: spectralsyncbatch [default: spectralinstance]
output_nc: 3
phase: test
pitch_poses: None
posesrandom: False
preprocess_mode: scale_width_and_crop
render_thread: 1 [default: 2]
resnet_initial_kernel_size: 7
resnet_kernel_size: 3
resnet_n_blocks: 9
resnet_n_downsample: 4
results_dir: ./results/
save_path: ./results/
serial_batches: True
trainer: rotate
which_epoch: latest
yaw_poses: [0.0, 30.0] [default: None]
----------------- End -------------------
dataset [AllFaceDataset] of size 8 was created
render_gpu_ids [0]
Testing gpu [0]
Network [RotateSPADEGenerator] was created. Total number of parameters: 225.1 million. To see the architecture, do print(network).
/home/infor/anaconda3/envs/python36_RotateRender_torch1.4/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/infor/anaconda3/envs/python36_RotateRender_torch1.4/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/infor/anaconda3/envs/python36_RotateRender_torch1.4/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/infor/anaconda3/envs/python36_RotateRender_torch1.4/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/infor/anaconda3/envs/python36_RotateRender_torch1.4/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/infor/anaconda3/envs/python36_RotateRender_torch1.4/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
np_resource = np.dtype([("resource", np.ubyte, 1)])
start prefetching data...
/home/infor/anaconda3/envs/python36_RotateRender_torch1.4/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/infor/anaconda3/envs/python36_RotateRender_torch1.4/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/infor/anaconda3/envs/python36_RotateRender_torch1.4/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/infor/anaconda3/envs/python36_RotateRender_torch1.4/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/infor/anaconda3/envs/python36_RotateRender_torch1.4/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/infor/anaconda3/envs/python36_RotateRender_torch1.4/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
np_resource = np.dtype([("resource", np.ubyte, 1)])
(************* each image render time: 8.694 *****************)
/home/infor/anaconda3/envs/python36_RotateRender_torch1.4/lib/python3.6/site-packages/torch/utils/checkpoint.py:25: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
warnings.warn("None of the inputs have requires_grad=True. Gradients will be None")
cublas runtime error : the GPU program failed to execute at /pytorch/aten/src/THC/THCBlas.cu:117
terminate called after throwing an instance of 'c10::Error'
what(): CUDA error: driver shutting down (insert_events at /pytorch/c10/cuda/CUDACachingAllocator.cpp:556)
frame #0: std::function<std::string ()>::operator()() const + 0x11 (0x7fb0e533f441 in /home/infor/anaconda3/envs/python36_RotateRender_torch1.4/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #1: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x2a (0x7fb0e533ed7a in /home/infor/anaconda3/envs/python36_RotateRender_torch1.4/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #2: <unknown function> + 0x1390c (0x7fb091a7990c in /home/infor/anaconda3/envs/python36_RotateRender_torch1.4/lib/python3.6/site-packages/torch/lib/libc10_cuda.so)
frame #3: torch::CudaIPCSentData::~CudaIPCSentData() + 0x215 (0x7fb0e5674115 in /home/infor/anaconda3/envs/python36_RotateRender_torch1.4/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #4: <unknown function> + 0x11e288 (0x7fb0e5676288 in /home/infor/anaconda3/envs/python36_RotateRender_torch1.4/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #5: <unknown function> + 0x430f1 (0x7fb0ea2190f1 in /lib/x86_64-linux-gnu/libc.so.6)
frame #6: <unknown function> + 0x431ea (0x7fb0ea2191ea in /lib/x86_64-linux-gnu/libc.so.6)
frame #7: <unknown function> + 0x20fad9 (0x55e62e3ebad9 in /home/infor/anaconda3/envs/python36_RotateRender_torch1.4/bin/python)
frame #8: <unknown function> + 0x20fbb8 (0x55e62e3ebbb8 in /home/infor/anaconda3/envs/python36_RotateRender_torch1.4/bin/python)
frame #9: PyErr_PrintEx + 0x32 (0x55e62e3ebc22 in /home/infor/anaconda3/envs/python36_RotateRender_torch1.4/bin/python)
frame #10: PyRun_SimpleStringFlags + 0x66 (0x55e62e3f1f96 in /home/infor/anaconda3/envs/python36_RotateRender_torch1.4/bin/python)
frame #11: Py_Main + 0x423 (0x55e62e3f5d73 in /home/infor/anaconda3/envs/python36_RotateRender_torch1.4/bin/python)
frame #12: main + 0xee (0x55e62e2bff2e in /home/infor/anaconda3/envs/python36_RotateRender_torch1.4/bin/python)
frame #13: __libc_start_main + 0xe7 (0x7fb0ea1f7b97 in /lib/x86_64-linux-gnu/libc.so.6)
frame #14: <unknown function> + 0x1c327f (0x55e62e39f27f in /home/infor/anaconda3/envs/python36_RotateRender_torch1.4/bin/python)

In the end, no result images are saved to the results folder.
Can you give me some suggestions? Thank you so much.
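
For anyone reproducing this, a quick environment report can confirm which PyTorch build the interpreter actually loads and whether it sees the GPU. The following is a minimal sketch and not part of the repository; `environment_report` is a hypothetical helper name, and it simply records that torch is missing if the import fails.

```python
import importlib
import sys


def environment_report():
    """Collect basic facts about the Python / PyTorch / CUDA setup."""
    report = {"python": sys.version.split()[0]}
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        report["torch"] = None  # PyTorch is not installed in this environment
        return report
    report["torch"] = torch.__version__
    report["cuda_build"] = torch.version.cuda  # None for CPU-only builds
    report["cuda_available"] = torch.cuda.is_available()
    report["device_count"] = torch.cuda.device_count()
    if report["cuda_available"]:
        report["device_0"] = torch.cuda.get_device_name(0)
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Running it inside the same conda environment as the demo shows the version that is really in use, which is worth comparing against the version stated in the report.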

Hello, I encountered the same problem. Did you solve it? And were you able to run prediction on your own input images? Thanks very much!

Same issue. Is there any workaround? Also, how do I test on a custom image?

97jay commented

I am also facing the same issue. No results are being saved in the results folder. Could you please look into this as soon as possible? Thank you.

+1

Hi, your PyTorch version is 1.0.0. I had the same problem at first, but I solved it after changing the version to 1.2.0. I hope this helps.
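
The suggestion above amounts to checking the installed PyTorch version against 1.2.0, the version this commenter reports as working. A minimal sketch of that check follows. `version_tuple` and `meets_minimum` are hypothetical helper names, and treating 1.2.0 as a minimum rather than an exact requirement is an assumption, since the thread only confirms that 1.2.0 itself worked for one user.

```python
import re


def version_tuple(version):
    """Turn '1.2.0' or '1.4.0+cu100' into a comparable tuple such as (1, 2, 0)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) if part is not None else 0 for part in match.groups())


def meets_minimum(installed, minimum="1.2.0"):
    """Return True if the installed version is at least the given minimum."""
    return version_tuple(installed) >= version_tuple(minimum)


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed")
    else:
        status = "ok" if meets_minimum(torch.__version__) else "older than 1.2.0"
        print(torch.__version__, status)
```

Comparing tuples of integers avoids the pitfall of comparing version strings lexically, where "1.10.0" would sort before "1.2.0".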

I have this problem too. Has anyone solved it?

dataset [AllFaceDataset] of size 8 was created
Testing gpu [0]
Network [RotateSPADEGenerator] was created. Total number of parameters: 225.1 million. To see the architecture, do print(network).
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/jac/anaconda3/envs/py3_face_front/lib/python3.7/multiprocessing/spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "/home/jac/anaconda3/envs/py3_face_front/lib/python3.7/multiprocessing/spawn.py", line 115, in _main
    self = reduction.pickle.load(from_parent)
  File "/home/jac/anaconda3/envs/py3_face_front/lib/python3.7/site-packages/torch/multiprocessing/reductions.py", line 110, in rebuild_cuda_tensor
    event_sync_required)
RuntimeError: CUDA error: out of memory
^CInterrupted!

@Masakaa What's the memory size of your GPU? Thanks.
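
One way to answer this question is to ask PyTorch directly. The sketch below is an illustration rather than anything from the repository: `gpu_memory_summary` and `bytes_to_gib` are hypothetical helpers, and the summary is an empty list when torch is missing or no CUDA device is visible.

```python
def bytes_to_gib(num_bytes):
    """Convert a byte count to GiB, rounded to two decimals."""
    return round(num_bytes / 1024 ** 3, 2)


def gpu_memory_summary():
    """Return (index, name, total GiB) for every CUDA device PyTorch can see."""
    try:
        import torch
    except ImportError:
        return []
    if not torch.cuda.is_available():
        return []
    summary = []
    for index in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(index)
        summary.append((index, props.name, bytes_to_gib(props.total_memory)))
    return summary


if __name__ == "__main__":
    for index, name, total in gpu_memory_summary():
        print(f"GPU {index}: {name}, {total} GiB total")
```

This reports total capacity only. Running `nvidia-smi` alongside it shows how much of that memory other processes are already holding, which matters for an out-of-memory error on a shared card.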