Run demo error
Opened this issue · 8 comments
Hi,
Thanks for your great project.
My environment is torch 1.1.0 and neural-renderer-pytorch 1.1.3.
I checked that torch can see the GPU: torch.cuda.is_available() returns True.
I have only one GPU, so to run the demo I changed experiments/v100_test.sh to:
python -u test_multipose.py \
    --names rs_model \
    --dataset example \
    --list_start 0 \
    --list_end 10 \
    --dataset_mode allface \
    --gpu_ids 0 \
    --netG rotatespade \
    --norm_G spectralsyncbatch \
    --model rotatespade \
    --label_nc 5 \
    --nThreads 1 \
    --heatmap_size 2.5 \
    --chunk_size 1 \
    --no_gaussian_landmark \
    --multi_gpu \
    --device_count 1 \
    --render_thread 1 \
    --label_mask \
    --align \
    --erode_kernel 21 \
    --yaw_poses 0 30
I also changed test_multipose.py#L102 to opt.gpu_ids = [0].
But I still get the following error:
----------------- Options ---------------
align: True [default: False]
aspect_ratio: 1.0
cache_filelist_read: False
cache_filelist_write: False
checkpoints_dir: ./checkpoints
chunk_size: [1] [default: None]
contain_dontcare_label: False
crop_size: 256
dataset: example [default: ms1m,casia]
dataset_mode: allface
device_count: 1 [default: 8]
display_winsize: 256
erode_kernel: 21
gpu_ids: 0
heatmap_size: 2.5 [default: 3]
how_many: inf
init_type: xavier
init_variance: 0.02
isTrain: False [default: None]
label_mask: True [default: False]
label_nc: 5
landmark_align: False
list_end: 10 [default: inf]
list_num: 0
list_start: 0
load_from_opt_file: False
load_size: 256
max_dataset_size: 9223372036854775807
model: rotatespade [default: rotate]
multi_gpu: True [default: False]
nThreads: 1
name: mesh2face
names: rs_model [default: rs_ijba3]
nef: 16
netG: rotatespade [default: rotate]
ngf: 64
no_flip: True
no_gaussian_landmark: True [default: False]
no_instance: True
no_pairing_check: False
norm_D: spectralinstance
norm_E: spectralinstance
norm_G: spectralsyncbatch [default: spectralinstance]
output_nc: 3
phase: test
pitch_poses: None
posesrandom: False
preprocess_mode: scale_width_and_crop
render_thread: 1 [default: 2]
resnet_initial_kernel_size: 7
resnet_kernel_size: 3
resnet_n_blocks: 9
resnet_n_downsample: 4
results_dir: ./results/
save_path: ./results/
serial_batches: True
trainer: rotate
which_epoch: latest
yaw_poses: [0.0, 30.0] [default: None]
----------------- End -------------------
dataset [AllFaceDataset] of size 8 was created
render_gpu_ids [0]
Testing gpu [0]
Network [RotateSPADEGenerator] was created. Total number of parameters: 225.1 million. To see the architecture, do print(network).
/home/infor/anaconda3/envs/python36_RotateRender_torch1.4/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/infor/anaconda3/envs/python36_RotateRender_torch1.4/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/infor/anaconda3/envs/python36_RotateRender_torch1.4/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/infor/anaconda3/envs/python36_RotateRender_torch1.4/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/infor/anaconda3/envs/python36_RotateRender_torch1.4/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/infor/anaconda3/envs/python36_RotateRender_torch1.4/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
np_resource = np.dtype([("resource", np.ubyte, 1)])
start prefetching data...
/home/infor/anaconda3/envs/python36_RotateRender_torch1.4/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/infor/anaconda3/envs/python36_RotateRender_torch1.4/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/infor/anaconda3/envs/python36_RotateRender_torch1.4/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/infor/anaconda3/envs/python36_RotateRender_torch1.4/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/infor/anaconda3/envs/python36_RotateRender_torch1.4/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/infor/anaconda3/envs/python36_RotateRender_torch1.4/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
np_resource = np.dtype([("resource", np.ubyte, 1)])
(************* each image render time: 8.694 *****************)
/home/infor/anaconda3/envs/python36_RotateRender_torch1.4/lib/python3.6/site-packages/torch/utils/checkpoint.py:25: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
warnings.warn("None of the inputs have requires_grad=True. Gradients will be None")
cublas runtime error : the GPU program failed to execute at /pytorch/aten/src/THC/THCBlas.cu:117
terminate called after throwing an instance of 'c10::Error'
what(): CUDA error: driver shutting down (insert_events at /pytorch/c10/cuda/CUDACachingAllocator.cpp:556)
frame #0: std::function<std::string ()>::operator()() const + 0x11 (0x7fb0e533f441 in /home/infor/anaconda3/envs/python36_RotateRender_torch1.4/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #1: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x2a (0x7fb0e533ed7a in /home/infor/anaconda3/envs/python36_RotateRender_torch1.4/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #2: + 0x1390c (0x7fb091a7990c in /home/infor/anaconda3/envs/python36_RotateRender_torch1.4/lib/python3.6/site-packages/torch/lib/libc10_cuda.so)
frame #3: torch::CudaIPCSentData::~CudaIPCSentData() + 0x215 (0x7fb0e5674115 in /home/infor/anaconda3/envs/python36_RotateRender_torch1.4/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #4: + 0x11e288 (0x7fb0e5676288 in /home/infor/anaconda3/envs/python36_RotateRender_torch1.4/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #5: + 0x430f1 (0x7fb0ea2190f1 in /lib/x86_64-linux-gnu/libc.so.6)
frame #6: + 0x431ea (0x7fb0ea2191ea in /lib/x86_64-linux-gnu/libc.so.6)
frame #7: + 0x20fad9 (0x55e62e3ebad9 in /home/infor/anaconda3/envs/python36_RotateRender_torch1.4/bin/python)
frame #8: + 0x20fbb8 (0x55e62e3ebbb8 in /home/infor/anaconda3/envs/python36_RotateRender_torch1.4/bin/python)
frame #9: PyErr_PrintEx + 0x32 (0x55e62e3ebc22 in /home/infor/anaconda3/envs/python36_RotateRender_torch1.4/bin/python)
frame #10: PyRun_SimpleStringFlags + 0x66 (0x55e62e3f1f96 in /home/infor/anaconda3/envs/python36_RotateRender_torch1.4/bin/python)
frame #11: Py_Main + 0x423 (0x55e62e3f5d73 in /home/infor/anaconda3/envs/python36_RotateRender_torch1.4/bin/python)
frame #12: main + 0xee (0x55e62e2bff2e in /home/infor/anaconda3/envs/python36_RotateRender_torch1.4/bin/python)
frame #13: __libc_start_main + 0xe7 (0x7fb0ea1f7b97 in /lib/x86_64-linux-gnu/libc.so.6)
frame #14: + 0x1c327f (0x55e62e39f27f in /home/infor/anaconda3/envs/python36_RotateRender_torch1.4/bin/python)
In the end, no result images are saved to the results folder.
Can you give me some suggestions? Thank you so much.
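The post states torch 1.1.0 while the environment path in the log is named `python36_RotateRender_torch1.4`, so it may help to print what is actually installed before debugging further. The following is a minimal sketch, not part of the project; the helper name `describe_environment` is hypothetical, and it only uses standard PyTorch attributes (`torch.__version__`, `torch.version.cuda`, `torch.cuda.is_available()`, `torch.cuda.device_count()`).

```python
import importlib


def describe_environment():
    """Collect version info relevant to this thread (hypothetical helper)."""
    info = {}
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        # torch is not installed in the active environment
        info["torch"] = None
        return info
    info["torch"] = torch.__version__            # installed PyTorch version
    info["cuda_build"] = torch.version.cuda      # CUDA version torch was built with
    info["cuda_available"] = torch.cuda.is_available()
    info["device_count"] = torch.cuda.device_count()
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Pasting this output into a report removes any ambiguity about which PyTorch build produced the error.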
Hello, I encountered the same problem. Did you solve it? Also, were you able to run prediction on your own input images? Thanks very much!
Same issue. Is there any workaround? Also, how do I test on a custom image?
I am also facing the same issue. No results are being saved in the results folder. Could you please look into this issue as soon as possible? Thank you.
+1
+1
Hi, your PyTorch version is 1.0.0. I had the same problem at first, but I solved it after changing the version to 1.2.0. I hope this helps.
I have this problem too. Has anyone solved it?
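A quick way to check whether an installed version is older than the 1.2.0 reported to work above is sketched here. This is only an illustration: `MIN_WORKING`, `version_tuple`, and `is_older_than_reported_fix` are hypothetical names, and the 1.2.0 threshold comes from that single comment rather than from the project's documentation.

```python
import re

# Version reported to work in the comment above (an assumption, not an official requirement)
MIN_WORKING = (1, 2, 0)


def version_tuple(version):
    """Parse the leading numeric part of a version string, e.g. '1.4.0+cu100' -> (1, 4, 0)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    # A missing patch component is treated as 0
    return tuple(int(part or 0) for part in match.groups())


def is_older_than_reported_fix(version):
    """True if the version is below the one reported to work."""
    return version_tuple(version) < MIN_WORKING
```

With PyTorch installed, `is_older_than_reported_fix(torch.__version__)` gives the answer for the active environment.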
dataset [AllFaceDataset] of size 8 was created
Testing gpu [0]
Network [RotateSPADEGenerator] was created. Total number of parameters: 225.1 million. To see the architecture, do print(network).
Traceback (most recent call last):
File "", line 1, in
File "/home/jac/anaconda3/envs/py3_face_front/lib/python3.7/multiprocessing/spawn.py", line 105, in spawn_main
exitcode = _main(fd)
File "/home/jac/anaconda3/envs/py3_face_front/lib/python3.7/multiprocessing/spawn.py", line 115, in _main
self = reduction.pickle.load(from_parent)
File "/home/jac/anaconda3/envs/py3_face_front/lib/python3.7/site-packages/torch/multiprocessing/reductions.py", line 110, in rebuild_cuda_tensor
event_sync_required)
RuntimeError: CUDA error: out of memory
^CInterrupted!
@Masakaa, what's the memory size of your GPU card? Thanks.
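For anyone answering the memory question, total GPU memory can be read from PyTorch directly. This is a minimal sketch assuming PyTorch is installed; `gpu_memory_report` is a hypothetical helper name, and it relies only on `torch.cuda.device_count()` and `torch.cuda.get_device_properties()`.

```python
def gpu_memory_report():
    """Return [(index, name, total_GiB)] for each visible GPU, or [] if none is usable."""
    try:
        import torch
    except ImportError:
        return []
    if not torch.cuda.is_available():
        return []
    report = []
    for index in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(index)
        # total_memory is reported in bytes; convert to GiB
        report.append((index, props.name, props.total_memory / 1024 ** 3))
    return report


if __name__ == "__main__":
    for index, name, total_gib in gpu_memory_report():
        print(f"GPU {index}: {name}, {total_gib:.1f} GiB total")
```

This reports total capacity only; memory already held by other processes is better checked with `nvidia-smi`.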