v100_test.py with wrong result

Question

Closed this issue 5 years ago · 6 comments

Since I have only one GPU, I modified v100_test.py, but the run appears to have failed.
Below is the log. The JPG files in the results folder are wrong: they are all black.

##############################################################
./experiments/v100_test.sh
----------------- Options ---------------
align: True [default: False]
aspect_ratio: 1.0
cache_filelist_read: False
cache_filelist_write: False
checkpoints_dir: ./checkpoints
chunk_size: [1] [default: None]
contain_dontcare_label: False
crop_size: 256
dataset: example [default: ms1m,casia]
dataset_mode: allface
device_count: 2 [default: 8]
display_winsize: 256
erode_kernel: 21
gpu_ids: 0,1 [default: 0]
heatmap_size: 2.5 [default: 3]
how_many: inf
init_type: xavier
init_variance: 0.02
isTrain: False [default: None]
label_mask: True [default: False]
label_nc: 5
landmark_align: False
list_end: 10 [default: inf]
list_num: 0
list_start: 0
load_from_opt_file: False
load_size: 256
max_dataset_size: 9223372036854775807
model: rotatespade [default: rotate]
multi_gpu: True [default: False]
nThreads: 1
name: mesh2face
names: rs_model [default: rs_ijba3]
nef: 16
netG: rotatespade [default: rotate]
ngf: 64
no_flip: True
no_gaussian_landmark: True [default: False]
no_instance: True
no_pairing_check: False
norm_D: spectralinstance
norm_E: spectralinstance
norm_G: spectralsyncbatch [default: spectralinstance]
output_nc: 3
phase: test
pitch_poses: None
posesrandom: False
preprocess_mode: scale_width_and_crop
render_thread: 1 [default: 2]
resnet_initial_kernel_size: 7
resnet_kernel_size: 3
resnet_n_blocks: 9
resnet_n_downsample: 4
results_dir: ./results/
save_path: ./results/
serial_batches: True
trainer: rotate
which_epoch: latest
yaw_poses: [0.0] [default: None]
----------------- End -------------------
dataset [AllFaceDataset] of size 8 was created
Testing gpu [0]
Network [RotateSPADEGenerator] was created. Total number of parameters: 225.1 million. To see the architecture, do print(network).
start prefetching data...
Error in forward_face_index_map_1: invalid device function
Error in forward_face_index_map_2: invalid device function
Error in forward_texture_sampling: invalid device function
Error in forward_face_index_map_1: invalid device function
Error in forward_face_index_map_2: invalid device function
Error in forward_texture_sampling: invalid device function
Error in forward_face_index_map_1: invalid device function
Error in forward_face_index_map_2: invalid device function
Error in forward_texture_sampling: invalid device function
Error in forward_face_index_map_1: invalid device function
Error in forward_face_index_map_2: invalid device function
Error in forward_texture_sampling: invalid device function
(************* each image render time: 13.314 ****)
Error in forward_face_index_map_1: invalid device function
Error in forward_face_index_map_2: invalid device function
Error in forward_texture_sampling: invalid device function
Error in forward_face_index_map_1: invalid device function
Error in forward_face_index_map_2: invalid device function
Error in forward_texture_sampling: invalid device function
/home/forest/anaconda3/envs/python361/lib/python3.6/site-packages/torch/utils/checkpoint.py:25: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
warnings.warn("None of the inputs have requires_grad=True. Gradients will be None")
Error in forward_face_index_map_1: invalid device function
Error in forward_face_index_map_2: invalid device function
Error in forward_texture_sampling: invalid device function
Error in forward_face_index_map_1: invalid device function
Error in forward_face_index_map_2: invalid device function
Error in forward_texture_sampling: invalid device function
Error in forward_face_index_map_1: invalid device function
Error in forward_face_index_map_2: invalid device function
Error in forward_texture_sampling: invalid device function
Error in forward_face_index_map_1: invalid device function
Error in forward_face_index_map_2: invalid device function
Error in forward_texture_sampling: invalid device function
Error in forward_face_index_map_1: invalid device function
Error in forward_face_index_map_2: invalid device function
Error in forward_texture_sampling: invalid device function
Error in forward_face_index_map_1: invalid device function
Error in forward_face_index_map_2: invalid device function
Error in forward_texture_sampling: invalid device function
Error in forward_face_index_map_1: invalid device function
Error in forward_face_index_map_2: invalid device function
Error in forward_texture_sampling: invalid device function
process image..../results/rs_model/example/orig/yaw_0.0_Ann_Veneman_0010.jpg
processed num 1
( each image time total: 14.036 ****)
( each image render time: 0.001 ****)
Error in forward_face_index_map_1: invalid device function
Error in forward_face_index_map_2: invalid device function
Error in forward_texture_sampling: invalid device function
Error in forward_face_index_map_1: invalid device function
Error in forward_face_index_map_2: invalid device function
Error in forward_texture_sampling: invalid device function
Error in forward_face_index_map_1: invalid device function
Error in forward_face_index_map_2: invalid device function
Error in forward_texture_sampling: invalid device function
process image..../results/rs_model/example/orig/yaw_0.0_Benjamin_Netanyahu_0005.jpg
processed num 2
( each image time total: 0.170 ****)
( each image render time: 0.036 ****)
process image..../results/rs_model/example/orig/yaw_0.0_Hugo_Chavez_0033.jpg
processed num 3
( each image time total: 0.188 ****)
( each image render time: 0.074 *****************)
terminate called after throwing an instance of 'c10::Error'
what(): CUDA error: driver shutting down (insert_events at /opt/conda/conda-bld/pytorch_1579022034529/work/c10/cuda/CUDACachingAllocator.cpp:756)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x47 (0x7fbd76b71627 in /home/forest/anaconda3/envs/python361/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #1: + 0x1af78 (0x7fbd76db1f78 in /home/forest/anaconda3/envs/python361/lib/python3.6/site-packages/torch/lib/libc10_cuda.so)
frame #2: + 0x1cbd1 (0x7fbd76db3bd1 in /home/forest/anaconda3/envs/python361/lib/python3.6/site-packages/torch/lib/libc10_cuda.so)
frame #3: torch::CudaIPCSentData::~CudaIPCSentData() + 0x241 (0x7fbd776694b1 in /home/forest/anaconda3/envs/python361/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #4: + 0x5346c5 (0x7fbd7766b6c5 in /home/forest/anaconda3/envs/python361/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #5: + 0x534b8d (0x7fbd7766bb8d in /home/forest/anaconda3/envs/python361/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #6: + 0x39ff8 (0x7fbd7b2eeff8 in /lib/x86_64-linux-gnu/libc.so.6)
frame #7: + 0x3a045 (0x7fbd7b2ef045 in /lib/x86_64-linux-gnu/libc.so.6)
frame #8: + 0x20fad9 (0x5592f0444ad9 in /home/forest/anaconda3/envs/python361/bin/python)
frame #9: + 0x20fbb8 (0x5592f0444bb8 in /home/forest/anaconda3/envs/python361/bin/python)
frame #10: PyErr_PrintEx + 0x32 (0x5592f0444c22 in /home/forest/anaconda3/envs/python361/bin/python)
frame #11: PyRun_SimpleStringFlags + 0x66 (0x5592f044af96 in /home/forest/anaconda3/envs/python361/bin/python)
frame #12: Py_Main + 0x423 (0x5592f044ed73 in /home/forest/anaconda3/envs/python361/bin/python)
frame #13: main + 0xee (0x5592f0318f2e in /home/forest/anaconda3/envs/python361/bin/python)
frame #14: __libc_start_main + 0xf0 (0x7fbd7b2d5830 in /lib/x86_64-linux-gnu/libc.so.6)
frame #15: + 0x1c327f (0x5592f03f827f in /home/forest/anaconda3/envs/python361/bin/python)

process image..../results/rs_model/example/orig/yaw_0.0_Julianne_Moore_0012.jpg
processed num 4
(************* each image time total: 0.254 ****)
( each image render time: 0.042 ****)
process image..../results/rs_model/example/orig/yaw_0.0_Keanu_Reeves_0010.jpg
processed num 5
( each image time total: 0.187 ****)
( each image render time: 0.002 ****)
process image..../results/rs_model/example/orig/yaw_0.0_Norah_Jones_0003.jpg
processed num 6
( each image time total: 0.146 ****)
( each image render time: 0.001 ****)
process image..../results/rs_model/example/orig/yaw_0.0_Robin_Wright_Penn_0001.jpg
processed num 7
( each image time total: 0.152 ****)
( each image render time: 0.037 ****)
process image..../results/rs_model/example/orig/yaw_0.0_Vitali_Klitschko_0003.jpg
processed num 8
( each image time total: 0.200 *****************)
finished
#########################################################################
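The options dump above reports gpu_ids: 0,1 and device_count: 2 even though only one GPU is installed. Below is a minimal sketch for catching that kind of mismatch before launching the script. parse_gpu_ids, missing_gpu_ids and visible_gpu_count are hypothetical helpers, not part of the repository. Skipping negative ids is an assumption borrowed from SPADE-style option parsing, where -1 means "use the CPU".

```python
from typing import List


def parse_gpu_ids(spec: str) -> List[int]:
    """Turn a --gpu_ids string such as "0,1" into [0, 1].

    Assumption: negative ids (e.g. "-1") mean "use the CPU" and are dropped,
    as in SPADE-style option parsers.
    """
    ids = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        gpu_id = int(part)
        if gpu_id >= 0:
            ids.append(gpu_id)
    return ids


def missing_gpu_ids(gpu_ids: List[int], available: int) -> List[int]:
    """Return the requested ids that do not exist on a machine with `available` GPUs."""
    return [i for i in gpu_ids if i >= available]


def visible_gpu_count() -> int:
    """Number of GPUs PyTorch can see; 0 if PyTorch or CUDA is unavailable."""
    try:
        import torch
    except ImportError:
        return 0
    return torch.cuda.device_count() if torch.cuda.is_available() else 0


if __name__ == "__main__":
    requested = parse_gpu_ids("0,1")  # value printed in the options dump above
    available = visible_gpu_count()
    bad = missing_gpu_ids(requested, available)
    if bad:
        print("GPU ids %s do not exist; only %d GPU(s) visible" % (bad, available))
    else:
        print("all requested GPU ids exist")
```

On a one-GPU machine this would flag id 1 as missing, which matches the configuration shown in the log.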

Answer 1 · 2020-06-29T07:25:46.000Z

I have solved the issue.
Since I have only one GPU, I set "opt.gpu_ids = [0]" in test_multipose.py and used the following command:
python -u test_multipose.py \
    --names rs_model \
    --dataset example \
    --list_start 0 \
    --list_end 10 \
    --dataset_mode allface \
    --gpu_ids 0 \
    --netG rotatespade \
    --norm_G spectralsyncbatch \
    --model rotatespade \
    --label_nc 5 \
    --nThreads 1 \
    --heatmap_size 2.5 \
    --chunk_size 1 \
    --no_gaussian_landmark \
    --multi_gpu \
    --device_count 1 \
    --render_thread 1 \
    --label_mask \
    --align \
    --erode_kernel 21 \
    --yaw_poses 0 30
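For scripted runs, the same single-GPU command can be assembled as an argument list and launched from Python. This is a sketch, not part of the repository: the helper names single_gpu_command and run_single_gpu are hypothetical, and the flags are copied from the command above. Restricting the process to GPU 0 with the CUDA_VISIBLE_DEVICES environment variable is an optional extra assumption, not something the original answer used. The opt.gpu_ids = [0] edit in test_multipose.py is still required.

```python
import os
import subprocess
import sys
from typing import List


def single_gpu_command(yaw_poses: List[float]) -> List[str]:
    """Build the argv for the single-GPU test run (flags copied from the answer above)."""
    cmd = [
        sys.executable, "-u", "test_multipose.py",
        "--names", "rs_model",
        "--dataset", "example",
        "--list_start", "0",
        "--list_end", "10",
        "--dataset_mode", "allface",
        "--gpu_ids", "0",
        "--netG", "rotatespade",
        "--norm_G", "spectralsyncbatch",
        "--model", "rotatespade",
        "--label_nc", "5",
        "--nThreads", "1",
        "--heatmap_size", "2.5",
        "--chunk_size", "1",
        "--no_gaussian_landmark",
        "--multi_gpu",
        "--device_count", "1",
        "--render_thread", "1",
        "--label_mask",
        "--align",
        "--erode_kernel", "21",
        "--yaw_poses",
    ]
    # Each yaw angle becomes its own token, e.g. [0, 30] -> ["0", "30"].
    cmd += [format(p, "g") for p in yaw_poses]
    return cmd


def run_single_gpu(yaw_poses: List[float]) -> int:
    """Launch the test script with only GPU 0 visible (optional assumption)."""
    env = dict(os.environ, CUDA_VISIBLE_DEVICES="0")
    return subprocess.call(single_gpu_command(yaw_poses), env=env)
```

Calling run_single_gpu([0, 30]) from the repository root would then correspond to the command above.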

Answer 2 · 2020-07-06T12:35:24.000Z

Following your method, I ran into a new problem:
dataset [AllFaceDataset] of size 8 was created
Testing gpu []
Network [RotateSPADEGenerator] was created. Total number of parameters: 225.1 million. To see the architecture, do print(network).
Traceback (most recent call last):
File "test_multipose.py", line 128, in
output_device=opt.gpu_ids[0],
IndexError: list index out of range

How can I solve this? Thank you.
Where and how should I change opt.gpu_ids = [0]?

Answer 3 · 2020-07-22T08:37:16.000Z

Hi, I have run into the same problem. Have you solved it?

Answer 4 · 2020-11-21T03:41:33.000Z

Considering #7,
a single GPU is not enough to run this code, right?

Answer 5 · 2021-05-17T19:08:46.000Z

@ForestLee's method works for me.

Answer 6 · 2021-10-20T03:05:05.000Z

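The error is raised at File "test_multipose.py", line 128. On a single-GPU machine (ngpus = 1) with --render_thread 1, ngpus - opt.render_thread is 0, so opt.gpu_ids ends up empty and opt.gpu_ids[0] raises IndexError.

To fix it, in test_multipose.py

change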

opt.gpu_ids = list(range(0, ngpus - opt.render_thread))

to

if ngpus > 1:
    opt.gpu_ids = list(range(0, ngpus - opt.render_thread))
else:
    opt.gpu_ids = [0]
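The patch can be checked in isolation. Below is a sketch that reproduces the original assignment and the patched one as pure functions. The function names are hypothetical, and ngpus and render_thread stand in for the values test_multipose.py uses (where the script obtains ngpus is not shown in this thread).

```python
from typing import List


def original_gpu_ids(ngpus: int, render_thread: int) -> List[int]:
    """Original line: the GPUs left over after reserving render_thread of them."""
    return list(range(0, ngpus - render_thread))


def patched_gpu_ids(ngpus: int, render_thread: int) -> List[int]:
    """Patched logic from the answer above: fall back to GPU 0 on a single-GPU machine."""
    if ngpus > 1:
        return list(range(0, ngpus - render_thread))
    return [0]


if __name__ == "__main__":
    # One GPU and one render thread: the original list is empty, so
    # opt.gpu_ids[0] raises IndexError ("Testing gpu []" in the log above).
    print(original_gpu_ids(1, 1))  # []
    print(patched_gpu_ids(1, 1))   # [0]
    print(patched_gpu_ids(8, 2))   # [0, 1, 2, 3, 4, 5]
```

Note that the ngpus > 1 check still yields an empty list when, for example, ngpus is 2 and render_thread is 2. In that case the render threads claim every GPU, and the same IndexError would presumably reappear.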