skyflynil/stylegan2

tensorflow.python.framework.errors_impl.NotFoundError: /root/stylegan2_train/dnnlib/tflib/_cudacache/fused_bias_act_ec21d79f0dc288505704f796449a968e.so: undefined symbol: _ZN10tensorflow12OpDefBuilder6OutputESs


When I run $ python run_training.py --num-gpus=1 --data-dir=/data/ --config=config-f --dataset=dataset --mirror-augment=true --metric=none --total-kimg=20000 --result-dir="~/chenyulan/data/results", I get the following error:
Setting up TensorFlow plugin "fused_bias_act.cu": Preprocessing... Compiling... Loading... Failed!
Traceback (most recent call last):
File "run_training.py", line 230, in <module>
main()
File "run_training.py", line 225, in main
run(**vars(args))
File "run_training.py", line 144, in run
dnnlib.submit_run(**kwargs)
File "/root/stylegan2_train/dnnlib/submission/submit.py", line 343, in submit_run
return farm.submit(submit_config, host_run_dir)
File "/root/stylegan2_train/dnnlib/submission/internal/local.py", line 22, in submit
return run_wrapper(submit_config)
File "/root/stylegan2_train/dnnlib/submission/submit.py", line 280, in run_wrapper
run_func_obj(**submit_config.run_func_kwargs)
File "/root/stylegan2_train/training/training_loop.py", line 179, in training_loop
G = tflib.Network('G', num_channels=training_set.shape[0], resolution=training_set.shape[1], label_size=training_set.label_size, **G_args)
File "/root/stylegan2_train/dnnlib/tflib/network.py", line 97, in __init__
self._init_graph()
File "/root/stylegan2_train/dnnlib/tflib/network.py", line 154, in _init_graph
out_expr = self._build_func(*self.input_templates, **build_kwargs)
File "/root/stylegan2_train/training/networks_stylegan2.py", line 288, in G_main
components.synthesis = tflib.Network('G_synthesis', func_name=globals()[synthesis_func], **kwargs)
File "/root/stylegan2_train/dnnlib/tflib/network.py", line 97, in __init__
self._init_graph()
File "/root/stylegan2_train/dnnlib/tflib/network.py", line 154, in _init_graph
out_expr = self._build_func(*self.input_templates, **build_kwargs)
File "/root/stylegan2_train/training/networks_stylegan2.py", line 641, in G_synthesis_stylegan2
x = layer(x, layer_idx=0, fmaps=nf(1), kernel=3)
File "/root/stylegan2_train/training/networks_stylegan2.py", line 565, in layer
x = modulated_conv2d_layer(x, dlatents_in[:, layer_idx], fmaps=fmaps, kernel=kernel, up=up, resample_kernel=resample_kernel, fused_modconv=fused_modconv)
File "/root/stylegan2_train/training/networks_stylegan2.py", line 100, in modulated_conv2d_layer
s = apply_bias_act(s, bias_var=mod_bias_var) + 1 # [BI] Add bias (initially 1).
File "/root/stylegan2_train/training/networks_stylegan2.py", line 69, in apply_bias_act
return fused_bias_act(x, b=tf.cast(b, x.dtype), act=act, alpha=alpha, gain=gain)
File "/root/stylegan2_train/dnnlib/tflib/ops/fused_bias_act.py", line 68, in fused_bias_act
return impl_dict[impl](x=x, b=b, axis=axis, act=act, alpha=alpha, gain=gain)
File "/root/stylegan2_train/dnnlib/tflib/ops/fused_bias_act.py", line 122, in _fused_bias_act_cuda
cuda_kernel = _get_plugin().fused_bias_act
File "/root/stylegan2_train/dnnlib/tflib/ops/fused_bias_act.py", line 16, in _get_plugin
return custom_ops.get_plugin(os.path.splitext(__file__)[0] + '.cu')
File "/root/stylegan2_train/dnnlib/tflib/custom_ops.py", line 160, in get_plugin
plugin = tf.load_op_library(bin_file)
File "/root/anaconda3/envs/stylegan2/lib/python3.6/site-packages/tensorflow/python/framework/load_library.py", line 61, in load_op_library
lib_handle = py_tf.TF_LoadLibrary(library_filename)
tensorflow.python.framework.errors_impl.NotFoundError: /root/stylegan2_train/dnnlib/tflib/_cudacache/fused_bias_act_ec21d79f0dc288505704f796449a968e.so: undefined symbol: _ZN10tensorflow12OpDefBuilder6OutputESs
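The useful clue is the mangled name in that error. In the Itanium C++ ABI that GCC uses, the trailing Ss is the abbreviation for std::string under the old (pre-C++11) libstdc++ string ABI, whereas code built with -D_GLIBCXX_USE_CXX11_ABI=1 would reference std::__cxx11::basic_string, which mangles with St7__cxx11. So the plugin is asking for the old-ABI tensorflow::OpDefBuilder::Output(std::string), and the TensorFlow that was loaded apparently does not export that variant. The sketch below is only a rough heuristic (not a real demangler; c++filt from binutils is the proper tool), and the function names are hypothetical.

```python
from typing import Optional


def strip_source_names(mangled: str) -> str:
    """Replace each length-prefixed identifier (e.g. '10tensorflow') with '#'.

    Hypothetical helper and heuristic only: digits also appear in other
    mangling productions (template literals, substitution indices), which
    this simple scanner does not distinguish.
    """
    out = []
    i = 0
    while i < len(mangled):
        if mangled[i].isdigit():
            j = i
            while j < len(mangled) and mangled[j].isdigit():
                j += 1
            length = int(mangled[i:j])
            out.append("#")
            i = j + length  # skip the identifier's characters
        else:
            out.append(mangled[i])
            i += 1
    return "".join(out)


def string_abi(mangled: str) -> Optional[str]:
    """Guess which libstdc++ std::string ABI a mangled symbol refers to."""
    if "St7__cxx11" in mangled:
        return "cxx11"  # std::__cxx11::basic_string (new ABI)
    if "Ss" in strip_source_names(mangled):
        return "pre-cxx11"  # 'Ss' abbreviation = old-ABI std::string
    return None  # no std::string detected in the signature


missing = "_ZN10tensorflow12OpDefBuilder6OutputESs"
print(strip_source_names(missing))  # _ZN###ESs
print(string_abi(missing))  # pre-cxx11
```

Identifiers are stripped first so that a class or function name that happens to contain the letters "Ss" is not mistaken for the std::string abbreviation.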

I tried the following:
In file stylegan2/dnnlib/tflib/custom_ops.py, line 127, I changed
compile_opts += ' --compiler-options \'-fPIC -D_GLIBCXX_USE_CXX11_ABI=0\''
to
compile_opts += ' --compiler-options \'-fPIC -D_GLIBCXX_USE_CXX11_ABI=1\''
but it didn't work.
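One possible reason the edit had no effect: custom_ops.py reuses a compiled binary from _cudacache whenever a file with the expected hash already exists. In the upstream StyleGAN2 version, as far as I can tell, that hash covers the CUDA source, the nvcc command template, the TensorFlow version and a cache tag, but not compile_opts. If this fork behaves the same way, changing the ABI flag leaves the hash unchanged and the stale fused_bias_act_ec21d79f0dc288505704f796449a968e.so is simply loaded again. Deleting the cached builds forces a recompile. Note that the flag also has to match the one TensorFlow itself was built with; a mismatch in the other direction would likely fail with the mirror-image error (a missing __cxx11 symbol). A minimal sketch, where clear_plugin_cache is a hypothetical helper (removing the whole _cudacache directory works just as well):

```python
import glob
import os
from typing import List


def clear_plugin_cache(cache_dir: str, plugin_name: str) -> List[str]:
    """Delete cached builds of one plugin (files named <plugin_name>_<hash>.so).

    Hypothetical helper; returns the removed paths so the caller can log them.
    """
    pattern = os.path.join(cache_dir, plugin_name + "_*.so")
    removed = []
    for path in sorted(glob.glob(pattern)):
        os.remove(path)
        removed.append(path)
    return removed


if __name__ == "__main__":
    # Cache location taken from the traceback in this issue; adjust as needed.
    cache = "/root/stylegan2_train/dnnlib/tflib/_cudacache"
    if os.path.isdir(cache):
        # fused_bias_act and upfirdn_2d are the two StyleGAN2 CUDA plugins.
        for name in ("fused_bias_act", "upfirdn_2d"):
            for path in clear_plugin_cache(cache, name):
                print("removed", path)
```

The next run of run_training.py should then print "Compiling..." again instead of loading the old binary.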

Running ldd on the cached plugin:
$ ldd root/stylegan2_train/dnnlib/tflib/_cudacache/fused_bias_act_ec21d79f0dc288505704f796449a968e.so
linux-vdso.so.1 => (0x00007ffd1adb2000)
_pywrap_tensorflow_internal.so => not found
librt.so.1 => /usr/lib64/librt.so.1 (0x00007eff848ff000)
libpthread.so.0 => /usr/lib64/libpthread.so.0 (0x00007eff846e3000)
libdl.so.2 => /usr/lib64/libdl.so.2 (0x00007eff844df000)
libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00007eff84e2d000)
libm.so.6 => /usr/lib64/libm.so.6 (0x00007eff841dd000)
libgcc_s.so.1 => /usr/lib64/libgcc_s.so.1 (0x00007eff83fc7000)
libc.so.6 => /usr/lib64/libc.so.6 (0x00007eff83c05000)
/lib64/ld-linux-x86-64.so.2 (0x00007eff84da4000)
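The "_pywrap_tensorflow_internal.so => not found" line is most likely not the cause. As far as I know, TensorFlow 1.x imports that library with RTLD_GLOBAL precisely so custom ops can use its symbols, and the failure above is an undefined symbol rather than "cannot open shared object file", which means the dependency was found inside Python. The more useful question is whether the installed TensorFlow libraries export the old-ABI symbol at all. The sketch below shells out to binutils' nm (assumed to be on PATH); defined_dynamic_symbols and libraries_defining are hypothetical helpers, and the libtensorflow_framework* glob is an assumption about the TF 1.x install layout.

```python
import glob
import os
import subprocess
from typing import Iterable, List, Set

# Symbol taken verbatim from the error message in this issue.
SYMBOL = "_ZN10tensorflow12OpDefBuilder6OutputESs"


def defined_dynamic_symbols(nm_output: str) -> Set[str]:
    """Parse `nm -D --defined-only` lines such as '00000000001a2b3c T name'."""
    names = set()
    for line in nm_output.splitlines():
        parts = line.split()
        if len(parts) >= 3:
            # Newer binutils may append version info, e.g. 'name@@VERS_1.0'.
            names.add(parts[2].split("@")[0])
    return names


def libraries_defining(symbol: str, libraries: Iterable[str]) -> List[str]:
    """Return the libraries whose dynamic symbol table defines `symbol`."""
    found = []
    for lib in libraries:
        # stdout=PIPE + universal_newlines keeps this compatible with Python 3.6.
        result = subprocess.run(["nm", "-D", "--defined-only", lib],
                                stdout=subprocess.PIPE, universal_newlines=True)
        if symbol in defined_dynamic_symbols(result.stdout):
            found.append(lib)
    return found


def main() -> None:
    try:
        import tensorflow as tf
    except ImportError:
        print("TensorFlow is not installed in this environment")
        return
    lib_dir = tf.sysconfig.get_lib()
    # Assumed layout of a TF 1.x install; adjust the globs if yours differs.
    candidates = glob.glob(os.path.join(lib_dir, "libtensorflow_framework*.so*"))
    candidates += glob.glob(os.path.join(lib_dir, "python", "_pywrap_tensorflow_internal.so"))
    hits = libraries_defining(SYMBOL, candidates)
    print(hits or "no candidate library exports " + SYMBOL)


if __name__ == "__main__":
    main()
```

An empty result with the old TensorFlow, and a hit after reinstalling, would confirm an ABI mismatch rather than a missing file.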

Environment:
python=3.6.12
tensorflow-gpu=1.14
CUDA=10.0
cuDNN=7.6.5

Reinstalling TensorFlow with "pip install tensorflow-gpu==1.14.0" fixed the problem.
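A plausible explanation for why the reinstall helped (not confirmed in this thread): the environment lives under anaconda3/envs, so the previous TensorFlow may have been a conda build compiled with the new C++11 string ABI, while the official pip wheels of TF 1.x were built with -D_GLIBCXX_USE_CXX11_ABI=0, which is the value custom_ops.py passes by default. To check which ABI an installed TensorFlow expects, the flag can be read from tf.sysconfig.get_compile_flags(). A minimal sketch; cxx11_abi_flag is a hypothetical helper:

```python
import re
from typing import Iterable, Optional


def cxx11_abi_flag(compile_flags: Iterable[str]) -> Optional[int]:
    """Return N from a '-D_GLIBCXX_USE_CXX11_ABI=N' entry, or None if absent.

    Hypothetical helper for inspecting TensorFlow's recommended compile flags.
    """
    for flag in compile_flags:
        match = re.fullmatch(r"-D_GLIBCXX_USE_CXX11_ABI=(\d)", flag)
        if match:
            return int(match.group(1))
    return None


if __name__ == "__main__":
    try:
        import tensorflow as tf
    except ImportError:
        print("TensorFlow is not installed in this environment")
    else:
        # get_compile_flags() lists the flags custom ops should be built with.
        flag = cxx11_abi_flag(tf.sysconfig.get_compile_flags())
        print("TensorFlow expects -D_GLIBCXX_USE_CXX11_ABI=%s" % flag)
```

With the pip wheel this should print 0, matching custom_ops.py. If it prints 1, the edit at line 127 is the right one, but the cached plugin binaries would still need to be deleted so they are rebuilt.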