run_classifier.py Error COLA

Question


zzj0402 opened this issue 5 years ago · 2 comments

Running the CoLA fine-tuning script returns the following error:

2020-01-15 17:53:21.504699: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:163] no NVIDIA GPU device is present: /dev/nvidia0 does not exist
2020-01-15 17:53:21.505194: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2020-01-15 17:53:21.518577: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3599910000 Hz
2020-01-15 17:53:21.519665: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x3c2f130 executing computations on platform Host. Devices:
2020-01-15 17:53:21.519701: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): Host, Default Version
Traceback (most recent call last):
  File "run_classifer.py", line 457, in <module>
    app.run(main)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "run_classifer.py", line 307, in main
    loss_multiplier=loss_multiplier)
  File "run_classifer.py", line 195, in get_model
    pooled_output, _ = albert_layer(input_word_ids, input_mask, input_type_ids)
  File "/root/ALBERT-TF2.0/albert.py", line 212, in __call__
    return super(AlbertModel, self).__call__(inputs, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/base_layer.py", line 842, in __call__
    outputs = call_fn(cast_inputs, *args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/autograph/impl/api.py", line 237, in wrapper
    raise e.ag_error_metadata.to_exception(e)
RuntimeError: in converted code:

    /root/ALBERT-TF2.0/albert.py:229 call  *
        word_embeddings = self.embedding_lookup(input_word_ids)
    /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/base_layer.py:817 __call__
        self._maybe_build(inputs)
    /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/base_layer.py:2141 _maybe_build
        self.build(input_shapes)
    /root/ALBERT-TF2.0/albert.py:273 build
        dtype=self.dtype)
    /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/base_layer.py:522 add_weight
        aggregation=aggregation)
    /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/tracking/base.py:744 _add_variable_with_custom_getter
        **kwargs_for_getter)
    /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/base_layer_utils.py:139 make_variable
        shape=variable_shape if variable_shape else None)
    /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/variables.py:258 __call__
        return cls._variable_v1_call(*args, **kwargs)
    /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/variables.py:219 _variable_v1_call
        shape=shape)
    /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/variables.py:65 getter
        return captured_getter(captured_previous, **kwargs)
    /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/distribute/distribute_lib.py:1322 creator_with_resource_vars
        return self._create_variable(*args, **kwargs)
    /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/distribute/one_device_strategy.py:262 _create_variable
        return next_creator(*args, **kwargs)
    /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/variables.py:197 <lambda>
        previous_getter = lambda **kwargs: default_variable_creator(None, **kwargs)
    /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/variable_scope.py:2507 default_variable_creator
        shape=shape)
    /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/variables.py:262 __call__
        return super(VariableMetaclass, cls).__call__(*args, **kwargs)
    /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/resource_variable_ops.py:1406 __init__
        distribute_strategy=distribute_strategy)
    /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/resource_variable_ops.py:1537 _init_from_args
        initial_value() if init_from_fn else initial_value,
    /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/base_layer_utils.py:119 <lambda>
        init_val = lambda: initializer(shape, dtype=dtype)
    /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/init_ops_v2.py:343 __call__
        self.stddev, dtype)
    /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/init_ops_v2.py:809 truncated_normal
        shape=shape, mean=mean, stddev=stddev, dtype=dtype, seed=self.seed)
    /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/random_ops.py:171 truncated_normal
        mean_tensor = ops.convert_to_tensor(mean, dtype=dtype, name="mean")
    /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py:1184 convert_to_tensor
        return convert_to_tensor_v2(value, dtype, preferred_dtype, name)
    /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py:1242 convert_to_tensor_v2
        as_ref=False)
    /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py:1296 internal_convert_to_tensor
        ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
    /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/tensor_conversion_registry.py:52 _default_conversion_function
        return constant_op.constant(value, dtype, name=name)
    /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/constant_op.py:227 constant
        allow_broadcast=True)
    /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/constant_op.py:235 _constant_impl
        t = convert_to_eager_tensor(value, ctx, dtype)
    /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/constant_op.py:96 convert_to_eager_tensor
        return ops.EagerTensor(value, ctx.device_name, dtype)

    RuntimeError: /job:localhost/replica:0/task:0/device:GPU:0 unknown device.

Answer 1 · 2020-01-17T05:53:23.000Z

@zzj0402, can you help find the solution for this issue: #32

Answer 2 · 2020-01-17T19:38:38.000Z

Are you trying to run on a GPU when you either don't have one or it isn't configured?
If you do have a GPU, please try the TensorFlow GPU Docker image (docker pull tensorflow/tensorflow:latest-gpu-py3) to make sure it is configured correctly.
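The traceback passes through one_device_strategy.py. This suggests the script creates its variables under a tf.distribute.OneDeviceStrategy pinned to GPU:0, but the first log line reports that no NVIDIA GPU device is present. Below is a minimal sketch of a guard, assuming you can edit the place where the strategy is created. It checks which GPUs TensorFlow can see and falls back to the CPU when there are none. The helper names choose_device and make_strategy are hypothetical and are not part of ALBERT-TF2.0.

```python
def choose_device(visible_gpus):
    """Hypothetical helper: pick a device string for OneDeviceStrategy.

    visible_gpus: the list returned by TensorFlow's GPU device query
    (empty when TensorFlow cannot see any GPU).
    """
    return "/gpu:0" if visible_gpus else "/cpu:0"


def make_strategy():
    """Hypothetical helper: build a OneDeviceStrategy on a device that exists."""
    # Imported here so choose_device() can be used without TensorFlow installed.
    import tensorflow as tf

    # tf.config.list_physical_devices exists from TF 2.1; TF 2.0 only has
    # the tf.config.experimental version, so fall back to it if needed.
    list_devices = (getattr(tf.config, "list_physical_devices", None)
                    or tf.config.experimental.list_physical_devices)
    gpus = list_devices("GPU")
    device = choose_device(gpus)
    print(f"Visible GPUs: {len(gpus)}; placing the model on {device}")
    return tf.distribute.OneDeviceStrategy(device)
```

To use it, call make_strategy() once and build the model inside strategy.scope(). On a CPU-only machine like the one in the log, the sketch selects "/cpu:0", which should avoid the unknown-device error at the cost of much slower training. If a GPU is physically present but its drivers or CUDA are not set up, TensorFlow will not list it either. In that case the guard only falls back to the CPU and does not fix the GPU configuration, which is what the Docker suggestion above addresses.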