DeepRec-AI/DeepRec

Error when running DIN using --multihash=True

treper opened this issue · 0 comments

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 20.04):
  • DeepRec version or commit id: deeprec-release:deeprec2304-gpu-py38-cu116-ubuntu20.04
  • Python version: 3.8
  • Bazel version (if compiling from source):
  • GCC/Compiler version (if compiling from source):
  • CUDA/cuDNN version: 11

Describe the current behavior

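Running the DIN example with the multihash option enabled fails while the model is being built, with the traceback shown after the command:

LD_PRELOAD=./libjemalloc.so.2.5.1 python3 train.py --data_location ./dataset/amz_book --multihash true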

2023-07-14 05:28:47 UTC -- Traceback (most recent call last):
2023-07-14 05:28:47 UTC --   File "/code/rank/deeprec/train.py", line 982, in <module>
2023-07-14 05:28:47 UTC --     main()
2023-07-14 05:28:47 UTC --   File "/code/rank/deeprec/train.py", line 733, in main
2023-07-14 05:28:47 UTC --     model = DIN(feature_column=feature_column,
2023-07-14 05:28:47 UTC --   File "/code/rank/deeprec/train.py", line 107, in __init__
2023-07-14 05:28:47 UTC --     self._create_model()
2023-07-14 05:28:47 UTC --   File "/code/rank/deeprec/train.py", line 356, in _create_model
2023-07-14 05:28:47 UTC --     uid_emb, item_emb, his_item_emb, sequence_length = self._embedding_input_layer(
2023-07-14 05:28:47 UTC --   File "/code/rank/deeprec/train.py", line 291, in _embedding_input_layer
2023-07-14 05:28:47 UTC --     item_embedding_var = tf.get_multihash_variable(
2023-07-14 05:28:47 UTC --   File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/ops/variable_scope.py", line 2346, in get_multihash_variable
2023-07-14 05:28:47 UTC --     val_Q = get_variable_scope().get_variable(
2023-07-14 05:28:47 UTC --   File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/ops/variable_scope.py", line 1509, in get_variable
2023-07-14 05:28:47 UTC --     return var_store.get_variable(
2023-07-14 05:28:47 UTC --   File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/ops/variable_scope.py", line 786, in get_variable
2023-07-14 05:28:47 UTC --     return _true_getter(
2023-07-14 05:28:47 UTC --   File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/ops/variable_scope.py", line 678, in _true_getter
2023-07-14 05:28:47 UTC --     return self._get_partitioned_variable(
2023-07-14 05:28:47 UTC --   File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/ops/variable_scope.py", line 930, in _get_partitioned_variable
2023-07-14 05:28:47 UTC --     partitions = _call_partitioner(partitioner, shape, dtype)
2023-07-14 05:28:47 UTC --   File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/ops/variable_scope.py", line 3247, in _call_partitioner
2023-07-14 05:28:47 UTC --     slicing = partitioner(shape=shape, dtype=dtype)
2023-07-14 05:28:47 UTC --   File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/ops/partitioned_variables.py", line 205, in _partitioner
2023-07-14 05:28:47 UTC --     if dtype.base_dtype == dtypes.string:
2023-07-14 05:28:47 UTC -- AttributeError: type object 'float' has no attribute 'base_dtype'
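
The final message says that the `dtype` reaching the partitioner is Python's built-in `float` class, not a TensorFlow dtype object, so the `.base_dtype` lookup fails. The sketch below reproduces only that mechanism in plain Python. `FakeDType` and `partitioner_check` are hypothetical stand-ins written for illustration; they are not DeepRec or TensorFlow code, and where the bare `float` originates (in `train.py` or in `get_multihash_variable`) is not established here.

```python
class FakeDType:
    """Hypothetical stand-in for a TensorFlow dtype; models only base_dtype."""

    def __init__(self, name):
        self.name = name

    @property
    def base_dtype(self):
        # For a non-reference dtype, the base dtype is the dtype itself.
        return self


def partitioner_check(dtype):
    # Mirrors the attribute access that fails in the traceback:
    #   if dtype.base_dtype == dtypes.string:
    return dtype.base_dtype.name == "string"


# A dtype-like object supports the lookup.
ok_result = partitioner_check(FakeDType("float32"))

# The built-in float class does not, which raises the reported error.
try:
    partitioner_check(float)
    message = None
except AttributeError as exc:
    message = str(exc)

print(ok_result)
print(message)
```

If this reading is right, a possible workaround (untested against this release) is to make sure an explicit TensorFlow dtype such as `tf.float32` is passed wherever the multihash variable is created, so that the partitioner never receives the bare Python type.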
