Error when running DIN using --multihash=True
treper opened this issue · 0 comments
System information
- OS Platform and Distribution (e.g., Linux Ubuntu 20.04):
- DeepRec version or commit id: deeprec-release:deeprec2304-gpu-py38-cu116-ubuntu20.04
- Python version: 3.8
- Bazel version (if compiling from source):
- GCC/Compiler version (if compiling from source):
- CUDA/cuDNN version: 11
Describe the current behavior
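Running the DIN example with --multihash enabled raises an AttributeError while the model is being built. The command, followed by the full traceback:
LD_PRELOAD=./libjemalloc.so.2.5.1 python3 train.py --data_location ./dataset/amz_book --multihash true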
2023-07-14 05:28:47 UTC -- Traceback (most recent call last):
2023-07-14 05:28:47 UTC -- File "/code/rank/deeprec/train.py", line 982, in <module>
2023-07-14 05:28:47 UTC -- main()
2023-07-14 05:28:47 UTC -- File "/code/rank/deeprec/train.py", line 733, in main
2023-07-14 05:28:47 UTC -- model = DIN(feature_column=feature_column,
2023-07-14 05:28:47 UTC -- File "/code/rank/deeprec/train.py", line 107, in __init__
2023-07-14 05:28:47 UTC -- self._create_model()
2023-07-14 05:28:47 UTC -- File "/code/rank/deeprec/train.py", line 356, in _create_model
2023-07-14 05:28:47 UTC -- uid_emb, item_emb, his_item_emb, sequence_length = self._embedding_input_layer(
2023-07-14 05:28:47 UTC -- File "/code/rank/deeprec/train.py", line 291, in _embedding_input_layer
2023-07-14 05:28:47 UTC -- item_embedding_var = tf.get_multihash_variable(
2023-07-14 05:28:47 UTC -- File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/ops/variable_scope.py", line 2346, in get_multihash_variable
2023-07-14 05:28:47 UTC -- val_Q = get_variable_scope().get_variable(
2023-07-14 05:28:47 UTC -- File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/ops/variable_scope.py", line 1509, in get_variable
2023-07-14 05:28:47 UTC -- return var_store.get_variable(
2023-07-14 05:28:47 UTC -- File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/ops/variable_scope.py", line 786, in get_variable
2023-07-14 05:28:47 UTC -- return _true_getter(
2023-07-14 05:28:47 UTC -- File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/ops/variable_scope.py", line 678, in _true_getter
2023-07-14 05:28:47 UTC -- return self._get_partitioned_variable(
2023-07-14 05:28:47 UTC -- File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/ops/variable_scope.py", line 930, in _get_partitioned_variable
2023-07-14 05:28:47 UTC -- partitions = _call_partitioner(partitioner, shape, dtype)
2023-07-14 05:28:47 UTC -- File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/ops/variable_scope.py", line 3247, in _call_partitioner
2023-07-14 05:28:47 UTC -- slicing = partitioner(shape=shape, dtype=dtype)
2023-07-14 05:28:47 UTC -- File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/ops/partitioned_variables.py", line 205, in _partitioner
2023-07-14 05:28:47 UTC -- if dtype.base_dtype == dtypes.string:
2023-07-14 05:28:47 UTC -- AttributeError: type object 'float' has no attribute 'base_dtype'
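The last frame suggests that the partitioner was handed Python's built-in `float` type instead of a TensorFlow dtype object, which is why `dtype.base_dtype` fails. The sketch below reproduces that failure mode without TensorFlow. `FakeDType`, `partitioner`, and `try_partition` are hypothetical stand-ins written for illustration, not DeepRec or TensorFlow code, and it is not confirmed here whether the `float` comes from a default in `get_multihash_variable` or from train.py.

```python
class FakeDType:
    """Hypothetical stand-in for a TensorFlow dtype; models only `base_dtype`."""

    def __init__(self, name):
        self.name = name

    @property
    def base_dtype(self):
        return self


STRING = FakeDType("string")
FLOAT32 = FakeDType("float32")


def partitioner(shape, dtype):
    # Mirrors the check in the last traceback frame:
    #   if dtype.base_dtype == dtypes.string:
    # The byte sizes are illustrative values only.
    if dtype.base_dtype == STRING:
        return 16
    return 4


def try_partition(dtype):
    """Return "ok" if the partitioner accepts `dtype`, else the error message."""
    try:
        partitioner(shape=[1000, 16], dtype=dtype)
        return "ok"
    except AttributeError as exc:
        return str(exc)


# The built-in `float` type has no `base_dtype`, so the check raises.
result_builtin = try_partition(float)
# A dtype-like object exposing `base_dtype` passes the same check.
result_dtype = try_partition(FLOAT32)

print(result_builtin)
print(result_dtype)
```

If this reading is right, passing an explicit TensorFlow dtype such as `tf.float32` to `tf.get_multihash_variable` (assuming it accepts a `dtype` argument) might avoid the error. This is an untested guess, not a confirmed fix.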