alibaba/EasyRec

Error when splitting the user and item models using split_model_pai.py

treper opened this issue · 2 comments

treper commented

I am running the split_model_pai.py script in Docker.
The Docker image I use: easyrec:py36-tf1.15-0.6.3
The command I use: python split_model_pai.py --model_dir={{ model_dir }} --user_model_dir={{ user_model_dir }} --item_model_dir={{ item_model_dir }}
Here is the error:

I0720 00:54:58.280419 140245594871616 split_model_pai.py:255] Exporting user part model...

2023-07-20 00:54:58 UTC -- WARNING:tensorflow:From /data/nt/opensource/EasyRec/easy_rec/python/tools/split_model_pai.py:195: The name tf.reset_default_graph is deprecated. Please use tf.compat.v1.reset_default_graph instead.

2023-07-20 00:54:58 UTC -- 

2023-07-20 00:54:58 UTC -- W0720 00:54:58.306650 140245594871616 module_wrapper.py:139] From /data/nt/opensource/EasyRec/easy_rec/python/tools/split_model_pai.py:195: The name tf.reset_default_graph is deprecated. Please use tf.compat.v1.reset_default_graph instead.

2023-07-20 00:54:58 UTC -- 

2023-07-20 00:54:58 UTC -- WARNING:tensorflow:From /data/nt/opensource/EasyRec/easy_rec/python/tools/split_model_pai.py:196: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

2023-07-20 00:54:58 UTC -- 

2023-07-20 00:54:58 UTC -- W0720 00:54:58.306813 140245594871616 module_wrapper.py:139] From /data/nt/opensource/EasyRec/easy_rec/python/tools/split_model_pai.py:196: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

2023-07-20 00:54:58 UTC -- 

2023-07-20 00:54:58 UTC -- 2023-07-20 00:54:58.307495: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA

2023-07-20 00:54:58 UTC -- 2023-07-20 00:54:58.317565: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2200000000 Hz

2023-07-20 00:54:58 UTC -- 2023-07-20 00:54:58.321292: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x2d8b130 initialized for platform Host (this does not guarantee that XLA will be used). Devices:

2023-07-20 00:54:58 UTC -- 2023-07-20 00:54:58.321315: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version

2023-07-20 00:54:58 UTC -- Traceback (most recent call last):

2023-07-20 00:54:58 UTC --   File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/importer.py", line 501, in _import_graph_def_internal

2023-07-20 00:54:58 UTC --     graph._c_graph, serialized, options)  # pylint: disable=protected-access

2023-07-20 00:54:58 UTC -- tensorflow.python.framework.errors_impl.InvalidArgumentError: NodeDef mentions attr 'Tsegmentids' not in Op<name=SparseSegmentSum; signature=data:T, indices:Tidx, segment_ids:int32 -> output:T; attr=T:type,allowed=[DT_FLOAT, DT_DOUBLE, DT_INT32, DT_UINT8, DT_INT16, DT_INT8, DT_INT64, DT_BFLOAT16, DT_UINT16, DT_HALF, DT_UINT32, DT_UINT64]; attr=Tidx:type,default=DT_INT32,allowed=[DT_INT32, DT_INT64]>; NodeDef: {{node input_layer/age_level_embedding/age_level_embedding_weights/embedding_lookup_sparse}}. (Check whether your GraphDef-interpreting binary is up to date with your GraphDef-generating binary.).

2023-07-20 00:54:58 UTC -- 

2023-07-20 00:54:58 UTC -- During handling of the above exception, another exception occurred:

2023-07-20 00:54:58 UTC -- 

2023-07-20 00:54:58 UTC -- Traceback (most recent call last):

2023-07-20 00:54:58 UTC --   File "/data/nt/opensource/EasyRec/easy_rec/python/tools/split_model_pai.py", line 276, in <module>

2023-07-20 00:54:58 UTC --     tf.app.run()

2023-07-20 00:54:58 UTC --   File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/platform/app.py", line 40, in run

2023-07-20 00:54:58 UTC --     _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)

2023-07-20 00:54:58 UTC --   File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 303, in run

2023-07-20 00:54:58 UTC --     _run_main(main, args)

2023-07-20 00:54:58 UTC --   File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 251, in _run_main

2023-07-20 00:54:58 UTC --     sys.exit(main(argv))

2023-07-20 00:54:58 UTC --   File "/data/nt/opensource/EasyRec/easy_rec/python/tools/split_model_pai.py", line 263, in main

2023-07-20 00:54:58 UTC --     part_dir=FLAGS.user_model_dir)

2023-07-20 00:54:58 UTC --   File "/data/nt/opensource/EasyRec/easy_rec/python/tools/split_model_pai.py", line 199, in export

2023-07-20 00:54:58 UTC --     importer.import_graph_def(inference_graph, name='')

2023-07-20 00:54:58 UTC --   File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func

2023-07-20 00:54:58 UTC --     return func(*args, **kwargs)

2023-07-20 00:54:58 UTC --   File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/importer.py", line 405, in import_graph_def

2023-07-20 00:54:58 UTC --     producer_op_list=producer_op_list)

2023-07-20 00:54:58 UTC --   File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/importer.py", line 505, in _import_graph_def_internal

2023-07-20 00:54:58 UTC --     raise ValueError(str(e))

2023-07-20 00:54:58 UTC -- ValueError: NodeDef mentions attr 'Tsegmentids' not in Op<name=SparseSegmentSum; signature=data:T, indices:Tidx, segment_ids:int32 -> output:T; attr=T:type,allowed=[DT_FLOAT, DT_DOUBLE, DT_INT32, DT_UINT8, DT_INT16, DT_INT8, DT_INT64, DT_BFLOAT16, DT_UINT16, DT_HALF, DT_UINT32, DT_UINT64]; attr=Tidx:type,default=DT_INT32,allowed=[DT_INT32, DT_INT64]>; NodeDef: {{node input_layer/age_level_embedding/age_level_embedding_weights/embedding_lookup_sparse}}. (Check whether your GraphDef-interpreting binary is up to date with your GraphDef-generating binary.).

@treper Could you provide the model to reproduce the failure?

treper commented

The error occurs when I use the split script on a model trained with tf2.x. After I switched to tf1.x to train the model, the split succeeded. Could you provide a split script for tf2.x? Thanks!
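
For anyone hitting the same failure: the ValueError in the log says the exported graph carries a 'Tsegmentids' attribute on a SparseSegmentSum node, which the TF 1.15 runtime in the image does not know. That matches a model written by a newer TensorFlow than the one importing it. The snippet below is only a rough sketch for triaging such messages; it uses the standard library alone, and the names `AttrMismatch` and `parse_attr_mismatch` are hypothetical helpers, not part of EasyRec or TensorFlow.

```python
import re
from typing import NamedTuple, Optional


class AttrMismatch(NamedTuple):
    attr: str  # attribute present in the GraphDef but unknown to the runtime
    op: str    # op type whose registered definition lacks that attribute
    node: str  # graph node that carries the attribute


# Matches the "NodeDef mentions attr ... not in Op<name=...>" message format
# seen in the traceback above. Other TensorFlow builds may word it differently.
_PATTERN = re.compile(
    r"NodeDef mentions attr '(?P<attr>[^']+)' not in Op<name=(?P<op>[^;>]+);"
    r".*?NodeDef: \{\{node (?P<node>[^}]+)\}\}",
    re.DOTALL,
)


def parse_attr_mismatch(message: str) -> Optional[AttrMismatch]:
    """Return the attr, op, and node named in the error, or None if no match."""
    match = _PATTERN.search(message)
    if match is None:
        return None
    return AttrMismatch(
        attr=match.group("attr"),
        op=match.group("op"),
        node=match.group("node").strip(),
    )


if __name__ == "__main__":
    import sys

    # Read a captured log from stdin and report the first mismatch found.
    result = parse_attr_mismatch(sys.stdin.read())
    if result is None:
        print("No attr mismatch found in the input.")
    else:
        print(
            f"Op {result.op} on node {result.node} uses attr {result.attr}, "
            "which this TensorFlow runtime does not define. The model was "
            "probably exported by a different TensorFlow version."
        )
```

Piping the log from this issue into the script should point at SparseSegmentSum and the age_level_embedding lookup node. Until a tf2.x split script is available, the workaround reported in this thread is to train and export the model with tf1.x.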