tensorflow/tfx

TFX.components.transform id

raminmohammadi opened this issue · 4 comments

System information

  • Have I specified the code to reproduce the issue (Yes, No): yes
  • Environment in which the code is executed (e.g., Local(Linux/MacOS/Windows),
    Interactive Notebook, Google Cloud, etc): Linux, Notebook, Colab
  • TensorFlow version: 2.13.0
  • TFX Version: 1.14.0
  • Python version: 3.8
  • Python dependencies (from pip freeze output):
    requirements.txt

Describe the current behavior:

This problem only happens when I use the transform as part of TFX. I'm encountering an issue with the "transform" function, which processes individual input data items. Each input consists of two keys: 'entities' and 'text'.

My task is to transform the "text" feature of the input tensor by breaking it down into individual characters. For example, given the input "This is a test", I intend to follow these steps:

Split the text into words, then each word into a character array: [['T', 'h', 'i', 's'], ['i', 's'], ['a'], ['t', 'e', 's', 't']]

Code 1: tf.strings.unicode_split(tf.strings.split('This is a test'), input_encoding='UTF-8')

Look up each character's index in a dictionary, and pad each word to a width of 12 characters.

Code 2: tf.map_fn(get_index, text, fn_output_signature=tf.TensorSpec(shape=(1, Wlength), dtype=tf.int64, name=None))
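
For reference, the two steps above can be sketched as a standalone function outside TFX. This is only a sketch under stated assumptions: the vocabulary is hypothetical (a-z mapped to 13..38 and A-Z to 39..64, offsets inferred from the expected output in this issue, not taken from the actual dictionary), `encode` and `WORD_LEN` are illustrative names, and the per-character lookup uses a `tf.lookup.StaticHashTable` with `tf.ragged.map_flat_values` in place of the `get_index` / `tf.map_fn` call, whose body is not shown here.

```python
import string

import tensorflow as tf

WORD_LEN = 12  # padded width per word, as described in the issue

# Hypothetical vocabulary (assumption): a-z -> 13..38, A-Z -> 39..64, unknown -> 0.
chars = list(string.ascii_lowercase + string.ascii_uppercase)
ids = list(range(13, 13 + len(chars)))
table = tf.lookup.StaticHashTable(
    tf.lookup.KeyValueTensorInitializer(
        tf.constant(chars), tf.constant(ids, dtype=tf.int64)
    ),
    default_value=0,
)


def encode(text):
    """Encode a scalar string as character ids with shape [num_words, 1, WORD_LEN]."""
    words = tf.strings.split(text)                       # [num_words]
    char_rt = tf.strings.unicode_split(words, "UTF-8")   # ragged [num_words, None]
    # Apply the lookup to the flat character values, keeping the ragged structure.
    id_rt = tf.ragged.map_flat_values(table.lookup, char_rt)
    # Pad (or truncate) every word to WORD_LEN with zeros.
    padded = id_rt.to_tensor(default_value=0, shape=[None, WORD_LEN])
    return tf.expand_dims(padded, axis=1)


result = encode(tf.constant("This is a test"))
print(result.shape)
```

Note that this runs on a single scalar string. Inside a Transform preprocessing function the inputs arrive batched, so the shapes seen by the same ops may differ.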

Currently, the transform returns only a single vector that starts with 1 and is 0 everywhere else:
example = [[1, 0, 0, 0, 0, 0, 0, 0, 0]]

Describe the expected behavior

The expected output is:

<tf.Tensor: shape=(4, 1, 12), dtype=int64, numpy=
array([[[58, 20, 21, 31,  0,  0,  0,  0,  0,  0,  0,  0]],

       [[21, 31,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0]],

       [[13,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0]],

       [[32, 17, 31, 32,  0,  0,  0,  0,  0,  0,  0,  0]]])>
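
A framework-free reference can be handy for checking whatever the Transform component emits against the values above. The following is a minimal sketch in plain Python. The vocabulary is an assumption (a-z mapped to 13..38, A-Z to 39..64, anything else to 0), chosen only because it agrees with the indices shown above; the actual dictionary used in the notebook is not shown in this issue. `encode_reference` is a hypothetical helper name.

```python
import string

WORD_LEN = 12  # padded width per word, as described in the issue

# Hypothetical vocabulary (assumption inferred from the expected output).
VOCAB = {ch: 13 + i for i, ch in enumerate(string.ascii_lowercase)}
VOCAB.update({ch: 39 + i for i, ch in enumerate(string.ascii_uppercase)})


def encode_reference(text, width=WORD_LEN):
    """Return nested lists shaped [num_words][1][width] of character ids."""
    rows = []
    for word in text.split():
        ids = [VOCAB.get(ch, 0) for ch in word[:width]]  # unknown chars -> 0
        ids += [0] * (width - len(ids))                  # zero-pad to width
        rows.append([ids])
    return rows


reference = encode_reference("This is a test")
for row in reference:
    print(row)
```

Comparing `reference` with the Transform output element by element would show whether the difference lies in the word split, the lookup, or the padding.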

Standalone code to reproduce the issue

https://colab.research.google.com/drive/1ap8Gycu7s--mz0VAxp4W2DphAd1HW1yi?usp=sharing

@raminmohammadi,

I am unable to run the shared notebook. My environment crashes while using tf.data.experimental.TFRecordWriter to write the TFRecord file. Looking at the transform component, it should produce similar results inside or outside a TFX pipeline.

Can you please make sure the example notebook works so that we can replicate the issue on our end? Thank you!

I'm not sure how to get this running. I can run the Jupyter notebook on a local machine, but on Colab it fails at the moment. I would appreciate any feedback on this, or it would help if you could run it locally.

@raminmohammadi, I tried but was unable to create a local setup to test your notebook because of some permission issues.

@zoyahav, can you please give some feedback on why the transform output in a TFX pipeline differs from the expected output obtained when running the transformation outside a TFX pipeline? Thanks.

Any updates on this issue? Thanks.