Error when starting Evaluator component
jinmc opened this issue · 6 comments
If the bug is related to a specific library below, please raise an issue in the
respective repo directly: Evaluator component
System information
- Have I specified the code to reproduce the issue (Yes, No): Yes
- Environment in which the code is executed (e.g., Local(Linux/MacOS/Windows),
Interactive Notebook, Google Cloud, etc): GCP Vertex Workbench - TensorFlow version: 2.13.0
- TFX Version: 1.14.0
- Python version: 3.10
- Python dependencies (from `pip freeze` output):
Package Version
------------------------------- --------------
absl-py 1.4.0
anyio 4.3.0
apache-beam 2.50.0
appnope 0.1.3
argon2-cffi 23.1.0
argon2-cffi-bindings 21.2.0
array-record 0.5.0
arrow 1.3.0
asttokens 2.4.1
astunparse 1.6.3
async-lru 2.0.4
attrs 21.4.0
Babel 2.14.0
backcall 0.2.0
beautifulsoup4 4.12.3
bleach 6.1.0
Brotli 1.1.0
cached-property 1.5.2
cachetools 5.3.3
certifi 2024.2.2
cffi 1.16.0
charset-normalizer 3.3.2
click 8.1.7
cloudpickle 2.2.1
comm 0.2.2
contourpy 1.2.0
crcmod 1.7
cycler 0.12.1
debugpy 1.8.1
decorator 5.1.1
defusedxml 0.7.1
dill 0.3.1.1
dm-tree 0.1.8
dnspython 2.6.1
docker 4.4.4
docopt 0.6.2
docstring_parser 0.16
entrypoints 0.4
et-xmlfile 1.1.0
etils 1.7.0
exceptiongroup 1.2.0
executing 2.0.1
fastavro 1.9.4
fasteners 0.19
fastjsonschema 2.19.1
flatbuffers 24.3.25
fonttools 4.50.0
fqdn 1.5.1
fsspec 2024.2.0
gast 0.4.0
google-api-core 2.12.0
google-api-python-client 1.12.11
google-apitools 0.5.31
google-auth 2.29.0
google-auth-httplib2 0.1.1
google-auth-oauthlib 1.0.0
google-cloud-aiplatform 1.45.0
google-cloud-bigquery 2.34.4
google-cloud-bigquery-storage 2.22.0
google-cloud-bigtable 2.21.0
google-cloud-core 2.4.1
google-cloud-datastore 2.18.0
google-cloud-dlp 3.12.3
google-cloud-language 2.11.1
google-cloud-pubsub 2.18.4
google-cloud-pubsublite 1.8.3
google-cloud-recommendations-ai 0.10.5
google-cloud-resource-manager 1.12.3
google-cloud-spanner 3.40.1
google-cloud-storage 2.14.0
google-cloud-videointelligence 2.11.4
google-cloud-vision 3.4.5
google-crc32c 1.5.0
google-pasta 0.2.0
google-resumable-media 2.7.0
googleapis-common-protos 1.63.0
grpc-google-iam-v1 0.13.0
grpcio 1.59.2
grpcio-status 1.48.2
h11 0.14.0
h2 4.1.0
h5py 3.10.0
hdfs 2.7.3
hpack 4.0.0
httpcore 1.0.5
httplib2 0.22.0
httpx 0.27.0
hyperframe 6.0.1
idna 3.6
imageio 2.34.0
importlib_metadata 7.1.0
importlib_resources 6.4.0
ipykernel 6.29.3
ipython 7.34.0
ipython-genutils 0.2.0
ipywidgets 7.8.1
isoduration 20.11.0
jedi 0.19.1
Jinja2 3.1.3
joblib 1.3.2
Js2Py 0.74
json5 0.9.24
jsonpointer 2.4
jsonschema 4.17.3
jsonschema-specifications 2023.12.1
jupyter 1.0.0
jupyter_client 7.4.9
jupyter-console 6.6.3
jupyter_core 5.7.2
jupyter-events 0.10.0
jupyter-lsp 2.2.4
jupyter_server 2.13.0
jupyter_server_terminals 0.4.4
jupyterlab 4.1.5
jupyterlab-pygments 0.2.2
jupyterlab_server 2.25.4
jupyterlab-widgets 1.1.7
keras 2.13.1
keras-tuner 1.4.7
kiwisolver 1.4.5
kt-legacy 1.0.5
kubernetes 12.0.1
lazy_loader 0.3
libclang 18.1.1
Markdown 3.6
MarkupSafe 2.1.5
matplotlib 3.8.1
matplotlib-inline 0.1.6
mistune 3.0.2
ml-dtypes 0.2.0
ml-metadata 1.14.0
ml-pipelines-sdk 1.14.0
nbclassic 1.0.0
nbclient 0.10.0
nbconvert 7.16.3
nbformat 5.10.3
nest_asyncio 1.6.0
networkx 3.2.1
notebook 6.5.6
notebook_shim 0.2.4
numpy 1.24.3
nvidia-cublas-cu12 12.4.2.65
nvidia-cuda-cupti-cu12 12.2.142
nvidia-cuda-nvcc-cu12 12.2.140
nvidia-cuda-runtime-cu12 12.4.99
nvidia-cudnn-cu12 9.0.0.312
nvidia-cufft-cu12 11.0.8.103
nvidia-curand-cu12 10.3.3.141
nvidia-cusolver-cu12 11.5.2.141
nvidia-cusparse-cu12 12.3.0.142
nvidia-nccl-cu12 2.16.5
nvidia-nvjitlink-cu12 12.4.99
nvidia-tensorrt 99.0.0
oauth2client 4.1.3
oauthlib 3.2.2
objsize 0.6.1
opencv-python 4.9.0.80
openpyxl 3.1.2
opt-einsum 3.3.0
orjson 3.10.0
overrides 6.5.0
packaging 20.9
pandas 1.5.3
pandocfilters 1.5.0
parso 0.8.3
pexpect 4.9.0
pickleshare 0.7.5
pillow 10.2.0
pip 24.0
pkgutil_resolve_name 1.3.10
platformdirs 4.2.0
portpicker 1.6.0
prometheus_client 0.20.0
promise 2.3
prompt-toolkit 3.0.42
proto-plus 1.23.0
protobuf 3.20.3
psutil 5.9.8
ptyprocess 0.7.0
pure-eval 0.2.2
pyarrow 10.0.1
pyasn1 0.6.0
pyasn1_modules 0.4.0
pycparser 2.21
pydantic 1.10.14
pydot 1.4.2
pyfarmhash 0.3.2
Pygments 2.17.2
pyjsparser 2.7.1
pymongo 4.6.3
pyparsing 3.1.2
pyrsistent 0.20.0
PySocks 1.7.1
python-dateutil 2.9.0
python-json-logger 2.0.7
pytz 2024.1
PyYAML 6.0.1
pyzmq 24.0.1
qtconsole 5.5.1
QtPy 2.4.1
referencing 0.34.0
regex 2023.12.25
requests 2.31.0
requests-oauthlib 2.0.0
rfc3339-validator 0.1.4
rfc3986-validator 0.1.1
rpds-py 0.18.0
rsa 4.9
scikit-learn 1.3.2
scipy 1.12.0
Send2Trash 1.8.2
setuptools 69.2.0
shapely 2.0.3
six 1.16.0
sniffio 1.3.1
soupsieve 2.5
sqlparse 0.4.4
stack-data 0.6.2
tensorboard 2.13.0
tensorboard-data-server 0.7.2
tensorflow 2.13.1
tensorflow-data-validation 1.14.0
tensorflow-datasets 4.9.3
tensorflow-estimator 2.13.0
tensorflow-hub 0.13.0
tensorflow-io-gcs-filesystem 0.36.0
tensorflow-metadata 1.14.0
tensorflow-model-analysis 0.45.0
tensorflow-model-optimization 0.8.0
tensorflow-recommenders 0.7.3
tensorflow-serving-api 2.13.1
tensorflow-transform 1.14.0
tensorrt 8.6.1.post1
tensorrt-bindings 8.6.1
tensorrt-libs 8.6.1
termcolor 2.4.0
terminado 0.18.1
tfx 1.14.0
tfx-bsl 1.14.0
threadpoolctl 3.4.0
tifffile 2024.2.12
tinycss2 1.2.1
toml 0.10.2
tomli 2.0.1
tornado 6.4
tqdm 4.66.2
traitlets 5.14.2
types-python-dateutil 2.9.0.20240316
typing_extensions 4.10.0
typing-utils 0.1.0
tzlocal 5.2
uri-template 1.3.0
uritemplate 3.0.1
urllib3 2.2.1
wcwidth 0.2.13
webcolors 1.13
webencodings 0.5.1
websocket-client 1.7.0
Werkzeug 3.0.1
wheel 0.43.0
widgetsnbextension 3.6.6
wrapt 1.16.0
zipp 3.17.0
zstandard 0.22.0
Describe the current behavior
This is the code I am using to reproduce the issue.
import tensorflow as tf
from tensorflow import keras
from callbacks import *  # provides early_stopping()

def setup_pretrained_model(args):
    """Load a pretrained model for transfer learning."""
    IMG_SHAPE = (args["input_dim"], args["input_dim"], 3)
    # Transfer learning model with MobileNetV3
    base_model = tf.keras.applications.MobileNetV3Large(
        input_shape=IMG_SHAPE,
        include_top=False,
        weights='imagenet',
        minimalistic=True
    )
    # Freeze the pre-trained model weights
    base_model.trainable = False
    x = tf.keras.layers.GlobalMaxPooling2D()(base_model.output)
    x = tf.keras.layers.Dropout(0.2, name="top_dropout")(x)
    return_logits = args.get("return_logits", False)  # was referenced but never defined
    if return_logits:
        # Return logits if return_logits is True
        x = tf.keras.layers.Dense(args["num_classes"])(x)
    else:
        # Return sigmoid probabilities (original behavior)
        x = tf.keras.layers.Dense(args["num_classes"], activation='sigmoid')(x)
    model = tf.keras.Model(base_model.input, x)
    return model

def _input_fn(file_pattern, batch_size):
    # Define how to parse the example
    feature_description = {
        'image/encoded': tf.io.FixedLenFeature([], tf.string),
        'image/label': tf.io.FixedLenFeature([], tf.int64),
    }

    def _parse_function(example_proto):
        features = tf.io.parse_single_example(example_proto, feature_description)
        # Decode the JPEG image
        image = tf.image.decode_jpeg(features['image/encoded'], channels=3)
        # Resize the image to a fixed size (adjust as needed)
        image = tf.image.resize(image, [224, 224])
        label = features['image/label']
        return image, label

    # Load and parse the data, e.g. path/file.gz via path/*
    all_files = tf.io.gfile.glob(file_pattern)
    filtered_files = [file for file in all_files if file.endswith('.gz')]
    print("file pattern in input_fn : ", file_pattern)
    print("filtered files : ", filtered_files)
    dataset = tf.data.TFRecordDataset(filtered_files, compression_type='GZIP')
    dataset = dataset.map(_parse_function)
    dataset = dataset.batch(batch_size)
    return dataset
def fit_model(fn_args, args):
    try:
        model = setup_pretrained_model(args)
        # model.summary()
        model.compile(
            # `lr_schedule` was undefined in the original snippet; `learning_rate`
            # replaces the deprecated `lr` argument
            optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
            loss='sparse_categorical_crossentropy',
            metrics=['accuracy']
        )
        hist = model.fit(
            fn_args.train_data,
            validation_data=fn_args.eval_data,
            epochs=fn_args.train_steps,
            callbacks=[
                early_stopping(),
            ],
        )
        return model
    except ValueError as e:
        print(e)
        print("Training Stopped. Check the log.")

def run_fn(fn_args):
    train_dataset = _input_fn(fn_args.train_files[0], 16)
    eval_dataset = _input_fn(fn_args.eval_files[0], 16)
    fn_args.train_data = train_dataset
    fn_args.eval_data = eval_dataset
    # `args` must be supplied here, e.g. via fn_args.custom_config
    hist = fit_model(fn_args, args)
from tfx.components import Trainer
from tfx.proto import trainer_pb2
from tfx.dsl.components.base import executor_spec
from tfx.components.trainer.executor import GenericExecutor
from tfx.types import channel_utils

qat_trainer = Trainer(
    module_file='model.py',
    examples=example_gen.outputs['examples'],
    train_args=trainer_pb2.TrainArgs(num_steps=100),
    eval_args=trainer_pb2.EvalArgs(num_steps=100),
)
eval_config = tfma.EvalConfig(
    model_specs=[
        tfma.ModelSpec(signature_name="serving_default", label_key="image/label"),
    ],
    metrics_specs=[
        tfma.MetricsSpec(metrics=[
            tfma.MetricConfig(
                class_name="SparseCategoricalAccuracy",
                threshold=tfma.MetricThreshold(
                    value_threshold=tfma.GenericValueThreshold(lower_bound={"value": 0.8}),
                ),
            )
        ])
    ],
    slicing_specs=[
        # Evaluate metrics over all data
        tfma.SlicingSpec(),
        # Example of a slicing spec for analyzing specific slices:
        # tfma.SlicingSpec(feature_keys=["some_feature_key"])
    ]
)

# Now, to integrate this into the TFX pipeline, use the Evaluator component:
evaluator = components.Evaluator(
    examples=example_gen.outputs['examples'],
    model=trainer.outputs['model'],
    eval_config=eval_config,
)
context.run(evaluator)
WARNING:absl:Large batch_size 1 failed with error Fail to call signature func with signature_name: serving_default.
the inputs are:
['bytes_as_images'].
The input_specs are:
{'input_7': TensorSpec(shape=(None, 224, 224, 3), dtype=tf.float32, name='input_7')}.
Attempting to run batch through serially. Note that this will significantly affect the performance.

I also get this error message:

TypeError: Binding inputs to tf.function `signature_wrapper` failed due to too many positional arguments. Received args: (<tf.Tensor: shape=(1,), dtype=string, numpy=
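Both messages seem to describe the same mismatch: TFMA hands the signature a batch of serialized-example strings, while the exported `serving_default` only admits float image tensors. The failure can be reproduced in isolation with a toy `tf.function` (the function `f` here is purely illustrative, not from the pipeline):

```python
import tensorflow as tf

# A signature that, like my exported model, only accepts float tensors
@tf.function(input_signature=[tf.TensorSpec([None, 4], tf.float32)])
def f(x):
    return x * 2

# Feeding string input, as the Evaluator does with serialized tf.Examples,
# is rejected before the function body even runs
try:
    f(tf.constant(["serialized tf.Example bytes"]))
except (TypeError, ValueError) as e:
    print(type(e).__name__)
```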
Describe the expected behavior
The Evaluator should run without errors.
Standalone code to reproduce the issue
I have used test code to evaluate the model and it gives 99% accuracy, but I can't seem to use the Evaluator.
import tensorflow as tf
import os

model_uri = qat_trainer.outputs['model'].get()[0].uri
print(os.listdir(model_uri))
print(os.listdir(model_uri + "/Format-Serving/"))
loaded_model = tf.saved_model.load(model_uri + "/Format-Serving/")
print(list(loaded_model.signatures.keys()))
print(example_gen.outputs['examples'])
def _parse_function(example_proto):
    # Define your parsing schema
    image_feature_description = {
        'image/encoded': tf.io.FixedLenFeature([], tf.string),
        'image/label': tf.io.FixedLenFeature([], tf.int64),
    }
    # Parse the input tf.Example proto using the schema
    features = tf.io.parse_single_example(example_proto, image_feature_description)
    # Decode the JPEG image
    image = tf.image.decode_jpeg(features['image/encoded'], channels=3)
    # Apply any additional preprocessing: resizing, normalization, etc.
    image = tf.image.resize(image, [224, 224])
    # image = image / 255.0  # Normalize to [0, 1] if required by your model
    label = features['image/label']
    return image, label

import numpy as np

def get_dataset_from_tfrecords(tfrecord_files, batch_size=16):
    # Create a tf.data.Dataset from TFRecord files
    raw_dataset = tf.data.TFRecordDataset(tfrecord_files, compression_type="GZIP")
    # Apply the parsing and preprocessing function
    parsed_dataset = raw_dataset.map(_parse_function)
    # Batch the dataset
    batched_dataset = parsed_dataset.batch(batch_size)
    return batched_dataset

# Usage example, assuming access to example_gen.outputs['examples']
tfrecord_files = [artifact.uri for artifact in example_gen.outputs['examples'].get()][0] + "/Split-eval/*"
print(tfrecord_files)
all_files = tf.io.gfile.glob(tfrecord_files)
print("all_files : ", all_files)
filtered_files = [file for file in all_files if file.endswith('.gz')]
print("filtered_files : ", filtered_files)
image_batch_dataset = get_dataset_from_tfrecords(filtered_files)
print("image_batch_dataset: ", image_batch_dataset)

# Count the number of batches in the dataset
num_elements = 0
for _ in image_batch_dataset:
    num_elements += 1
# print("Number of batches in the dataset:", num_elements)

# Initialize counters
correct_predictions = 0
total_predictions = 0

infer = loaded_model.signatures["serving_default"]
input_name = list(infer.structured_input_signature[1].keys())[0]
output_name = list(infer.structured_outputs.keys())[0]
print("input_name", input_name)
print("output_name", output_name)

for image_batch, label_batch in image_batch_dataset:
    input_data = {input_name: image_batch}
    predictions = infer(**input_data)
    predicted_classes = np.argmax(predictions[output_name].numpy(), axis=1)
    print("predicted_classes", predicted_classes)
    print("label_batch.numpy()", label_batch.numpy())
    # Update counters
    correct_predictions += np.sum(predicted_classes == label_batch.numpy())
    total_predictions += label_batch.shape[0]

print("correct_predictions", correct_predictions)
print("total_predictions", total_predictions)
accuracy = correct_predictions / total_predictions
print(f"Accuracy: {accuracy:.2f}")
Other info / logs
Any help will be appreciated.
I've been making progress by using a custom module file for the Evaluator to work around this issue.
import tensorflow_model_analysis as tfma
from typing import List

def custom_eval_shared_model(eval_saved_model_path, model_name, eval_config, **kwargs) -> tfma.EvalSharedModel:
    """
    Creates a custom EvalSharedModel. This can be used to configure how the model
    is loaded and used for evaluation.

    Args:
        eval_saved_model_path (str): The file path to the saved TensorFlow model.
        model_name (str): The name of the model.
        eval_config (tfma.EvalConfig): Evaluation configuration.
        **kwargs: Additional keyword arguments.

    Returns:
        tfma.EvalSharedModel: A custom EvalSharedModel instance.
    """
    # Example of creating an EvalSharedModel with custom settings.
    # Adjust as needed based on your model and evaluation requirements.
    print("eval saved model path: ", eval_saved_model_path)
    print("model name: ", model_name)
    print("eval config: ", eval_config)
    print("kwargs: ", kwargs)
    return tfma.default_eval_shared_model(
        eval_saved_model_path=eval_saved_model_path,
        model_name=model_name,
        eval_config=eval_config,
        **kwargs
    )

## tfma.extractors documentation page: https://www.tensorflow.org/tfx/model_analysis/api_docs/python/tfma/extractors
def custom_extractors(eval_shared_model, eval_config, tensor_adapter_config) -> List[tfma.extractors.Extractor]:
    """
    Defines custom extractors to be used during evaluation. Extractors are used to
    extract necessary information from the dataset and model during the evaluation process.

    Args:
        eval_shared_model (tfma.EvalSharedModel): The evaluation shared model.
        eval_config (tfma.EvalConfig): Evaluation configuration.
        tensor_adapter_config (tfma.TensorAdapterConfig): Configuration for tensor adaptation.

    Returns:
        List[tfma.extractors.Extractor]: A list of custom extractors.
    """
    # Example: Return the default set of extractors. Modify this list to add custom
    # extractors or replace it with your own as needed.
    print("eval_shared_model: ", eval_shared_model)
    print("eval_config: ", eval_config)
    print("tensor_adapter_config: ", tensor_adapter_config)
    predict_extractor = tfma.extractors.PredictExtractor(
        eval_shared_model=eval_shared_model,
        eval_config=eval_config,
        desired_batch_size=16,
    )
    slice_key_extractor = tfma.extractors.SliceKeyExtractor()
    return [slice_key_extractor, predict_extractor]
I'm running into issues again with the above method:
ValueError: "labels" key not found in extracts. Check that the configuration is setup properly to specify the name of label input and that the proper extractor has been configured to extract the labels from the inputs. Existing keys: dict_keys([]) [while running 'ExtractEvaluateAndWriteResults/ExtractAndEvaluate/EvaluateMetricsAndPlots/ComputeMetricsAndPlots()/Preprocesss']
Looks like this is not an issue from TFX side. This question is better asked on TensorFlow Forum since it is not a bug or feature request. There is also a larger community that reads questions there. Thank you!
This issue has been marked stale because it has no recent activity since 7 days. It will be closed if no further activity occurs. Thank you.
This issue was closed due to lack of activity after being marked stale for past 7 days.