Error when starting Evaluator component
jinmc opened this issue · 6 comments
If the bug is related to a specific library below, please raise an issue in the
respective repo directly: Evaluator component
System information
- Have I specified the code to reproduce the issue (Yes, No): Yes
- Environment in which the code is executed (e.g., Local(Linux/MacOS/Windows),
Interactive Notebook, Google Cloud, etc): GCP Vertex Workbench - TensorFlow version: 2.13.0
- TFX Version: 1.14.0
- Python version: 3.10
- Python dependencies (from `pip freeze` output):
Package Version
------------------------------- --------------
absl-py 1.4.0
anyio 4.3.0
apache-beam 2.50.0
appnope 0.1.3
argon2-cffi 23.1.0
argon2-cffi-bindings 21.2.0
array-record 0.5.0
arrow 1.3.0
asttokens 2.4.1
astunparse 1.6.3
async-lru 2.0.4
attrs 21.4.0
Babel 2.14.0
backcall 0.2.0
beautifulsoup4 4.12.3
bleach 6.1.0
Brotli 1.1.0
cached-property 1.5.2
cachetools 5.3.3
certifi 2024.2.2
cffi 1.16.0
charset-normalizer 3.3.2
click 8.1.7
cloudpickle 2.2.1
comm 0.2.2
contourpy 1.2.0
crcmod 1.7
cycler 0.12.1
debugpy 1.8.1
decorator 5.1.1
defusedxml 0.7.1
dill 0.3.1.1
dm-tree 0.1.8
dnspython 2.6.1
docker 4.4.4
docopt 0.6.2
docstring_parser 0.16
entrypoints 0.4
et-xmlfile 1.1.0
etils 1.7.0
exceptiongroup 1.2.0
executing 2.0.1
fastavro 1.9.4
fasteners 0.19
fastjsonschema 2.19.1
flatbuffers 24.3.25
fonttools 4.50.0
fqdn 1.5.1
fsspec 2024.2.0
gast 0.4.0
google-api-core 2.12.0
google-api-python-client 1.12.11
google-apitools 0.5.31
google-auth 2.29.0
google-auth-httplib2 0.1.1
google-auth-oauthlib 1.0.0
google-cloud-aiplatform 1.45.0
google-cloud-bigquery 2.34.4
google-cloud-bigquery-storage 2.22.0
google-cloud-bigtable 2.21.0
google-cloud-core 2.4.1
google-cloud-datastore 2.18.0
google-cloud-dlp 3.12.3
google-cloud-language 2.11.1
google-cloud-pubsub 2.18.4
google-cloud-pubsublite 1.8.3
google-cloud-recommendations-ai 0.10.5
google-cloud-resource-manager 1.12.3
google-cloud-spanner 3.40.1
google-cloud-storage 2.14.0
google-cloud-videointelligence 2.11.4
google-cloud-vision 3.4.5
google-crc32c 1.5.0
google-pasta 0.2.0
google-resumable-media 2.7.0
googleapis-common-protos 1.63.0
grpc-google-iam-v1 0.13.0
grpcio 1.59.2
grpcio-status 1.48.2
h11 0.14.0
h2 4.1.0
h5py 3.10.0
hdfs 2.7.3
hpack 4.0.0
httpcore 1.0.5
httplib2 0.22.0
httpx 0.27.0
hyperframe 6.0.1
idna 3.6
imageio 2.34.0
importlib_metadata 7.1.0
importlib_resources 6.4.0
ipykernel 6.29.3
ipython 7.34.0
ipython-genutils 0.2.0
ipywidgets 7.8.1
isoduration 20.11.0
jedi 0.19.1
Jinja2 3.1.3
joblib 1.3.2
Js2Py 0.74
json5 0.9.24
jsonpointer 2.4
jsonschema 4.17.3
jsonschema-specifications 2023.12.1
jupyter 1.0.0
jupyter_client 7.4.9
jupyter-console 6.6.3
jupyter_core 5.7.2
jupyter-events 0.10.0
jupyter-lsp 2.2.4
jupyter_server 2.13.0
jupyter_server_terminals 0.4.4
jupyterlab 4.1.5
jupyterlab-pygments 0.2.2
jupyterlab_server 2.25.4
jupyterlab-widgets 1.1.7
keras 2.13.1
keras-tuner 1.4.7
kiwisolver 1.4.5
kt-legacy 1.0.5
kubernetes 12.0.1
lazy_loader 0.3
libclang 18.1.1
Markdown 3.6
MarkupSafe 2.1.5
matplotlib 3.8.1
matplotlib-inline 0.1.6
mistune 3.0.2
ml-dtypes 0.2.0
ml-metadata 1.14.0
ml-pipelines-sdk 1.14.0
nbclassic 1.0.0
nbclient 0.10.0
nbconvert 7.16.3
nbformat 5.10.3
nest_asyncio 1.6.0
networkx 3.2.1
notebook 6.5.6
notebook_shim 0.2.4
numpy 1.24.3
nvidia-cublas-cu12 12.4.2.65
nvidia-cuda-cupti-cu12 12.2.142
nvidia-cuda-nvcc-cu12 12.2.140
nvidia-cuda-runtime-cu12 12.4.99
nvidia-cudnn-cu12 9.0.0.312
nvidia-cufft-cu12 11.0.8.103
nvidia-curand-cu12 10.3.3.141
nvidia-cusolver-cu12 11.5.2.141
nvidia-cusparse-cu12 12.3.0.142
nvidia-nccl-cu12 2.16.5
nvidia-nvjitlink-cu12 12.4.99
nvidia-tensorrt 99.0.0
oauth2client 4.1.3
oauthlib 3.2.2
objsize 0.6.1
opencv-python 4.9.0.80
openpyxl 3.1.2
opt-einsum 3.3.0
orjson 3.10.0
overrides 6.5.0
packaging 20.9
pandas 1.5.3
pandocfilters 1.5.0
parso 0.8.3
pexpect 4.9.0
pickleshare 0.7.5
pillow 10.2.0
pip 24.0
pkgutil_resolve_name 1.3.10
platformdirs 4.2.0
portpicker 1.6.0
prometheus_client 0.20.0
promise 2.3
prompt-toolkit 3.0.42
proto-plus 1.23.0
protobuf 3.20.3
psutil 5.9.8
ptyprocess 0.7.0
pure-eval 0.2.2
pyarrow 10.0.1
pyasn1 0.6.0
pyasn1_modules 0.4.0
pycparser 2.21
pydantic 1.10.14
pydot 1.4.2
pyfarmhash 0.3.2
Pygments 2.17.2
pyjsparser 2.7.1
pymongo 4.6.3
pyparsing 3.1.2
pyrsistent 0.20.0
PySocks 1.7.1
python-dateutil 2.9.0
python-json-logger 2.0.7
pytz 2024.1
PyYAML 6.0.1
pyzmq 24.0.1
qtconsole 5.5.1
QtPy 2.4.1
referencing 0.34.0
regex 2023.12.25
requests 2.31.0
requests-oauthlib 2.0.0
rfc3339-validator 0.1.4
rfc3986-validator 0.1.1
rpds-py 0.18.0
rsa 4.9
scikit-learn 1.3.2
scipy 1.12.0
Send2Trash 1.8.2
setuptools 69.2.0
shapely 2.0.3
six 1.16.0
sniffio 1.3.1
soupsieve 2.5
sqlparse 0.4.4
stack-data 0.6.2
tensorboard 2.13.0
tensorboard-data-server 0.7.2
tensorflow 2.13.1
tensorflow-data-validation 1.14.0
tensorflow-datasets 4.9.3
tensorflow-estimator 2.13.0
tensorflow-hub 0.13.0
tensorflow-io-gcs-filesystem 0.36.0
tensorflow-metadata 1.14.0
tensorflow-model-analysis 0.45.0
tensorflow-model-optimization 0.8.0
tensorflow-recommenders 0.7.3
tensorflow-serving-api 2.13.1
tensorflow-transform 1.14.0
tensorrt 8.6.1.post1
tensorrt-bindings 8.6.1
tensorrt-libs 8.6.1
termcolor 2.4.0
terminado 0.18.1
tfx 1.14.0
tfx-bsl 1.14.0
threadpoolctl 3.4.0
tifffile 2024.2.12
tinycss2 1.2.1
toml 0.10.2
tomli 2.0.1
tornado 6.4
tqdm 4.66.2
traitlets 5.14.2
types-python-dateutil 2.9.0.20240316
typing_extensions 4.10.0
typing-utils 0.1.0
tzlocal 5.2
uri-template 1.3.0
uritemplate 3.0.1
urllib3 2.2.1
wcwidth 0.2.13
webcolors 1.13
webencodings 0.5.1
websocket-client 1.7.0
Werkzeug 3.0.1
wheel 0.43.0
widgetsnbextension 3.6.6
wrapt 1.16.0
zipp 3.17.0
zstandard 0.22.0
Describe the current behavior
This is the code I am using to reproduce the issue.
import tensorflow as tf
from tensorflow import keras
from callbacks import *  # provides early_stopping()

def setup_pretrained_model(args):
    """Load a pretrained model for transfer learning."""
    IMG_SHAPE = (args["input_dim"], args["input_dim"], 3)
    # Transfer learning model with MobileNetV3
    base_model = tf.keras.applications.MobileNetV3Large(
        input_shape=IMG_SHAPE,
        include_top=False,
        weights='imagenet',
        minimalistic=True
    )
    # Freeze the pre-trained model weights
    base_model.trainable = False
    x = tf.keras.layers.GlobalMaxPooling2D()(base_model.output)
    x = tf.keras.layers.Dropout(0.2, name="top_dropout")(x)
    return_logits = args.get("return_logits", False)  # was referenced but never defined
    if return_logits:
        # Return logits if return_logits is True
        x = tf.keras.layers.Dense(args["num_classes"])(x)
    else:
        # Return sigmoid probabilities (original behavior)
        x = tf.keras.layers.Dense(args["num_classes"], activation='sigmoid')(x)
    model = tf.keras.Model(base_model.input, x)
    return model

def _input_fn(file_pattern, batch_size):
    # Define how to parse the example
    feature_description = {
        'image/encoded': tf.io.FixedLenFeature([], tf.string),
        'image/label': tf.io.FixedLenFeature([], tf.int64),
    }

    def _parse_function(example_proto):
        features = tf.io.parse_single_example(example_proto, feature_description)
        # Decode the JPEG image
        image = tf.image.decode_jpeg(features['image/encoded'], channels=3)
        # Resize the image to a fixed size (adjust as needed)
        image = tf.image.resize(image, [224, 224])
        label = features['image/label']
        return image, label

    # Load and parse the data, e.g. path/file.gz via path/*
    all_files = tf.io.gfile.glob(file_pattern)
    filtered_files = [file for file in all_files if file.endswith('.gz')]
    print("file pattern in input_fn : ", file_pattern)
    print("filtered files : ", filtered_files)
    dataset = tf.data.TFRecordDataset(filtered_files, compression_type='GZIP')
    dataset = dataset.map(_parse_function)
    dataset = dataset.batch(batch_size)
    return dataset
def fit_model(fn_args, args):
    try:
        model = setup_pretrained_model(args)
        # model.summary()
        model.compile(
            # `lr_schedule` was undefined in the original snippet; `learning_rate`
            # replaces the deprecated `lr` argument
            optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
            loss='sparse_categorical_crossentropy',
            metrics=['accuracy']
        )
        hist = model.fit(
            fn_args.train_data,
            validation_data=fn_args.eval_data,
            epochs=fn_args.train_steps,
            callbacks=[
                early_stopping(),
            ],
        )
        return model
    except ValueError as e:
        print(e)
        print("Training Stopped. Check the log.")

def run_fn(fn_args):
    train_dataset = _input_fn(fn_args.train_files[0], 16)
    eval_dataset = _input_fn(fn_args.eval_files[0], 16)
    fn_args.train_data = train_dataset
    fn_args.eval_data = eval_dataset
    # `args` must be supplied here, e.g. via fn_args.custom_config
    hist = fit_model(fn_args, args)
from tfx.components import Trainer
from tfx.proto import trainer_pb2
from tfx.dsl.components.base import executor_spec
from tfx.components.trainer.executor import GenericExecutor
from tfx.types import channel_utils

qat_trainer = Trainer(
    module_file='model.py',
    examples=example_gen.outputs['examples'],
    train_args=trainer_pb2.TrainArgs(num_steps=100),
    eval_args=trainer_pb2.EvalArgs(num_steps=100),
)
eval_config = tfma.EvalConfig(
    model_specs=[
        tfma.ModelSpec(signature_name="serving_default", label_key="image/label"),
    ],
    metrics_specs=[
        tfma.MetricsSpec(metrics=[
            tfma.MetricConfig(
                class_name="SparseCategoricalAccuracy",
                threshold=tfma.MetricThreshold(
                    value_threshold=tfma.GenericValueThreshold(lower_bound={"value": 0.8}),
                ),
            )
        ])
    ],
    slicing_specs=[
        # Evaluate metrics over all data
        tfma.SlicingSpec(),
        # Example of a slicing spec for analyzing specific slices:
        # tfma.SlicingSpec(feature_keys=["some_feature_key"])
    ]
)

# Now, to integrate this into the TFX pipeline, use the Evaluator component:
evaluator = components.Evaluator(
    examples=example_gen.outputs['examples'],
    model=trainer.outputs['model'],
    eval_config=eval_config,
)
context.run(evaluator)
WARNING:absl:Large batch_size 1 failed with error Fail to call signature func with signature_name: serving_default.
the inputs are:
['bytes_as_images'].
The input_specs are:
{'input_7': TensorSpec(shape=(None, 224, 224, 3), dtype=tf.float32, name='input_7')}.
Attempting to run batch through serially. Note that this will significantly affect the performance.

I also get this error message:

TypeError: Binding inputs to tf.function `signature_wrapper` failed due to too many positional arguments. Received args: (<tf.Tensor: shape=(1,), dtype=string, numpy=
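Both messages seem to describe the same mismatch: TFMA hands the signature a batch of serialized-example strings, while the exported `serving_default` only admits float image tensors. The failure can be reproduced in isolation with a toy `tf.function` (the function `f` here is purely illustrative, not from the pipeline):

```python
import tensorflow as tf

# A signature that, like my exported model, only accepts float tensors
@tf.function(input_signature=[tf.TensorSpec([None, 4], tf.float32)])
def f(x):
    return x * 2

# Feeding string input, as the Evaluator does with serialized tf.Examples,
# is rejected before the function body even runs
try:
    f(tf.constant(["serialized tf.Example bytes"]))
except (TypeError, ValueError) as e:
    print(type(e).__name__)
```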
Describe the expected behavior
The Evaluator should run without errors.
Standalone code to reproduce the issue
I have used test code to evaluate the model and it gives 99% accuracy, but I can't seem to use the Evaluator.
import tensorflow as tf
import os

model_uri = qat_trainer.outputs['model'].get()[0].uri
print(os.listdir(model_uri))
print(os.listdir(model_uri + "/Format-Serving/"))
loaded_model = tf.saved_model.load(model_uri + "/Format-Serving/")
print(list(loaded_model.signatures.keys()))
print(example_gen.outputs['examples'])
def _parse_function(example_proto):
    # Define your parsing schema
    image_feature_description = {
        'image/encoded': tf.io.FixedLenFeature([], tf.string),
        'image/label': tf.io.FixedLenFeature([], tf.int64),
    }
    # Parse the input tf.Example proto using the schema
    features = tf.io.parse_single_example(example_proto, image_feature_description)
    # Decode the JPEG image
    image = tf.image.decode_jpeg(features['image/encoded'], channels=3)
    # Apply any additional preprocessing: resizing, normalization, etc.
    image = tf.image.resize(image, [224, 224])
    # image = image / 255.0  # Normalize to [0, 1] if required by your model
    label = features['image/label']
    return image, label

import numpy as np

def get_dataset_from_tfrecords(tfrecord_files, batch_size=16):
    # Create a tf.data.Dataset from TFRecord files
    raw_dataset = tf.data.TFRecordDataset(tfrecord_files, compression_type="GZIP")
    # Apply the parsing and preprocessing function
    parsed_dataset = raw_dataset.map(_parse_function)
    # Batch the dataset
    batched_dataset = parsed_dataset.batch(batch_size)
    return batched_dataset

# Usage example, assuming access to example_gen.outputs['examples']
tfrecord_files = [artifact.uri for artifact in example_gen.outputs['examples'].get()][0] + "/Split-eval/*"
print(tfrecord_files)
all_files = tf.io.gfile.glob(tfrecord_files)
print("all_files : ", all_files)
filtered_files = [file for file in all_files if file.endswith('.gz')]
print("filtered_files : ", filtered_files)
image_batch_dataset = get_dataset_from_tfrecords(filtered_files)
print("image_batch_dataset: ", image_batch_dataset)

# Count the number of batches in the dataset
num_elements = 0
for _ in image_batch_dataset:
    num_elements += 1
# print("Number of batches in the dataset:", num_elements)

# Initialize counters
correct_predictions = 0
total_predictions = 0

infer = loaded_model.signatures["serving_default"]
input_name = list(infer.structured_input_signature[1].keys())[0]
output_name = list(infer.structured_outputs.keys())[0]
print("input_name", input_name)
print("output_name", output_name)

for image_batch, label_batch in image_batch_dataset:
    input_data = {input_name: image_batch}
    predictions = infer(**input_data)
    predicted_classes = np.argmax(predictions[output_name].numpy(), axis=1)
    print("predicted_classes", predicted_classes)
    print("label_batch.numpy()", label_batch.numpy())
    # Update counters
    correct_predictions += np.sum(predicted_classes == label_batch.numpy())
    total_predictions += label_batch.shape[0]

print("correct_predictions", correct_predictions)
print("total_predictions", total_predictions)
accuracy = correct_predictions / total_predictions
print(f"Accuracy: {accuracy:.2f}")
Other info / logs
Any help will be appreciated.
I've been making progress by using a custom module file for the Evaluator to work around this issue.
import tensorflow_model_analysis as tfma
from typing import List

def custom_eval_shared_model(eval_saved_model_path, model_name, eval_config, **kwargs) -> tfma.EvalSharedModel:
    """
    Creates a custom EvalSharedModel. This can be used to configure how the model
    is loaded and used for evaluation.

    Args:
        eval_saved_model_path (str): The file path to the saved TensorFlow model.
        model_name (str): The name of the model.
        eval_config (tfma.EvalConfig): Evaluation configuration.
        **kwargs: Additional keyword arguments.

    Returns:
        tfma.EvalSharedModel: A custom EvalSharedModel instance.
    """
    # Example of creating an EvalSharedModel with custom settings.
    # Adjust as needed based on your model and evaluation requirements.
    print("eval saved model path: ", eval_saved_model_path)
    print("model name: ", model_name)
    print("eval config: ", eval_config)
    print("kwargs: ", kwargs)
    return tfma.default_eval_shared_model(
        eval_saved_model_path=eval_saved_model_path,
        model_name=model_name,
        eval_config=eval_config,
        **kwargs
    )

## tfma.extractors documentation page: https://www.tensorflow.org/tfx/model_analysis/api_docs/python/tfma/extractors
def custom_extractors(eval_shared_model, eval_config, tensor_adapter_config) -> List[tfma.extractors.Extractor]:
    """
    Defines custom extractors to be used during evaluation. Extractors are used to
    extract necessary information from the dataset and model during the evaluation process.

    Args:
        eval_shared_model (tfma.EvalSharedModel): The evaluation shared model.
        eval_config (tfma.EvalConfig): Evaluation configuration.
        tensor_adapter_config (tfma.TensorAdapterConfig): Configuration for tensor adaptation.

    Returns:
        List[tfma.extractors.Extractor]: A list of custom extractors.
    """
    # Example: Return the default set of extractors. Modify this list to add custom
    # extractors or replace it with your own as needed.
    print("eval_shared_model: ", eval_shared_model)
    print("eval_config: ", eval_config)
    print("tensor_adapter_config: ", tensor_adapter_config)
    predict_extractor = tfma.extractors.PredictExtractor(
        eval_shared_model=eval_shared_model,
        eval_config=eval_config,
        desired_batch_size=16,
    )
    slice_key_extractor = tfma.extractors.SliceKeyExtractor()
    return [slice_key_extractor, predict_extractor]
I'm running into issues again with the above method:
ValueError: "labels" key not found in extracts. Check that the configuration is setup properly to specify the name of label input and that the proper extractor has been configured to extract the labels from the inputs. Existing keys: dict_keys([]) [while running 'ExtractEvaluateAndWriteResults/ExtractAndEvaluate/EvaluateMetricsAndPlots/ComputeMetricsAndPlots()/Preprocesss']
Looks like this is not an issue from TFX side. This question is better asked on TensorFlow Forum since it is not a bug or feature request. There is also a larger community that reads questions there. Thank you!
This issue has been marked stale because it has no recent activity since 7 days. It will be closed if no further activity occurs. Thank you.
This issue was closed due to lack of activity after being marked stale for past 7 days.