keras-team/tf-keras

Accuracy() does not work, but 'accuracy' does

System information.

  • Have I written custom code (as opposed to using a stock example script provided in Keras): not really
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 22.04
  • TensorFlow installed from (source or binary): pip install tensorflow
  • TensorFlow version (use command below): 2.13.0
  • Python version: 3.11.5
  • Bazel version (if compiling from source): -
  • GPU model and memory: does not matter
  • Exact command to reproduce: run script below

Describe the problem.
If I put Accuracy() in the metrics list, it does not work, but the string 'accuracy' does. According to the docs, both should work. See the example code below.

Describe the current behavior.

469/469 [==============================] - 3s 6ms/step - loss: 0.2548 - accuracy: 0.0000e+00

Describe the expected behavior.

469/469 [==============================] - 3s 6ms/step - loss: 0.2540 - accuracy: 0.9260

Contributing.

  • Do you want to contribute a PR? (yes/no): no
  • If yes, please read this page for instructions
  • Briefly describe your candidate solution(if contributing): -

Standalone code to reproduce the issue.

import numpy as np
from keras import Sequential
from keras.datasets import mnist
from keras.src import activations
from keras.src.layers import Dense
from keras.src.losses import CategoricalCrossentropy
from keras.src.metrics import Accuracy
from keras.src.optimizers import RMSprop
from keras.src.utils import to_categorical


def preprocess_images(images):
    s = images.shape
    return images.reshape((s[0], s[1] * s[2])).astype(dtype=np.float32) / 255


if __name__ == '__main__':
    (train_images, train_labels), (test_images, test_labels) = mnist.load_data()
    processed_train_images = preprocess_images(train_images)
    processed_test_images = preprocess_images(test_images)
    processed_train_labels = to_categorical(y=train_labels)
    processed_test_labels = to_categorical(y=test_labels)

    network = Sequential()
    network.add(layer=Dense(units=512, activation=activations.relu, input_shape=(28 * 28, )))
    network.add(layer=Dense(units=10, activation=activations.softmax))

    network.compile(optimizer=RMSprop(), loss=CategoricalCrossentropy(), metrics=[Accuracy()])  # this does not work
    # network.compile(optimizer=RMSprop(), loss=CategoricalCrossentropy(), metrics=['accuracy'])  # this works

    network.fit(x=processed_train_images, y=processed_train_labels, epochs=1, batch_size=128)

Source code / logs.
Nothing.

@LostInDarkMath,
Apologies for the delay. Your code uses loss=CategoricalCrossentropy(), for which Accuracy is not the right metric. Use keras.metrics.CategoricalAccuracy instead to get the correct result. I ran the code with keras.metrics.CategoricalAccuracy and it produced the expected output. Kindly find the gist of it here.

https://www.tensorflow.org/api_docs/python/tf/keras/metrics/CategoricalAccuracy
Thank you!
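
To illustrate why the two metrics disagree on this model, here is a minimal NumPy sketch of the two comparison rules. The function names `exact_match_accuracy` and `categorical_accuracy` are hypothetical stand-ins written for this example, not the Keras implementations. They assume that Accuracy compares predictions and labels element by element, while CategoricalAccuracy compares the argmax of each row.

```python
import numpy as np


def exact_match_accuracy(y_true, y_pred):
    """Fraction of individual elements where y_pred equals y_true (Accuracy-style)."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))


def categorical_accuracy(y_true, y_pred):
    """Fraction of rows whose argmax matches (CategoricalAccuracy-style)."""
    return float(np.mean(np.argmax(y_true, axis=-1) == np.argmax(y_pred, axis=-1)))


# Two one-hot labels and two softmax-like outputs that pick the right class.
y_true = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]])
y_pred = np.array([[0.1, 0.8, 0.1], [0.7, 0.2, 0.1]])

print(exact_match_accuracy(y_true, y_pred))  # 0.0: no probability equals 0 or 1 exactly
print(categorical_accuracy(y_true, y_pred))  # 1.0: both rows predict the correct class
```

Softmax outputs are almost never exactly 0 or 1, so an element-wise equality check against one-hot labels stays near zero even when every prediction is correct, which matches the `accuracy: 0.0000e+00` in the report.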

Thank you for the clarification.

Does that mean the string 'accuracy' is resolved differently depending on the loss function? Or does 'accuracy' always resolve to keras.metrics.CategoricalAccuracy? And where is this behavior documented? Is there something like a table that shows this mapping? That would be very nice :)

@LostInDarkMath
The following two APIs are technically the same.

keras.metrics.Accuracy
keras.metrics.BinaryAccuracy

From the docs:

keras.metrics.Accuracy(name="accuracy", ...)
keras.metrics.BinaryAccuracy(name="binary_accuracy", ...)
Calculates how often predictions equal labels.
This metric creates two local variables, total and count that are used to compute the frequency with which y_pred matches y_true. This frequency is ultimately returned as binary accuracy: an idempotent operation that simply divides total by count.
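
The total/count bookkeeping described in the quoted docs can be sketched in plain Python. `StreamingAccuracy` is a hypothetical class written for this illustration, assuming element-wise equality as the match rule. It is not the Keras class and it ignores sample weights.

```python
import numpy as np


class StreamingAccuracy:
    """Hypothetical stand-in for a stateful accuracy metric (not the Keras class)."""

    def __init__(self):
        self.total = 0.0  # number of matching elements seen so far
        self.count = 0.0  # number of elements seen so far

    def update_state(self, y_true, y_pred):
        # Accumulate matches batch by batch instead of storing all predictions.
        matches = np.asarray(y_true) == np.asarray(y_pred)
        self.total += float(matches.sum())
        self.count += float(matches.size)

    def result(self):
        # Reading the result does not change the state, so it can be called repeatedly.
        return self.total / self.count if self.count else 0.0


metric = StreamingAccuracy()
metric.update_state([1, 2, 3, 4], [0, 2, 3, 4])  # 3 of 4 match
metric.update_state([1, 1], [0, 0])              # 0 of 2 match
print(metric.result())  # 3 / 6 = 0.5
```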

And when we use a string identifier such as 'accuracy', it is later converted to the appropriate metric based on the labels and logits (source):

When you pass the strings 'accuracy' or 'acc', we convert this to one of tf.keras.metrics.BinaryAccuracy, tf.keras.metrics.CategoricalAccuracy, tf.keras.metrics.SparseCategoricalAccuracy based on the shapes of the targets and of the model output. We do a similar conversion for the strings 'crossentropy' and 'ce' as well.
@tilakrayal
Apart from the above clarification, there may be a potential bug. Please check this question on Stack Overflow, which is very similar to this issue.

From the model.compile documentation:

List of metrics to be evaluated by the model during training and testing. Each of this can be a string (name of a built-in function), function or a tf.keras.metrics.Metric instance. See tf.keras.metrics. Typically you will use metrics=['accuracy']. A function is any callable with the signature result = fn(y_true,y_pred). To specify different metrics for different outputs of a multi-output model, you could also pass a dictionary, such as metrics={'output_a':'accuracy', 'output_b':['accuracy', 'mse']}. You can also pass a list to specify a metric or a list of metrics for each output, such as metrics=[['accuracy'], ['accuracy', 'mse']] or metrics=['accuracy', ['accuracy', 'mse']]. When you pass the strings 'accuracy' or 'acc', we convert this to one of tf.keras.metrics.BinaryAccuracy, tf.keras.metrics.CategoricalAccuracy, tf.keras.metrics.SparseCategoricalAccuracy based on the shapes of the targets and of the model output. We do a similar conversion for the strings 'crossentropy' and 'ce' as well. The metrics passed here are evaluated without sample weighting; if you would like sample weighting to apply, you can specify your metrics via the weighted_metrics argument instead.
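
The shape-based conversion of 'accuracy' described in the quoted documentation can be approximated as follows. This is a simplified sketch under stated assumptions: `resolve_accuracy` is a hypothetical helper, it returns class names as strings rather than metric objects, and the exact rules in the Keras source may differ in edge cases.

```python
def resolve_accuracy(y_true_shape, y_pred_shape):
    """Pick an accuracy variant name from target and output shapes (simplified)."""
    y_true_rank, y_pred_rank = len(y_true_shape), len(y_pred_shape)
    y_true_last, y_pred_last = y_true_shape[-1], y_pred_shape[-1]

    # Assumption: a single output unit is treated as a binary problem.
    if y_pred_last == 1:
        return "BinaryAccuracy"
    # Assumption: integer class labels have one fewer dimension than the
    # output, or a trailing dimension of size 1.
    if y_true_rank < y_pred_rank or (y_true_last == 1 and y_pred_last > 1):
        return "SparseCategoricalAccuracy"
    # Otherwise targets and outputs have the same layout, e.g. one-hot labels.
    return "CategoricalAccuracy"


print(resolve_accuracy((None, 10), (None, 10)))  # one-hot labels, 10-way softmax
print(resolve_accuracy((None,), (None, 10)))     # integer labels, 10-way softmax
print(resolve_accuracy((None, 1), (None, 1)))    # single sigmoid output
```

Under these assumptions, the reproduction script (one-hot labels from to_categorical and a 10-unit softmax layer) falls into the CategoricalAccuracy case, which is consistent with the string 'accuracy' reporting the expected value.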

When you use metrics.Accuracy, it resolves to BinaryAccuracy as stated in the doc of Accuracy. Whereas, when using string accuracy it resolves to BinaryAccuracy or CategoricalAccuracy, based on the target shape. This was also explained by @innat above. This may be confusing, so we recommend either explicitly using BinaryAccuracy or CategoricalAccuracy if you don't want to use the string accuracy for the metrics.

The same behavior exists in Keras 3 as well. Should we deprecate metrics.Accuracy, or make it behave like the string metric 'accuracy'?

metrics.Accuracy could be removed. It causes a lot of confusion, especially among beginners.