keras-team/tf-keras

TextVectorization: output_mode={multi_hot, count} promise int arrays but output floats

Issue filed in Keras by @nicdumz - keras-team/keras#18973

Documentation for output_mode currently reads:

"multi_hot": Outputs a single int array per batch, of either vocab_size or max_tokens size, containing 1s in all elements where the token mapped to that index exists at least once in the batch item.
"count": Like "multi_hot", but the int array contains a count of the number of times the token at that index appeared in the batch item.
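
To make the documented contract concrete, here is a minimal pure-Python sketch of what the two modes are described as returning for a single batch item. It does not call Keras. The `encode` helper and the fixed `vocab` list are hypothetical, the tokenization is plain whitespace splitting, and the real layer assigns indices itself (including an out-of-vocabulary slot), so only the integer-valued output is the point here.

```python
from collections import Counter


def encode(text, vocab, output_mode="count"):
    """Hypothetical helper mimicking the documented int output for one batch item."""
    if output_mode not in ("count", "multi_hot"):
        raise ValueError(f"unsupported output_mode: {output_mode!r}")
    # Count whitespace-separated tokens; Counter returns 0 for absent tokens.
    counts = Counter(text.split())
    row = [counts[token] for token in vocab]
    if output_mode == "multi_hot":
        # Clamp every count to 1: only presence matters in this mode.
        row = [min(c, 1) for c in row]
    return row


vocab = ["foo", "bar", "baz"]  # assumed fixed order, for illustration only
print(encode("bar baz bar", vocab, "count"))      # [0, 2, 1]
print(encode("bar baz bar", vocab, "multi_hot"))  # [0, 1, 1]
```

Both rows contain Python ints, which is what the wording "int array" suggests a caller should expect.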

repro

import tensorflow as tf, tensorflow.version as tv

print(f"{tv.VERSION}, {tv.COMPILER_VERSION}, {tv.GIT_VERSION}")

v = tf.keras.layers.TextVectorization(output_mode="count")
v.adapt(["foo", "bar", "baz"])
print(v(["bar baz"]).dtype)

output

2.15.0, Ubuntu Clang 17.0.2 (++20231003073124+b2417f51dbbd-1~exp1~20231003073217.50), v2.15.0-2-g0b15fdfcb3f
<dtype: 'float32'>
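
Until the docs or the dtype change, an explicit cast is one way to get the integer array the docs describe. This is a minimal sketch reusing the repro above; the choice of `tf.int64` as the target dtype is an assumption on my part, not something the documentation specifies.

```python
import tensorflow as tf

v = tf.keras.layers.TextVectorization(output_mode="count")
v.adapt(["foo", "bar", "baz"])

# The layer returns float32 in the version reported above, so cast explicitly.
counts = tf.cast(v(["bar baz"]), tf.int64)
print(counts.dtype)  # <dtype: 'int64'>
```

The cast is lossless here because the counts are whole numbers stored as floats.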