ageron/handson-ml3

[BUG] Chapter 13: Solution to Exercise 10 might have an issue (paragraph e)

vaharoni opened this issue · 0 comments

I am completely new to the material, so I could be wrong, but I think the solution to exercise 10 (paragraph e) has an issue. The solution presents the following function:

def compute_mean_embedding(inputs):
    # Intended: a time step is a real word iff its embedding vector
    # has at least one non-zero component
    not_pad = tf.math.count_nonzero(inputs, axis=-1)
    # Intended: number of real (non-padding) words per review
    n_words = tf.math.count_nonzero(not_pad, axis=-1, keepdims=True)
    sqrt_n_words = tf.math.sqrt(tf.cast(n_words, tf.float32))
    # Sum the word embeddings and scale by sqrt(n_words)
    return tf.reduce_sum(inputs, axis=1) / sqrt_n_words

This function is supposed to count the number of words in each review by counting the embedding vectors that are not all zeros, and then divide the summed embeddings by the square root of that count.
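That logic is sound when the padding positions really are zero vectors. Here is a tiny hand-built check (the values are made up):

import tensorflow as tf

# One "review": 2 real word vectors followed by 2 all-zero padding vectors
fake = tf.constant([[[0.5, -0.1], [0.2, 0.3], [0.0, 0.0], [0.0, 0.0]]])
print(compute_mean_embedding(fake))  # (sum of the 2 word vectors) / sqrt(2)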
However, it is plugged into the model after the Embedding layer, like so:

embedding_size = 20
tf.random.set_seed(42)

text_vectorization = tf.keras.layers.TextVectorization(
    max_tokens=max_tokens, output_mode="int")
text_vectorization.adapt(sample_reviews)

model = tf.keras.Sequential([
    text_vectorization,
    tf.keras.layers.Embedding(input_dim=max_tokens,
                              output_dim=embedding_size,
                              mask_zero=True),  # <pad> tokens => zero vectors
    tf.keras.layers.Lambda(compute_mean_embedding),
    tf.keras.layers.Dense(100, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

From my tests, mask_zero=True does not guarantee that the padding token 0 (the output of the TextVectorization layer for padding) is embedded to a zero vector by the Embedding layer. As far as I can tell, mask_zero=True only makes the layer emit a boolean mask via compute_mask(); row 0 of the embedding matrix is still an ordinary trainable weight with a random initial value, and compute_mean_embedding never receives or uses the mask. So I don't think it correctly counts the number of words: if I am correct, n_words always equals the padded sequence length, i.e. the model always divides by the square root of the length of the longest sentence in the batch.
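Here is a minimal repro of what I mean (the vocabulary size, embedding size, and token IDs are arbitrary):

import tensorflow as tf

tf.random.set_seed(42)
emb = tf.keras.layers.Embedding(input_dim=10, output_dim=4, mask_zero=True)
token_ids = tf.constant([[2, 3, 0, 0]])  # two real words, two <pad> tokens

out = emb(token_ids)
print(out[0, 2])                    # embedding of token 0: random values, NOT all zeros
print(emb.compute_mask(token_ids))  # [[ True  True False False]] -- the mask exists...

# ...but compute_mean_embedding never sees the mask, so every
# time step counts as a word:
not_pad = tf.math.count_nonzero(out, axis=-1)
n_words = tf.math.count_nonzero(not_pad, axis=-1, keepdims=True)
print(n_words)                      # [[4]] -- the padded length, not 2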


Here is my suggested fix: a custom layer that counts the words from the token IDs themselves (i.e., before the embedding lookup), where zeros really do mark padding. It should replace both the Embedding and Lambda layers in the notebook's solution. Note, however, that when I tested a model using this layer, it did not achieve better accuracy than the notebook's solution on the same dataset with the same hyperparameters.

import tensorflow as tf

class SentenceEmbedding(tf.keras.layers.Layer):
    def __init__(self, input_dim, output_dim, **kwargs):
        super().__init__(**kwargs)
        self.input_dim = input_dim
        self.output_dim = output_dim
        self.embedding_layer = tf.keras.layers.Embedding(
            input_dim=self.input_dim,
            output_dim=self.output_dim,
            mask_zero=True
        )

    def call(self, inputs):
        # Count the words from the token IDs themselves: here 0 is
        # guaranteed to mean padding, since it is the ID that the
        # TextVectorization layer reserves for <pad>
        n_words = tf.math.count_nonzero(inputs, axis=-1, keepdims=True,
                                        dtype=tf.float32)
        embeddings_output = self.embedding_layer(inputs)
        # Sum the embeddings over the time axis and scale by sqrt(n_words)
        return tf.reduce_sum(embeddings_output, axis=1) / tf.sqrt(n_words)

    def get_config(self):
        # Allow the layer to be saved/loaded with its constructor arguments
        base_config = super().get_config()
        return {
            **base_config,
            'input_dim': self.input_dim,
            'output_dim': self.output_dim
        }
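For completeness, this is how the layer would slot into the notebook's model (assuming the same text_vectorization, max_tokens, and embedding_size as above; compile and fit as in the notebook):

model = tf.keras.Sequential([
    text_vectorization,
    SentenceEmbedding(input_dim=max_tokens, output_dim=embedding_size),
    tf.keras.layers.Dense(100, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])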