keras-team/keras-nlp

Dropout is not called in the training regime in TransformerEncoder and others


Hi all,

Describe the bug
The call method of the TransformerEncoder layer does not take the training argument:

def call(self, inputs, padding_mask=None, attention_mask=None):

and so it does not pass it to the dropout layers:
x = self._self_attention_dropout(x)
x = self._feedforward_dropout(x)

If I understand it correctly, this means the dropout layers never actually drop anything, even during training.
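As a quick sanity check (a hypothetical snippet, assuming keras-nlp's public TransformerEncoder constructor arguments intermediate_dim, num_heads, and dropout), one can compare outputs with and without training mode; if dropout were truly never active, the two outputs would always match:

import numpy as np
import keras
import keras_nlp

# Hypothetical check, not part of the keras-nlp test suite.
inputs = keras.Input(shape=(4, 8))
outputs = keras_nlp.layers.TransformerEncoder(
    intermediate_dim=16, num_heads=2, dropout=0.5
)(inputs)
model = keras.Model(inputs, outputs)

x = np.random.uniform(size=(1, 4, 8)).astype("float32")
y_train = keras.ops.convert_to_numpy(model(x, training=True))
y_infer = keras.ops.convert_to_numpy(model(x, training=False))
# If dropout were never applied, these two outputs would be identical.
print("outputs identical:", np.allclose(y_train, y_infer))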

Moreover, there are many other places which do not pass training -- TransformerDecoder and FNetEncoder in layers, and quite a few models in models -- XLNetEncoder, BloomDecoder, GemmaDecoderBlock, etc.

Note that models built with the functional API should be fine -- there the training argument is passed automatically; however, when the subclassing API is used (i.e., a def call method is defined), it is not passed.
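For reference, the kind of explicit threading I had in mind for subclassed layers (a minimal hypothetical block, not the actual keras-nlp code) looks like this:

import keras

class TinyBlock(keras.layers.Layer):
    """Hypothetical block that forwards training explicitly to its dropout."""

    def __init__(self, units, dropout_rate=0.1, **kwargs):
        super().__init__(**kwargs)
        self.dense = keras.layers.Dense(units)
        self.dropout = keras.layers.Dropout(dropout_rate)

    def call(self, inputs, training=None):
        x = self.dense(inputs)
        # Hand the flag down explicitly instead of relying on implicit propagation.
        return self.dropout(x, training=training)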

Note that this is one of the important differences between TF-Keras and Keras 3.

@fchollet I have taken the liberty of adding you here to verify this 🙇 (if I am correct, this will need a non-trivial effort to fix, and fine-tuning on Keras 3 will give suboptimal results until then).

Oh, sorry, I just realized how this now works in Keras 3 🤦‍♂️ -- the training flag is resolved and propagated in Layer.__call__ itself, so it reaches nested layers even when the enclosing call never mentions it: https://github.com/keras-team/keras/blob/ce06c6509db91f334168c66db2e7003101dcd749/keras/layers/layer.py#L743-L748
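For anyone finding this later, a minimal sketch of that behavior (a hypothetical layer, core Keras 3 only): a nested Dropout still picks up training from the surrounding call context, even though the outer call signature has no training argument.

import numpy as np
import keras

class NoTrainingArg(keras.layers.Layer):
    """Hypothetical layer mirroring the report: call neither accepts nor forwards training."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.dropout = keras.layers.Dropout(0.5)

    def call(self, inputs):
        # No training= here, mirroring TransformerEncoder.call in the report above.
        return self.dropout(inputs)

inputs = keras.Input(shape=(4,))
model = keras.Model(inputs, NoTrainingArg()(inputs))

x = np.ones((2, 4), dtype="float32")
# Dropout is active: entries are zeroed or scaled by 1 / (1 - rate),
# because the flag is taken from the call context set by model(..., training=True).
print(keras.ops.convert_to_numpy(model(x, training=True)))
# In inference mode the output is just the input.
print(keras.ops.convert_to_numpy(model(x, training=False)))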

Closing.