Dropout is not called in the training regime in TransformerEncoder and others
Hi all,

**Describe the bug**

The `TransformerEncoder` layer's `call` method does not take a `training` argument, and so it does not pass one on to its dropout layers. If I understand it correctly, this means that the dropouts are never active.
However, there are many places that do not pass `training` -- `TransformerDecoder` and `FNetEncoder` in `layers`, and quite a few models in `models` -- `XLNetEncoder`, `BloomDecoder`, `GemmaDecoderBlock`, etc.
Note that models built with the functional API should be fine -- there the `training` argument is passed automatically; however, when the subclassing API is used (i.e., a custom `def call`), it is not forwarded.
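For illustration, a minimal sketch (not the actual KerasNLP source) of the two patterns in question -- a functional model, where Keras wires `training` through automatically, and a subclassed layer whose `call` neither declares nor forwards it:

```python
import keras

# Functional API: `training` is routed to Dropout automatically.
inputs = keras.Input(shape=(8,))
outputs = keras.layers.Dropout(0.5)(inputs)
functional_model = keras.Model(inputs, outputs)

# Subclassing API: the pattern this report is about -- call() neither
# accepts a `training` argument nor forwards one to the nested Dropout.
class SubclassedBlock(keras.layers.Layer):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.drop = keras.layers.Dropout(0.5)

    def call(self, inputs):       # no `training` parameter ...
        return self.drop(inputs)  # ... and none passed on here
```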
Note that this is one of the important differences between TF-Keras and Keras 3, because:
- in TF-Keras, if no `training` is passed, the value of `backend.learning_phase` is used: https://github.com/keras-team/tf-keras/blob/95be21afe33fe7d1dc0713ebf3bd4d211d94a065/tf_keras/layers/regularization/dropout.py#L108-L113
- in Keras 3, however, the default is `training=False` when nothing is passed: https://github.com/keras-team/keras/blob/ce06c6509db91f334168c66db2e7003101dcd749/keras/layers/regularization/dropout.py#L57-L65
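A quick way to see the Keras 3 default in action (a sketch assuming an eager backend; exact outputs depend on the random seed):

```python
import numpy as np
from keras import layers

drop = layers.Dropout(0.5)
x = np.ones((1, 8), dtype="float32")

# No `training` argument and no enclosing layer call: Keras 3 resolves
# training=False, so dropout is an identity pass-through here.
print(np.asarray(drop(x)))                  # all ones
# With an explicit training=True, ~half the entries are zeroed and the
# survivors are scaled by 1 / (1 - rate) = 2.
print(np.asarray(drop(x, training=True)))
```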
@fchollet I have taken the liberty of adding you here to verify this 🙇 (if I am correct, this will need a non-trivial effort to fix, and fine-tuning on Keras 3 will give suboptimal results until then).
Oh, sorry, I just realized how that now works in Keras 3 🤦‍♂️ https://github.com/keras-team/keras/blob/ce06c6509db91f334168c66db2e7003101dcd749/keras/layers/layer.py#L743-L748
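In other words (a minimal sketch, not the KerasNLP code): in Keras 3, `training` passed to an outer layer is recorded in the call context and picked up by nested layers, even when the outer `call()` does not declare it, so the dropouts are applied after all:

```python
import numpy as np
import keras

class Outer(keras.layers.Layer):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.drop = keras.layers.Dropout(0.5)

    def call(self, inputs):       # no `training` parameter here
        return self.drop(inputs)  # nested Dropout resolves it from the call context

x = np.ones((1, 1000), dtype="float32")
layer = Outer()
y_inf = np.asarray(layer(x, training=False))
y_trn = np.asarray(layer(x, training=True))
print((y_inf == 0).mean())  # 0.0  -- dropout inactive
print((y_trn == 0).mean())  # ~0.5 -- dropout active via the propagated `training`
```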
Closing.