Preprocessor does not respect sequence_length
Closed this issue · 2 comments
52631 commented
Describe the bug
If I initialize a preprocessor from a preset, it does not respect the specified sequence length.
To Reproduce
In keras-nlp==0.11.1, the preprocessor pads to the default of 512 regardless of the specified length:
keras_nlp.models.BertPreprocessor.from_preset('bert_tiny_en_uncased', sequence_length=16)("The quick brown fox jumped.")
Expected behavior
In keras-nlp==0.8.2, the preprocessor would respect the specified length:
{'token_ids': <tf.Tensor: shape=(16,), dtype=int32, numpy=
array([ 101, 1996, 4248, 2829, 4419, 5598, 1012, 102, 0, 0, 0,
0, 0, 0, 0, 0], dtype=int32)>,
'segment_ids': <tf.Tensor: shape=(16,), dtype=int32, numpy=array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int32)>,
'padding_mask': <tf.Tensor: shape=(16,), dtype=bool, numpy=
array([ True, True, True, True, True, True, True, True, False,
False, False, False, False, False, False, False])>}
Additional context
In my case, this showed up as a large performance hit when migrating code to latest version. The performance penalty may be more subtle depending on the desired sequence length relative to the default value.
It seems the workaround is to override the sequence length after initialization:
preprocessor = keras_nlp.models.BertPreprocessor.from_preset('bert_tiny_en_uncased', sequence_length=16)
preprocessor.sequence_length = 16
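For reference, `sequence_length` governs pad-or-truncate behavior along these lines (a simplified pure-Python sketch, not the actual keras-nlp implementation; the `pad_or_truncate` helper name is hypothetical). It reproduces the 0.8.2 output shown above:

```python
def pad_or_truncate(token_ids, sequence_length, pad_id=0):
    """Truncate or right-pad with pad_id so the output has exactly sequence_length tokens."""
    ids = token_ids[:sequence_length]
    return ids + [pad_id] * (sequence_length - len(ids))

# Token ids from the expected 0.8.2 output above, padded to length 16.
ids = [101, 1996, 4248, 2829, 4419, 5598, 1012, 102]
padded = pad_or_truncate(ids, 16)
padding_mask = [t != 0 for t in padded]
```

With the bug, the effective `sequence_length` stays at 512, so every batch carries roughly 32x more padding than needed at length 16, which is consistent with the performance hit described above.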
SamanehSaadat commented
Thanks for reporting this issue! I'll look into this!
SamanehSaadat commented
This issue is fixed in #1632.