Any string whose length isn't a multiple of 4 causes an assert failure
intelliqua opened this issue · 2 comments
Hi,
Any input string whose length isn't a multiple of 4 causes an assert failure at line 548 in models.py:
"assert char_encoding.shape[1] % self.conv.stride[0] == 0"
stride is initialised to config.downsampling_rate (4) in modeling_canine.py in the transformers library.
Sample code causing assert failure (length of input string is 35):
from wtpsplit import WtP
wtp = WtP("wtp-canine-s-12l")
wtp.split("This is a test This is another test", lang_code="en")
Sample code that works (the added full stop brings the length of the input string to 36):
from wtpsplit import WtP
wtp = WtP("wtp-canine-s-12l")
wtp.split("This is a test This is another test.", lang_code="en")
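The condition behind the two samples can be sketched in plain Python. This is only an illustration, assuming the encoded sequence length equals the character count of the input (which the 35 vs. 36 behaviour above suggests). The names `DOWNSAMPLING_RATE`, `satisfies_stride` and `pad_to_multiple` are hypothetical and not part of wtpsplit or transformers, and padding with spaces is an untested stopgap that may affect the split output.

```python
# Hypothetical helpers; not part of wtpsplit or transformers.
DOWNSAMPLING_RATE = 4  # value of config.downsampling_rate reported in this issue


def satisfies_stride(text: str, rate: int = DOWNSAMPLING_RATE) -> bool:
    """Mirror the failing assert: the length must be divisible by the stride."""
    return len(text) % rate == 0


def pad_to_multiple(text: str, rate: int = DOWNSAMPLING_RATE, pad_char: str = " ") -> str:
    """Right-pad text so its length becomes a multiple of rate."""
    remainder = len(text) % rate
    if remainder == 0:
        return text  # already aligned, nothing to add
    return text + pad_char * (rate - remainder)


failing = "This is a test This is another test"    # 35 characters
working = "This is a test This is another test."   # 36 characters

print(len(failing), satisfies_stride(failing))
print(len(working), satisfies_stride(working))
print(len(pad_to_multiple(failing)))
```

On affected versions, something like `wtp.split(pad_to_multiple(text), lang_code="en")` might avoid the assert, but this has not been verified against the library.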
Oof, that's a big one, sorry about that. It's a symptom of being lazy and only testing wtp-bert-mini
in CI.
It's fixed in v1.2.3, can you confirm it works now?
Thanks! That was quick. Yes, it is fixed 👍