curiosity-ai/catalyst

SentenceDetector feature extraction bug

Opened this issue · 2 comments

Describe the bug

features[16] = (current.ValueAsSpan.IsSentencePunctuation() && current == prev) ? _Hash_True_IequalIm1 : _Hash_False_IequalIm1;

This line and the next three compare tokens with the == operator, which is not implemented for IToken, so the comparison is always false.
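To illustrate the point, here is a minimal sketch of how `==` behaves on interface-typed operands in C#. `IWord`, `Word`, and `EqualityDemo` are hypothetical stand-ins, not Catalyst's real types, and the sketch assumes the two tokens are distinct instances. Without a user-defined operator, `==` on interface references falls back to reference comparison, so two tokens with the same text still compare as unequal. Comparing the underlying text is one possible fix, though not necessarily the one the maintainers would choose.

```csharp
using System;

// Hypothetical stand-in for a token interface (not Catalyst's IToken).
public interface IWord
{
    string Value { get; }
}

// Hypothetical token implementation; a struct gets boxed when used as IWord.
public readonly struct Word : IWord
{
    public Word(string value) { Value = value; }
    public string Value { get; }
}

public static class EqualityDemo
{
    // With no user-defined operator, == on interface operands
    // compares object references, not token contents.
    public static bool SameReference(IWord a, IWord b) => a == b;

    // Comparing the text is what the feature appears to intend (assumption).
    public static bool SameText(IWord a, IWord b) =>
        string.Equals(a.Value, b.Value, StringComparison.Ordinal);

    public static void Main()
    {
        IWord prev = new Word("!");
        IWord current = new Word("!");

        Console.WriteLine(SameReference(prev, current)); // False
        Console.WriteLine(SameText(prev, current));      // True
    }
}
```

If the intent of features 16 to 19 is "same punctuation as the neighbouring token", a value-based comparison like `SameText` would let the true branch be reached.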

features[21] = (next.Length == 0 && next.Value == SpecialToken.BOS) ? _Hash_True_Im2IsBOS : _Hash_False_Im2IsBOS;

When a document is processed for sentences, the next token can never be BOS and the previous token can never be EOS, so this check can never be true.

stale commented

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Still relevant