sillsdev/icu-dotnet

BreakIterator.GetBoundaries is exponentially slow depending on the size of the source text

atlastodor opened this issue · 1 comment

Describe the bug

BreakIterator.GetBoundaries slows down disproportionately as the source text grows. The larger the string passed as the text parameter, the slower the call, and the running time grows faster than linearly with the input size.

To Reproduce

using Icu;

string content = "... some large text, about 100KB ... ";
BreakIterator.GetBoundaries(BreakIterator.UBreakIteratorType.WORD, new Locale("eng"), content, false); // Takes about 10 seconds.
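
To make the non-linear growth easier to see, the call can be timed on inputs of doubling size. The following is only a rough sketch, not part of the original report: the class name `BoundaryTiming`, the helper `MakeText`, the synthetic "lorem ipsum" text, and the chosen sizes are all assumptions made for illustration. It uses the same `GetBoundaries` call as the snippet above and needs the icu.net package and the native ICU libraries to be available.

```csharp
using System;
using System.Diagnostics;
using System.Text;
using Icu;

static class BoundaryTiming
{
    // Hypothetical helper: builds a synthetic text of at least `length` characters.
    static string MakeText(int length)
    {
        var sb = new StringBuilder(length + 16);
        while (sb.Length < length)
            sb.Append("lorem ipsum ");
        return sb.ToString();
    }

    static void Main()
    {
        // Double the input size each round. With linear scaling the elapsed
        // time would roughly double as well; a much larger jump per round
        // points to superlinear behavior.
        for (int size = 12500; size <= 100000; size *= 2)
        {
            string content = MakeText(size);
            var watch = Stopwatch.StartNew();

            int count = 0;
            foreach (var boundary in BreakIterator.GetBoundaries(
                BreakIterator.UBreakIteratorType.WORD, new Locale("eng"), content, false))
            {
                count++; // enumerate fully in case results are produced lazily
            }

            watch.Stop();
            Console.WriteLine($"{content.Length} chars: {count} boundaries, {watch.ElapsedMilliseconds} ms");
        }
    }
}
```

Comparing the elapsed time of consecutive rounds gives a quick indication of how the cost scales. Absolute numbers will vary by machine and ICU version.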

Expected behavior

BreakIterator.GetBoundaries should finish within milliseconds.

Environment

  • OS: Windows 10
  • icu.net version: 2.6.0
  • .NET Framework 4.7

Fixed by #128