Python Exception Types
polm opened this issue · 5 comments
I noticed that if the input is too long in Python, an Exception is thrown, but it's a plain Exception, not a ValueError or something. I see in the Rust code there are a variety of specific error types.
I'm not familiar with Rust, but surely it's possible to have the Python code throw something more specific, like an InputTooLongException?
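To illustrate the kind of type being asked for, here is a minimal pure-Python sketch. It is not how the bindings are implemented: InputTooLongError, check_input and MAX_INPUT_BYTES are hypothetical names, and the limit is just the largest u16 value, assumed for illustration.

```python
# Assumed limit for illustration only: the largest value a u16 can hold.
MAX_INPUT_BYTES = 65535


class InputTooLongError(ValueError):
    """Hypothetical error raised when the input exceeds the size limit."""

    def __init__(self, actual: int, limit: int) -> None:
        super().__init__(f"input is {actual} bytes, limit is {limit} bytes")
        self.actual = actual
        self.limit = limit


def check_input(text: str, limit: int = MAX_INPUT_BYTES) -> str:
    """Return text unchanged, or raise InputTooLongError if it is too large."""
    size = len(text.encode("utf-8"))
    if size > limit:
        raise InputTooLongError(size, limit)
    return text
```

Because the class derives from ValueError, callers could catch either the specific type or ValueError, without resorting to a bare `except Exception`.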
We will try to use better exception types later.
Is it possible to change the limit from u16 to u32 or more? As it is, I can't use this for large text.
@gulldan Your question is not actually related to the main issue here, which is not about length but types. You should open a new issue.
Separately, from experience, tokenizers like this aren't designed for long inputs like that and you should split yours up into multiple calls.
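A rough sketch of that splitting approach follows. It assumes nothing about the library itself: split_text and tokenize_long are hypothetical helpers, `tokenize` is any callable you pass in that turns a string into a list of tokens, and `max_bytes` is whatever limit you choose. Pieces are cut after sentence-ending punctuation or a newline when possible. When no such boundary fits, the text is cut at the size limit, which may split a word and change the tokenization at that point.

```python
from typing import Callable, Iterator, List


def split_text(text: str, max_bytes: int, delimiters: str = "。！？\n") -> Iterator[str]:
    """Yield pieces of text, each at most max_bytes long when encoded as UTF-8."""
    if max_bytes < 4:
        raise ValueError("max_bytes must be at least 4 to fit any UTF-8 character")
    chunk: List[str] = []  # characters of the current piece
    size = 0               # UTF-8 size of the current piece in bytes
    cut = 0                # characters up to and including the last delimiter
    cut_size = 0           # UTF-8 size of those characters
    for ch in text:
        ch_size = len(ch.encode("utf-8"))
        while size + ch_size > max_bytes:
            if cut == 0:
                # No delimiter seen in this piece: fall back to a hard cut.
                cut, cut_size = len(chunk), size
            yield "".join(chunk[:cut])
            chunk = chunk[cut:]
            size -= cut_size
            cut = cut_size = 0
        chunk.append(ch)
        size += ch_size
        if ch in delimiters:
            cut, cut_size = len(chunk), size
    if chunk:
        yield "".join(chunk)


def tokenize_long(text: str, tokenize: Callable[[str], list], max_bytes: int) -> list:
    """Tokenize each piece separately and concatenate the results."""
    tokens: list = []
    for piece in split_text(text, max_bytes):
        tokens.extend(tokenize(piece))
    return tokens
```

Joining the pieces gives back the original text, so nothing is dropped; only the boundaries between calls are chosen by the caller.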
Adding to @polm: raising the max input length to u32::MAX would make it possible for Sudachi to crash with OOM, because memory usage for long sentences would be very significant. In the future it would be better to add an API for analyzing long text, as the Java version has.
Also, getting back to the original issue, I think that I changed all usages of the Python Exception type to SudachiError over the last 3-4 versions. The next version will fix the last couple of usages. SudachiError will be used instead.
Using a single generic error feels a little too general, but it's much better than a full Exception - thanks! I'll go ahead and close this.