segment-any-text/wtpsplit

Run models for Italian

RacheleSprugnoli opened this issue · 2 comments

Hi!
I would like more information on how to run wtpsplit for languages other than English. In particular, I am interested in Italian.
Is the following command correct and sufficient?

sat_lora_it = SaT("sat-3l", style_or_domain="ud", language="it")

Thanks in advance,
Rachele

Hi, our models work the same way for all 85 supported languages, so yes, using SaT as you outlined for Italian is perfectly fine. Depending on your use case, the "ersatz" or "opus100" styles are also appropriate, although UD should be a great (probably the best) starting point. Our "-sm" models should also work well across sentence styles, and with those there is no need for a language code or style, but the adapted models should perform best.
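
For reference, here is a minimal sketch of the two options, assuming the `wtpsplit` package is installed and that the "-sm" counterpart of the model above is named `sat-3l-sm`. The helper names `sat_args`, `load_splitter`, and `main`, as well as the sample sentence, are made up for illustration and are not part of the library:

```python
def sat_args(language=None, style_or_domain=None):
    """Pick a model name and keyword arguments for SaT.

    Hypothetical helper: with both a language code and a style it selects
    the adapted setup from the question; otherwise it falls back to the
    "-sm" model, which needs neither.
    """
    if language and style_or_domain:
        return "sat-3l", {"style_or_domain": style_or_domain, "language": language}
    return "sat-3l-sm", {}  # assumed name of the "-sm" variant


def load_splitter(language=None, style_or_domain=None):
    """Build a SaT instance for the chosen setup."""
    # Imported lazily so the helper above can be used without wtpsplit installed.
    from wtpsplit import SaT

    name, kwargs = sat_args(language, style_or_domain)
    return SaT(name, **kwargs)


def main():
    text = "Ciao a tutti. Questo è un esempio"  # made-up sample text

    # Adapted model for Italian with the UD style, as in the question.
    sat_it = load_splitter(language="it", style_or_domain="ud")
    print(sat_it.split(text))

    # General "-sm" model: no language code or style needed.
    sat_sm = load_splitter()
    print(sat_sm.split(text))
```

Calling `main()` downloads the model weights on first use. Swapping `"ud"` for `"ersatz"` or `"opus100"` selects one of the other styles mentioned above.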
Hope this helps :)

Great! Thank you!