Adding Multiple Language Support
nipunsadvilkar opened this issue · 6 comments
Add pysbd support for all the languages supported by pragmatic_segmenter
Will there be a Chinese version?
@zlhcsm Yes, there will be.
What about Spanish support?
@nipunsadvilkar what approach are you planning for multi-language support?
I see that Pragmatic Segmenter has coverage for these languages:
https://github.com/diasks2/pragmatic_segmenter/tree/master/spec/pragmatic_segmenter/languages
Were you thinking of porting that over or taking a fresh approach?
Although I don't have significant spoken language skills (besides English!), if there's testing or some other basic task I could help with, I'd be interested in lending a hand.
@nmstoker Thank you for the interest. I've been working on adding support for other languages in PR #63 and have refactored the code so that the rest of the languages supported by pragmatic_segmenter can be ported into pysbd.
I know English, Hindi and Marathi myself, so I'm adding support for those in PR #63. I will update it with other languages in the next few days.
Hi! Great to see this port -- what is the current status of porting additional languages beyond English from the Ruby version? Several of the languages reported in the results of the NLP-OSS paper are of interest to me, but when I tested some of the Japanese and Arabic examples from the Ruby version's README with the Python code, I got different results. Thanks!
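For anyone who wants to help with testing, a comparison like the one described above could be scripted roughly as follows. This is only a sketch: `naive_segment` is a hypothetical stand-in (not pysbd or pragmatic_segmenter), `compare` and `CASES` are names made up for illustration, and the sample sentences are illustrative rather than copied from either project's test suite. Swap in the real segmenter call and the expected outputs you want to check.

```python
import re
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in segmenter: splits after ., !, ? or the CJK full stop.
# It is NOT pysbd; replace it with the segmenter you actually want to check.
def naive_segment(text: str) -> List[str]:
    parts = re.split(r"(?<=[.!?。])\s*", text)
    return [p for p in parts if p]

# Each case maps a label to (input text, expected list of sentences).
Cases = Dict[str, Tuple[str, List[str]]]

def compare(segment: Callable[[str], List[str]], cases: Cases):
    """Return (label, expected, actual) for every case whose output differs."""
    failures = []
    for label, (text, expected) in cases.items():
        actual = segment(text)
        if actual != expected:
            failures.append((label, expected, actual))
    return failures

# Illustrative cases only (assumed, not taken from either project's specs).
CASES: Cases = {
    "en": ("Hello world. How are you?",
           ["Hello world.", "How are you?"]),
    "ja": ("これはペンです。それはマーカーです。",
           ["これはペンです。", "それはマーカーです。"]),
    "en_abbrev": ("Dr. Smith arrived. He sat down.",
                  ["Dr. Smith arrived.", "He sat down."]),
}

if __name__ == "__main__":
    for label, expected, actual in compare(naive_segment, CASES):
        print(f"[{label}] expected {expected!r} but got {actual!r}")
```

The naive splitter breaks after the abbreviation in the `en_abbrev` case, so that case is reported as a mismatch. This is the kind of per-language difference a harness like this is meant to surface.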