nipunsadvilkar/pySBD

Adding Multiple Language Support

nipunsadvilkar opened this issue · 6 comments

Add pysbd support for all the languages supported by pragmatic_segmenter

Will there be a Chinese version?

@zlhcsm Yes, there will be.

What about Spanish support?

@nipunsadvilkar what approach are you planning for multi-language support?
I see that Pragmatic Segmenter has coverage for these languages:

https://github.com/diasks2/pragmatic_segmenter/tree/master/spec/pragmatic_segmenter/languages

Were you thinking of porting that over or taking a fresh approach?

Although I don't have significant spoken-language skills (besides English!), if there's testing or some other basic task I could help with, I'd be interested in lending a hand.

@nmstoker Thank you for the interest. I've been working on adding support for other languages in PR #63 and have refactored the code to port the rest of the languages supported by pragmatic_segmenter into pysbd.

I know English, Hindi, and Marathi myself, so I'm adding support for these in PR #63. I will update it with other languages in the next few days.

Hi! Great to see this port. What is the current status of porting additional languages beyond English from the Ruby version? Several languages I'm interested in appear in the results from the NLP-OSS paper, but when I tested some of the Japanese and Arabic examples from the Ruby version's README with the Python code, I got different results. Thanks!