Support for more languages
trenslow opened this issue · 3 comments
Hi, just want to say thanks for the work so far, this is a really handy tool :)
In the README of this repo, there is a link to LibreOffice dictionaries. If I follow that link, I see dictionaries for a few languages that are not supported by Pyphen so far, such as Arabic and Turkish.
I was wondering if there is any reason why those languages are not supported? I see here that you're pulling from a different repo to add dictionaries, so maybe you have some other criteria for including languages that I'm not seeing.
It would be great if we could support the languages from the link on the README page. I know someone was asking for Arabic already.
Thanks and keep up the good work :)
Hi!
> Hi, just want to say thanks for the work so far, this is a really handy tool :)
Thank you!
> I was wondering if there is any reason why those languages are not supported?
The Arabic and Turkish dictionaries, for example, are not hyphenation dictionaries; they’re spellchecking dictionaries, which is why we don’t include them 😄. If you can find hyphenation dictionaries with open source licenses for other languages, don’t hesitate to open a new issue or a pull request so that we can include them!
What exactly makes a dictionary a hyphenation dictionary as opposed to a spell-checking dictionary?
> What exactly makes a dictionary a hyphenation dictionary as opposed to a spell-checking dictionary?
Spellchecking dictionaries contain information about how words have to be written (here is the documentation of the format). Hyphenation dictionaries contain information about how words can be split (here is the documentation of the format).
These two problems seem simple but are actually quite complex, and they are not 100% related to each other: for example, hyphenation rules can be applied to words that are not in the spellchecking dictionaries (proper nouns, neologisms, etc.). That’s why we don’t have a single dictionary containing both for each language.
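To make the distinction concrete, here is a rough sketch in plain Python. It is not Pyphen's code and does not use real dictionary data: `SPELLING_WORDS`, `PATTERNS`, `parse_pattern` and `hyphenate` are hypothetical names, and the single toy pattern `t1t` (allow a break between two consecutive t's) is made up for illustration, loosely in the style of the Liang/TeX patterns that hyphenation dictionaries are usually built from. The spellchecking side is reduced to a plain word list, which is a big simplification of the real format.

```python
import re

# Toy "spellchecking dictionary": only knows which words are valid (assumption: a bare word list).
SPELLING_WORDS = {"button", "letter"}


def parse_pattern(pattern):
    """Split a pattern such as 't1t' into its letters ('tt') and gap weights ([0, 1, 0])."""
    letters = re.sub(r"\d", "", pattern)
    weights = [0] * (len(letters) + 1)
    pos = 0
    for ch in pattern:
        if ch.isdigit():
            weights[pos] = int(ch)  # weight of the gap before letter number `pos`
        else:
            pos += 1
    return letters, weights


# Toy "hyphenation dictionary": rules about where a break may occur (hypothetical pattern).
PATTERNS = [parse_pattern("t1t")]


def hyphenate(word, patterns, left_min=2, right_min=2):
    """Insert hyphens wherever the highest matching pattern weight for a gap is odd."""
    dotted = "." + word.lower() + "."  # dots mark the word boundaries
    scores = [0] * (len(dotted) + 1)   # scores[i] = weight of the gap before dotted[i]
    for letters, weights in patterns:
        start = dotted.find(letters)
        while start != -1:
            for offset, weight in enumerate(weights):
                scores[start + offset] = max(scores[start + offset], weight)
            start = dotted.find(letters, start + 1)
    pieces, last = [], 0
    # The gap before word[k] is the gap before dotted[k + 1].
    for k in range(left_min, len(word) - right_min + 1):
        if scores[k + 1] % 2 == 1:
            pieces.append(word[last:k])
            last = k
    pieces.append(word[last:])
    return "-".join(pieces)


print(hyphenate("button", PATTERNS))   # a word the spelling list knows
print(hyphenate("zattic", PATTERNS))   # an invented word the spelling list does not know
print("zattic" in SPELLING_WORDS)
```

The invented word "zattic" fails the spelling lookup but can still be split, because the pattern describes letter sequences rather than whole words. That is the sense in which the two kinds of dictionary answer different questions.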