Adding recognition of Walloon (wa) language

Question


Opened this issue 8 years ago · 4 comments

Hello,
I'm working on adding the Walloon language to LanguageTool, which in turn requires proper language detection from language-detector.
I couldn't find clear instructions on how to generate a profile, so, as suggested, I'm attaching some text files: http://chanae.walon.org/walon/wa.zip
It's a small zip file with some random pages from Wikipedia and rifondou.walon.org (from the latter, I only took texts more than 70 years old); in total it's about 2 MB of text.
The zip includes plain-text dumps as well as the HTML pages (which usually include lang=... tags, in case that is useful to you).
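As a rough illustration of what this kind of training text is used for: to my understanding, language-detector profiles are essentially tables of character n-gram frequencies (n = 1 to 3) counted over a corpus. The minimal sketch below assumes the plain-text dumps have been unzipped into a directory of UTF-8 `.txt` files; the helper names (`load_corpus`, `ngram_profile`, `prune`) are hypothetical, and the real tool's text cleaning, filtering and on-disk profile format differ from this simplified version.

```python
from collections import Counter
from pathlib import Path


def load_corpus(directory: str) -> str:
    """Concatenate every UTF-8 .txt file in `directory` (hypothetical layout)."""
    return "\n".join(
        p.read_text(encoding="utf-8") for p in sorted(Path(directory).glob("*.txt"))
    )


def ngram_profile(text: str, max_n: int = 3) -> dict[int, Counter]:
    """Count character n-grams of length 1..max_n in each space-padded, lowercased word."""
    profile = {n: Counter() for n in range(1, max_n + 1)}
    for word in text.lower().split():
        padded = f" {word} "
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                gram = padded[i:i + n]
                if gram.strip():  # skip n-grams made only of padding spaces
                    profile[n][gram] += 1
    return profile


def prune(profile: dict[int, Counter], min_freq: int) -> dict[int, Counter]:
    """Drop rare n-grams, which are mostly noise (the threshold is a tunable guess)."""
    return {
        n: Counter({g: c for g, c in counts.items() if c >= min_freq})
        for n, counts in profile.items()
    }
```

For example, `prune(ngram_profile(load_corpus("wa")), min_freq=5)` would give a trimmed rifondou profile; the directory name `wa` and the threshold of 5 are arbitrary placeholders.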

Another thing to know about Walloon is that there are actually two ways of writing it:
a "unified orthography", called "rifondou" (the one used in those texts);
and a traditional "Feller" one, which places a lot of emphasis on local accent and phonetics, with the consequence that it is actually not one orthography but a group of orthographies (at the very least there are four main groups: western, central, eastern and southern).

Which of these would be the best approach?

only focus on "rifondou"
dump together all ways of writing the language
create several profiles (wa@rif, wa@ch, wa@na, wa@lg, wa@ba)

Thanks

Answer 1 · 2016-04-28T10:44:26.000Z

OK, I managed to create it with help from rmtheis.
I opened a pull request (#50) with it.

Answer 2 · 2016-10-07T09:02:52.000Z

Thank you! Walloon is in now.
Can you tell us which way you went? Is the language profile only rifondou, or more?

Answer 3 · 2016-10-09T11:07:18.000Z

Thanks,
The pull request I opened covers only the normalized orthography ("rifondou").

Currently, all the Walloon language tools (such as the spell checker and the early work on the LanguageTool grammar checker) use the normalized orthography.
However, a tool that could easily and automatically tell which variant/dialect a text is written in could be handy.
I'll have a meeting this month and will bring up the topic to see what other people think about it.
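One common way such a variant-detection tool could work, assuming a separate plain-text corpus per variant (for example the wa@rif, wa@lg, ... labels suggested in the question), is a multinomial naive Bayes classifier over character trigrams. This is a generic sketch, not how language-detector scores its profiles; `trigrams`, `train` and `classify` are hypothetical names, and the add-alpha smoothing and uniform prior over variants are assumptions.

```python
import math
from collections import Counter
from typing import Optional


def trigrams(text: str) -> Counter:
    """Count space-padded character trigrams of each lowercased word."""
    grams = Counter()
    for word in text.lower().split():
        padded = f" {word} "
        for i in range(len(padded) - 2):
            grams[padded[i:i + 3]] += 1
    return grams


def train(samples: dict[str, str]) -> dict[str, Counter]:
    """Build one trigram count table per variant label (hypothetical helper)."""
    return {label: trigrams(text) for label, text in samples.items()}


def classify(text: str, profiles: dict[str, Counter], alpha: float = 1.0) -> Optional[str]:
    """Return the label with the highest naive Bayes log-likelihood for `text`.

    Uses add-alpha (Laplace) smoothing, so alpha must be > 0, and a uniform
    prior over labels.
    """
    grams = trigrams(text)
    vocab = set(grams).union(*profiles.values())
    best_label, best_score = None, -math.inf
    for label, counts in profiles.items():
        denom = sum(counts.values()) + alpha * len(vocab)
        # Sum of log P(trigram | variant), weighted by how often it occurs in `text`.
        score = sum(c * math.log((counts[g] + alpha) / denom) for g, c in grams.items())
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

Usage would look like `classify(text, train({"wa@rif": rif_text, "wa@lg": lg_text}))`, where `rif_text` and `lg_text` stand in for real per-variant corpora; it returns the label whose trigram statistics best explain `text`.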

Answer 4 · 2020-06-04T18:15:40.000Z

RE creating language profiles, instructions are at https://github.com/optimaize/language-detector/wiki/Creating-Language-Profiles