Dataset like CoLA
dragonnikkirocks opened this issue · 3 comments
dragonnikkirocks commented
Is there a dataset like https://arxiv.org/pdf/1901.03438.pdf for German? I want to use it for a grammar checker built on BERT, but I didn't find any.
Do you have any suggestions?
Thanks in advance
adbar commented
Hi, not to my knowledge but I'm not sure.
dragonnikkirocks commented
Thanks for the reply. I am trying to build a spell checker for German, using a transformer with spell checking as the downstream task. Do you have any suggestions on how I can approach this?
Thanks in advance
zesch commented
What others have done in this situation is to train on artificial errors: take correct text and introduce errors into it. Of course that won't reflect real errors in every respect, but this is usually more than offset by being able to train on much more data.
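
For illustration, the artificial-error idea could be sketched roughly as follows. This is only a minimal sketch under assumptions that are not from this thread: the function names `corrupt_word` and `make_pairs`, the three edit operations (swap, delete, duplicate), and the `error_rate` value are all hypothetical choices, and real typos will be distributed differently.

```python
import random


def corrupt_word(word, rng):
    """Apply one random character-level edit to a word (hypothetical helper)."""
    if len(word) < 2:
        return word  # too short to edit safely
    op = rng.choice(["swap", "delete", "duplicate"])
    # i is any position except the last, so word[i + 1] always exists.
    i = rng.randrange(len(word) - 1)
    if op == "swap":
        # Transpose two adjacent characters (a no-op if they are identical).
        return word[:i] + word[i + 1] + word[i] + word[i + 2:]
    if op == "delete":
        # Drop the character at position i.
        return word[:i] + word[i + 1:]
    # Duplicate the character at position i.
    return word[:i] + word[i] + word[i:]


def make_pairs(sentences, error_rate=0.15, seed=0):
    """Return (noisy, clean) sentence pairs; error_rate is an assumed value."""
    rng = random.Random(seed)  # seeded so the generated data is reproducible
    pairs = []
    for sentence in sentences:
        noisy_words = [
            corrupt_word(w, rng) if rng.random() < error_rate else w
            for w in sentence.split()
        ]
        pairs.append((" ".join(noisy_words), sentence))
    return pairs


if __name__ == "__main__":
    clean = ["Das ist ein einfacher Satz.", "Ich gehe heute ins Kino."]
    for noisy, original in make_pairs(clean, error_rate=0.5, seed=42):
        print(noisy, "->", original)
```

The resulting pairs could then serve as training data, with the noisy sentence as input and the clean one as target. For a CoLA-style acceptability task, the same pairs could instead be split into labeled examples, clean sentences as acceptable and corrupted ones as unacceptable.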