adbar/German-NLP

Dataset like CoLA

dragonnikkirocks opened this issue · 3 comments

Is there a dataset like the one described in https://arxiv.org/pdf/1901.03438.pdf for German? I want to use it to build a grammar checker with BERT, but I couldn't find one.
Do you have any suggestions?
Thanks in advance

adbar commented

Hi, not to my knowledge, but I'm not sure.

Thanks for the reply. I am trying to build a spell checker for German by fine-tuning a transformer on it as a downstream task. Do you have any suggestions on how to approach this?
Thanks in advance

zesch commented

What others have done in this situation is to train on artificial errors: take correct text and introduce some errors into it. Of course, that won't reflect real errors in every respect, but this is usually more than offset by being able to train on much more data.
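A minimal sketch of this idea, assuming whitespace-tokenized German sentences and a handful of hand-picked error types (character typos, umlauts/ß spelled out, lowercased capitalized words, and das/dass-style confusions). All function names, the confusion list, and the per-token error rate `p` are hypothetical choices for illustration, not an established recipe. The output is CoLA-style `(sentence, label)` pairs, where 1 marks the original text and 0 a corrupted copy.

```python
import random
from typing import Callable, List, Optional, Tuple

# Each (hypothetical) error generator takes a token and an RNG and returns a
# corrupted token, or None if that error type does not apply to the token.
ErrorFn = Callable[[str, random.Random], Optional[str]]

UMLAUTS = {"ä": "a", "ö": "o", "ü": "u", "ß": "ss"}
# Small illustrative list of commonly confused German words.
CONFUSIONS = {"das": "dass", "dass": "das", "seit": "seid", "seid": "seit",
              "wieder": "wider", "wider": "wieder"}


def swap_adjacent(tok: str, rng: random.Random) -> Optional[str]:
    """Typo: swap two neighbouring characters."""
    if len(tok) < 2:
        return None
    i = rng.randrange(len(tok) - 1)
    if tok[i] == tok[i + 1]:
        return None  # swapping identical characters would change nothing
    return tok[:i] + tok[i + 1] + tok[i] + tok[i + 2:]


def drop_char(tok: str, rng: random.Random) -> Optional[str]:
    """Typo: delete one character (never empties a token)."""
    if len(tok) < 2:
        return None
    i = rng.randrange(len(tok))
    return tok[:i] + tok[i + 1:]


def strip_umlaut(tok: str, rng: random.Random) -> Optional[str]:
    """Replace one umlaut or ß with its plain-ASCII spelling."""
    positions = [i for i, c in enumerate(tok) if c in UMLAUTS]
    if not positions:
        return None
    i = rng.choice(positions)
    return tok[:i] + UMLAUTS[tok[i]] + tok[i + 1:]


def lowercase_capitalized(tok: str, rng: random.Random) -> Optional[str]:
    """Capitalization error: lowercase any capitalized word (e.g. a noun)."""
    if tok[:1].isupper() and tok[1:].islower():
        return tok.lower()
    return None


def confuse_word(tok: str, rng: random.Random) -> Optional[str]:
    """Swap in a commonly confused word, e.g. das <-> dass."""
    return CONFUSIONS.get(tok)


ERRORS: List[ErrorFn] = [swap_adjacent, drop_char, strip_umlaut,
                         lowercase_capitalized, confuse_word]


def corrupt(sentence: str, rng: random.Random, p: float = 0.15) -> str:
    """Corrupt each whitespace token with probability p."""
    out = []
    for tok in sentence.split():
        new = None
        if rng.random() < p:
            fns = ERRORS[:]
            rng.shuffle(fns)
            for fn in fns:  # try error types in random order until one applies
                candidate = fn(tok, rng)
                if candidate is not None and candidate != tok:
                    new = candidate
                    break
        out.append(new if new is not None else tok)
    return " ".join(out)


def make_examples(sentences: List[str], seed: int = 0,
                  p: float = 0.15) -> List[Tuple[str, int]]:
    """CoLA-style pairs: (original, 1), plus (corrupted copy, 0) if it changed."""
    rng = random.Random(seed)
    examples = []
    for s in sentences:
        examples.append((s, 1))
        noisy = corrupt(s, rng, p)
        if noisy != s:
            examples.append((noisy, 0))
    return examples


if __name__ == "__main__":
    for text, label in make_examples(["Ich weiß, dass das Haus groß ist."], seed=1, p=0.5):
        print(label, text)
```

For a spell or grammar corrector rather than a classifier, the same `corrupt` function could instead be used to build `(noisy, clean)` pairs for a sequence-to-sequence model. Note that `lowercase_capitalized` is only a crude heuristic: it lowercases any capitalized token, not just nouns.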