Dataset like CoLA

Question

dragonnikkirocks opened this issue 4 years ago · 3 comments

Is there a dataset like https://arxiv.org/pdf/1901.03438.pdf for German? I want to use it to build a grammar checker with BERT, but I couldn't find one.
Do you have any suggestions?
Thanks in advance.

Answer 1 · 2020-12-07T16:50:45.000Z

Hi, not to my knowledge, but I'm not sure.

Answer 2 · 2020-12-07T23:42:04.000Z

Thanks for the reply. I am trying to build a spell checker for German, using a transformer and treating it as a downstream task. Do you have any suggestions on how I can approach this?
Thanks in advance.

Answer 3 · 2020-12-08T08:26:48.000Z

What others have done in this situation is to train on artificial errors: take correct text and introduce errors into it. Of course, that won't reflect real errors in every respect, but this is usually more than offset by being able to train on much more data.
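
To make the idea concrete, here is a minimal sketch of how such training pairs could be generated. It assumes simple character-level edits (swap, delete, duplicate) as a stand-in for spelling errors; the function names `corrupt_word`, `corrupt_sentence`, and `make_examples` are made up for this illustration, and the 1/0 labels for acceptable/corrupted are just one possible convention. Grammar errors would need different corruptions (for example word order or inflection changes), which this does not cover.

```python
import random


def corrupt_word(word, rng):
    """Apply one random character-level edit so the result differs from `word`."""
    if len(word) < 2:
        # Too short to swap or delete safely: double the word instead.
        return word + word
    op = rng.choice(["swap", "delete", "duplicate"])
    i = rng.randrange(len(word) - 1)
    if op == "swap":
        swapped = word[:i] + word[i + 1] + word[i] + word[i + 2:]
        if swapped != word:
            return swapped
        # Swapping two identical letters changes nothing, so delete instead.
        op = "delete"
    if op == "delete":
        return word[:i] + word[i + 1:]
    # Duplicate the character at position i.
    return word[:i] + word[i] + word[i:]


def corrupt_sentence(sentence, rng):
    """Corrupt exactly one randomly chosen word of a whitespace-tokenized sentence."""
    words = sentence.split()
    if not words:
        return sentence
    k = rng.randrange(len(words))
    words[k] = corrupt_word(words[k], rng)
    return " ".join(words)


def make_examples(sentences, seed=0):
    """Return (text, label) pairs: label 1 for original text, 0 for corrupted text."""
    rng = random.Random(seed)
    examples = []
    for sentence in sentences:
        examples.append((sentence, 1))
        examples.append((corrupt_sentence(sentence, rng), 0))
    return examples


if __name__ == "__main__":
    correct = ["Das ist ein Satz.", "Ich gehe heute nach Hause."]
    for text, label in make_examples(correct, seed=0):
        print(label, text)
```

The resulting pairs have the same shape as an acceptability dataset (sentence plus binary label), so they could be fed to a standard sequence classification fine-tuning setup. How closely the synthetic errors need to match real ones depends on the application, so the set of corruptions above should be treated as a starting point.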