google-research-datasets/C4_200M-synthetic-dataset-for-grammatical-error-correction
This dataset contains synthetic training data for grammatical error correction. The corpus is generated by corrupting clean sentences from C4 using a tagged corruption model. The approach and the dataset are described in more detail by Stahlberg and Kumar (2021): https://www.aclweb.org/anthology/2021.bea-1.4/
Python · CC-BY-4.0
Issues
Dataset
#9 opened by saramoeini20 - 1
Any plan to release extra code?
#8 opened by lxsyz - 1
Dataset used to train the corruption model
#7 opened by GokulNC - 1
Please collaborate.
#1 opened by MariasStory - 2
Questions about reproducing the results
#4 opened by MichaelCaohn - 10