google-research-datasets/C4_200M-synthetic-dataset-for-grammatical-error-correction
This dataset contains synthetic training data for grammatical error correction (GEC). The corpus was generated by corrupting clean sentences from C4 with a tagged corruption model. The approach and the dataset are described in more detail by Stahlberg and Kumar (2021): https://www.aclweb.org/anthology/2021.bea-1.4/
Python | License: CC-BY-4.0
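To illustrate what "tagged corruption" means for the training pairs described above, here is a minimal toy sketch. It assumes ERRANT-style error-type tags (`M:DET`, `R:WO`, `R:SPELL`) and replaces the learned corruption model with a few hypothetical hand-written rules, so it is not the method used to build C4_200M. The tag selects which kind of error to inject into a clean sentence. The corrupted sentence becomes the GEC source and the untouched clean sentence becomes the target. The names `CORRUPTORS` and `make_pair` are invented for this example.

```python
import random

DETERMINERS = {"a", "an", "the"}


def missing_determiner(tokens, rng):
    """M:DET (hypothetical rule): delete one determiner so the source lacks it."""
    positions = [i for i, tok in enumerate(tokens) if tok.lower() in DETERMINERS]
    if not positions:
        return list(tokens)  # nothing to corrupt; return the sentence unchanged
    i = rng.choice(positions)
    return tokens[:i] + tokens[i + 1:]


def word_order(tokens, rng):
    """R:WO (hypothetical rule): swap two adjacent tokens."""
    if len(tokens) < 2:
        return list(tokens)
    i = rng.randrange(len(tokens) - 1)
    out = list(tokens)
    out[i], out[i + 1] = out[i + 1], out[i]
    return out


def spelling(tokens, rng):
    """R:SPELL (hypothetical rule): transpose two adjacent inner letters of one word."""
    positions = [i for i, tok in enumerate(tokens) if len(tok) >= 4 and tok.isalpha()]
    if not positions:
        return list(tokens)
    i = rng.choice(positions)
    word = tokens[i]
    j = rng.randrange(1, len(word) - 2)  # first and last letters stay in place
    out = list(tokens)
    out[i] = word[:j] + word[j + 1] + word[j] + word[j + 2:]
    return out


# Toy stand-in for a tagged corruption model: the tag picks the error type.
CORRUPTORS = {"M:DET": missing_determiner, "R:WO": word_order, "R:SPELL": spelling}


def make_pair(clean, tag, seed=0):
    """Return a (corrupted source, clean target) pair for one error tag."""
    if tag not in CORRUPTORS:
        raise ValueError(f"unknown tag: {tag}")
    rng = random.Random(seed)  # seeded so a given (sentence, tag, seed) is reproducible
    tokens = clean.split()
    source = " ".join(CORRUPTORS[tag](tokens, rng))
    return source, clean


if __name__ == "__main__":
    for tag in CORRUPTORS:
        src, tgt = make_pair("The cat sat on the mat .", tag, seed=1)
        print(f"{tag}: {src!r} -> {tgt!r}")
```

Keeping the clean sentence as the target means every generated pair is correct on the target side by construction. Only the source side carries the injected error.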