
The official code for the "Unsupervised Grammatical Error Correction Rivaling Supervised Methods" paper, published in EMNLP 2023.


Unsupervised Grammatical Error Correction Rivaling Supervised Methods

Hannan Cao, Liping Yuan, Yuchen Zhang, Hwee Tou Ng. Unsupervised Grammatical Error Correction Rivaling Supervised Methods. In EMNLP 2023.

Training Data & Checkpoints

GEC training data; GEC model checkpoints;

English GEC

Flan-T5-xxl

  1. Please store all the downloaded checkpoints and data for Flan-T5-xxl in this folder: en_flan_t5/llm_finetune
  2. Install the packages listed in requirement.txt inside the en_flan_t5 folder
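Before training, it can help to confirm that the folders are where the scripts expect them. The sketch below uses hypothetical helpers (not part of the repository). One checks that given folders exist relative to the current directory. The other builds, but does not run, a pip command for a project's requirement.txt. It assumes you call it from the repository root.

```python
import os
import sys
from typing import List


def missing_dirs(*dirs: str) -> List[str]:
    """Return the folders from `dirs` that do not exist (relative to the cwd)."""
    return [d for d in dirs if not os.path.isdir(d)]


def requirements_install_cmd(project_dir: str) -> List[str]:
    """Build (but do not run) a pip command for <project_dir>/requirement.txt."""
    req_file = os.path.join(project_dir, "requirement.txt")
    # Invoking pip through the current interpreter installs into the active environment.
    return [sys.executable, "-m", "pip", "install", "-r", req_file]
```

For example, `missing_dirs("en_flan_t5", "en_flan_t5/llm_finetune")` should return an empty list before you pass `requirements_install_cmd("en_flan_t5")` to `subprocess.run(..., check=True)`. The same helper applies to en_fairseq_train and chinese_bart_large, since the README names the file requirement.txt in all three folders.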

Train:

bash train.sh

Inference: go to the en_flan_t5/llm_inference folder and run:

bash eval_gec.sh your/ckpt/name
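You can chain the two Flan-T5-xxl steps from Python. This is a hypothetical wrapper, not code from the repository. The README states that eval_gec.sh is run from en_flan_t5/llm_inference, but it does not say where train.sh is launched from. The en_flan_t5/llm_finetune working directory used below is therefore an assumption you may need to adjust.

```python
import os
import subprocess
from typing import List, Tuple

# Assumption: train.sh is launched from en_flan_t5/llm_finetune.
FINETUNE_DIR = os.path.join("en_flan_t5", "llm_finetune")
# Stated in the README: eval_gec.sh is run from en_flan_t5/llm_inference.
INFERENCE_DIR = os.path.join("en_flan_t5", "llm_inference")


def flan_t5_steps(ckpt_name: str) -> List[Tuple[str, List[str]]]:
    """Return (working_dir, argv) pairs: training first, then inference."""
    return [
        (FINETUNE_DIR, ["bash", "train.sh"]),
        (INFERENCE_DIR, ["bash", "eval_gec.sh", ckpt_name]),
    ]


def run_steps(steps: List[Tuple[str, List[str]]]) -> None:
    """Run each step in its own working directory."""
    for cwd, argv in steps:
        # check=True raises CalledProcessError and stops the chain if a script fails.
        subprocess.run(argv, cwd=cwd, check=True)
```

Usage from the repository root: `run_steps(flan_t5_steps("your/ckpt/name"))`. Passing `cwd` to `subprocess.run` avoids changing the Python process's own working directory between steps.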

BART-base

  1. Please store all the downloaded checkpoints and data for BART-base in this folder: en_fairseq_train
  2. Install the packages listed in requirement.txt inside the en_fairseq_train folder

Train:

cd gec
bash train.sh path/to/the/model/to/be/restored path/to/data-bin/folder output_path
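train.sh takes three positional arguments, and it is easy to swap two of them. The hypothetical helper below (not part of the repository) checks the two input paths and returns the arguments in the order given above. It assumes you call it from inside the gec folder. It passes the output path through unchanged, because the README does not say whether train.sh creates it.

```python
import os
from typing import List


def bart_train_cmd(restore_path: str, data_bin: str, output_path: str) -> List[str]:
    """Return the argv for `bash train.sh`, validating the inputs first.

    Order matches the README: model to restore, data-bin folder, output path.
    """
    if not os.path.exists(restore_path):
        raise FileNotFoundError(f"model to restore not found: {restore_path}")
    if not os.path.isdir(data_bin):
        # A fairseq data-bin is a directory of binarized files, not a single file.
        raise NotADirectoryError(f"data-bin folder not found: {data_bin}")
    # output_path is passed through unchanged; train.sh decides how to use it.
    return ["bash", "train.sh", restore_path, data_bin, output_path]
```

Failing fast on a wrong path is cheaper than discovering it after the training job has been queued.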

Inference:

bash new_generate.sh path/to/model/ckpt testing/input/path
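To decode several test files with the same checkpoint, you can repeat the single command above once per file. This is a minimal sketch with hypothetical helper names. It assumes new_generate.sh takes exactly the two arguments shown and is run from the same gec folder as training. `shlex.join` (Python 3.8+) turns each command into a line you can paste into a shell, quoting any path that contains spaces.

```python
import shlex
from typing import Iterable, List


def bart_generate_cmds(ckpt: str, test_inputs: Iterable[str]) -> List[List[str]]:
    """One `bash new_generate.sh <ckpt> <input>` invocation per test input."""
    return [["bash", "new_generate.sh", ckpt, path] for path in test_inputs]


def as_shell(argv: List[str]) -> str:
    """Render an argv list as a single, safely quoted shell command line."""
    return shlex.join(argv)
```

Each entry can be printed with `as_shell` for a dry run, or passed directly to `subprocess.run(..., check=True)`.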

Chinese GEC

  1. Please store all the downloaded checkpoints and data for Chinese GEC in this folder: chinese_bart_large
  2. Install the packages listed in requirement.txt inside the chinese_bart_large folder

Train:

cd gec
bash train_ch.sh

Inference:

cd gec
bash test_ch.sh

Citation

If you find our paper or code useful, please cite it as:

@inproceedings{cao-etal-2023-unsupervised,
    title = "Unsupervised Grammatical Error Correction Rivaling Supervised Methods",
    author = "Cao, Hannan  and
      Yuan, Liping  and
      Zhang, Yuchen  and
      Ng, Hwee Tou",
    editor = "Bouamor, Houda  and
      Pino, Juan  and
      Bali, Kalika",
    booktitle = "Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing",
    month = dec,
    year = "2023",
    address = "Singapore",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.emnlp-main.185",
    doi = "10.18653/v1/2023.emnlp-main.185",
    pages = "3072--3088",
}

If you encounter any problems with the code, please contact caoh@u.nus.edu.