PLOME: Pre-training with Misspelled Knowledge for Chinese Spelling Correction (ACL 2021)

This repository provides the code and data for the ACL 2021 paper "PLOME: Pre-training with Misspelled Knowledge for Chinese Spelling Correction": https://aclanthology.org/2021.acl-long.233.pdf

Requirements:

  • Python 3

  • TensorFlow 1.14

  • Horovod
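
For a quick sanity check of the environment, the sketch below (a hypothetical helper, assuming TensorFlow 1.14 and Horovod are already installed; it is not part of the repository) verifies the TensorFlow version and that Horovod initializes:

```python
# env_check.py -- hypothetical helper, not part of this repository.
# Rough check that the local TensorFlow / Horovod setup matches the requirements above.
import tensorflow as tf
import horovod.tensorflow as hvd

print("TensorFlow version:", tf.__version__)
assert tf.__version__.startswith("1.14"), "the scripts here target TensorFlow 1.14"

hvd.init()  # Horovod must be initialized before it reports rank/size
print("Horovod rank %d of %d" % (hvd.rank(), hvd.size()))
```

Running it under horovodrun (e.g. `horovodrun -np 2 python env_check.py`) additionally confirms that a multi-process launch works.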

Instructions:

  • Finetune

    Training and evaluation file format: one sentence pair per line, written as original sentence \t golden sentence (a sketch for preparing such a file follows the steps below).

    step 1: cd finetune_src
    step 2: download the pre-trained PLOME model and corpus from https://drive.google.com/file/d/1aip_siFdXynxMz6-2iopWvJqr5jtUu3F/view?usp=sharing
    step 3: sh start.sh
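
The following is a minimal, hypothetical sketch (the file name and sentence pairs are made up, not taken from the released data) of how a fine-tuning file in this format can be written and validated:

```python
# make_finetune_data.py -- hypothetical example, not part of this repository.
# Writes and sanity-checks a tab-separated "original \t golden" file for fine-tuning.
pairs = [
    ("我今天很高心", "我今天很高兴"),  # 心 -> 兴: a typical misspelled/golden pair
    ("他明天去北京", "他明天去北京"),  # sentences without errors keep original == golden
]

with open("train.txt", "w", encoding="utf-8") as f:
    for original, golden in pairs:
        f.write(original + "\t" + golden + "\n")

# Each line must contain exactly one tab; spelling correction keeps the sentence
# length unchanged, so a length mismatch usually signals a malformed pair.
with open("train.txt", encoding="utf-8") as f:
    for i, line in enumerate(f, 1):
        original, golden = line.rstrip("\n").split("\t")
        assert len(original) == len(golden), "length mismatch on line %d" % i
```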
  • Pre-train

    step 1: cd pre_train_src
    step 2: sh gen_train_tfrecords.sh (a sketch for inspecting the generated TFRecords follows the model link below)
    step 3: sh start.sh

    Our pre-trained model: https://drive.google.com/file/d/1aip_siFdXynxMz6-2iopWvJqr5jtUu3F/view?usp=sharing
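
To confirm that gen_train_tfrecords.sh produced readable records, the sketch below iterates a generated file and prints its feature keys (the file path is hypothetical; actual output names depend on the script's configuration):

```python
# inspect_tfrecords.py -- hypothetical helper, not part of this repository.
# Prints the feature keys of the first few records so you can confirm that
# gen_train_tfrecords.sh produced readable pre-training data.
import tensorflow as tf

path = "train.tfrecord"  # hypothetical name; point this at a file the script generated

for i, record in enumerate(tf.python_io.tf_record_iterator(path)):
    example = tf.train.Example.FromString(record)
    print("record %d keys: %s" % (i, sorted(example.features.feature.keys())))
    if i >= 2:  # the first few records are enough for a sanity check
        break
```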