PCKMT


Abstract

Source code for the ACL 2022 paper Efficient Cluster-Based k-Nearest-Neighbor Machine Translation.

The implementation of our proposed PCKMT is built upon the following research:

  • Adaptive kNN-MT (Xin Zheng et al., 2021) [code]
  • Fairseq and Faiss developed by Facebook Research

Requirements

In our setup, the CUDA version is 10.1; we have not tested other versions yet.

  • python >= 3.6
  • faiss-gpu == 1.6.5
  • torch == 1.5.0
  • torch-scatter == 2.0.5

With these requirements satisfied, install this editable version of fairseq (fairseq == 0.10.1) with:

pip install --editable ./

Checkpoints

Our trained checkpoints, datastores, and logs are provided here: baidu (Password: ckmt)

Implementation

Please follow these steps to reproduce the experiments:

  1. Follow the codebase of Xin Zheng et al. (2021) and download the checkpoint of the base De-En NMT model released by Facebook for WMT 2019.
  2. Similarly, download the corpora and test sets as described by Xin Zheng et al. (2021).
  3. Create the original datastore of adaptive kNN-MT (a sketch of the resulting layout follows the command):
cd codes && . create_datastore.sh
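
For reference, a kNN-MT datastore in the adaptive kNN-MT convention is a pair of memory-mapped arrays: keys (decoder hidden states) and values (the target tokens that followed them). The sketch below assumes that convention; the file names, size, and dtypes are illustrative, not prescribed by the scripts.

import numpy as np

# Illustrative numbers only: the datastore size depends on the corpus, and
# 1024 is the decoder hidden size of the WMT19 De-En Transformer.
dstore_size = 3613350
key_dim = 1024

# Keys: one decoder hidden state per target token of the parallel corpus.
keys = np.memmap('dstore_keys.npy', dtype=np.float16, mode='r',
                 shape=(dstore_size, key_dim))
# Values: the id of the target token emitted at each of those states.
vals = np.memmap('dstore_vals.npy', dtype=np.int64, mode='r',
                 shape=(dstore_size, 1))
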
  4. [Optional] Modify the script prune_datastore.py to fit your datastore (e.g., datadir, datastore size, etc. in the main() function) and then prune the datastore; an illustrative sketch of the pruning idea follows the command:
python prune_datastore.py
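
The exact pruning criterion lives in prune_datastore.py; purely as a hypothetical illustration of neighborhood-based pruning, an entry can be treated as redundant when its nearest in-datastore neighbors already map to the same target token:

import numpy as np
import faiss

def redundancy_mask(keys, vals, k=8):
    # Brute-force neighbor search purely for illustration; real datastores
    # need an approximate index and batched queries. keys must be float32.
    index = faiss.IndexFlatL2(keys.shape[1])
    index.add(keys)
    _, nbrs = index.search(keys, k + 1)          # column 0 is the query itself
    neighbor_vals = vals[nbrs[:, 1:], 0]         # (N, k) token ids of neighbors
    return (neighbor_vals == vals).all(axis=1)   # True => candidate to prune
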
  5. Train the Compact Network (one possible architecture is sketched after the command):
. knn_align.sh
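
The Compact Network reduces the dimensionality of datastore keys before retrieval. A minimal sketch follows; the actual architecture and training losses are defined by the knn_align.sh entry point, and the 64-d output size is only an example.

import torch
import torch.nn as nn

class CompactNetwork(nn.Module):
    """Projects 1024-d decoder states down to a small retrieval space."""
    def __init__(self, in_dim=1024, out_dim=64):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(in_dim, in_dim // 4),
            nn.Tanh(),
            nn.Linear(in_dim // 4, out_dim),
        )

    def forward(self, h):      # h: (batch, in_dim) decoder hidden states
        return self.proj(h)    # (batch, out_dim) compact keys
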
  6. Reconstruct the compressed datastore of CKMT (sketched after the command):
. create_datastore_knn_align.sh
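
Conceptually, this step pushes every stored key through the trained Compact Network and writes a new, smaller key file. A batched sketch, reusing the illustrative names from the sketches above:

import numpy as np
import torch

net = CompactNetwork().eval()   # load the trained weights in practice
small_keys = np.memmap('dstore_keys_small.npy', dtype=np.float16, mode='w+',
                       shape=(dstore_size, 64))
with torch.no_grad():
    for start in range(0, dstore_size, 4096):
        batch = torch.from_numpy(keys[start:start + 4096].astype(np.float32))
        small_keys[start:start + 4096] = net(batch).numpy().astype(np.float16)
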
  7. Train the quantized index (an illustrative Faiss sketch follows the command):
. build_faiss_index_knn_align.sh
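
"Training" the index means Faiss learns coarse centroids and product-quantization codebooks from a sample of the compressed keys, then encodes the whole datastore. The IVF/PQ settings below are illustrative; the real ones are set in build_faiss_index_knn_align.sh.

import numpy as np
import faiss

index = faiss.index_factory(64, 'IVF4096,PQ32', faiss.METRIC_L2)
sample = np.asarray(small_keys[:1000000], dtype=np.float32)
index.train(sample)                                  # learn centroids + codebooks
index.add(np.asarray(small_keys, dtype=np.float32))  # encode all entries
faiss.write_index(index, 'knn_index.trained')
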
  8. Train the CKMT model.

Run the training on a single GPU:

. train_faiss_knn_align.sh

Or run the training on multiple GPUs when:

  • The training process causes OOM
  • The size of your datastore is too large, e.g. >100M tokens
  • The batch size is too large, e.g. >16 on a P100
. train_faiss_knn_align_ddp.sh

The only difference in the DDP script is one additional parameter:

Options for 'faiss-batch-mode':
    'batch_large_faiss_large'
    'batch_large_faiss_small'
    'batch_small_faiss_small'
    'batch_small_faiss_large'
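
The semantics of the four modes are defined in the training code. Purely as an illustration of how one might map this README's guidance (datastore >100M tokens, batch size >16) onto a mode name, under assumptions that may not match the actual scripts:

def pick_faiss_batch_mode(datastore_tokens: int, batch_size: int) -> str:
    # Hypothetical mapping: 'large' batch handling for big batches, 'large'
    # faiss handling for big datastores. Check the scripts for the real rule.
    batch = 'batch_large' if batch_size > 16 else 'batch_small'
    index = 'faiss_large' if datastore_tokens > 100_000_000 else 'faiss_small'
    return f'{batch}_{index}'

print(pick_faiss_batch_mode(150_000_000, 8))  # -> batch_small_faiss_large
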
  9. Evaluation:
. test_adaptive_knn_mt_knn_align.sh

Updates

  • 2022-05-12: see [issue #1 pckmt], which describes a minimal reproduction via the downloaded checkpoints.

  • 2022-05-22: see [Issue #2 pckmt], which summarizes empirical issues with large-scale datastores.

  • 2022-06-09: added support for Meta-k network DDP training; four options are provided to fit different datastore/batch sizes.

Reference

If you use the source code included here in your work, please cite the following paper:

@misc{wang2022efficient,
  title = {Efficient Cluster-Based k-Nearest-Neighbor Machine Translation},
  author = {Wang, Dexin and Fan, Kai and Chen, Boxing and Xiong, Deyi},
  year = {2022},
  publisher = {arXiv},
  doi = {10.48550/ARXIV.2204.06175},
  url = {https://arxiv.org/abs/2204.06175},
  copyright = {arXiv.org perpetual, non-exclusive license}
}