PCKMT


Abstract

Source code for the ACL 2022 paper Efficient Cluster-Based k-Nearest-Neighbor Machine Translation.

The implementation of our proposed PCKMT is built upon the following research:

  • Adaptive kNN-MT (Xin Zheng et al., 2021) [code]
  • Fairseq and Faiss developed by Facebook Research

Requirements

In our setup, the CUDA version is 10.1; we have not tested other versions yet.

  • python >= 3.6
  • faiss-gpu == 1.6.5
  • torch == 1.5.0
  • torch-scatter == 2.0.5

With these requirements satisfied, install this editable version of fairseq (fairseq == 0.10.1) with:

pip install --editable ./

Checkpoints

Our trained checkpoints, datastores, and logs are provided here: baidu (Password: ckmt)

Implementation

Please follow these steps to reproduce the experiments:

  1. Follow the codebase of Xin Zheng et al. (2021) and download the checkpoint of the base De-En NMT model released by Facebook for WMT 2019.
  2. Similarly, download the corpora and test sets as described by Xin Zheng et al. (2021).
  3. Create the original datastore of adaptive kNN-MT (a sketch of the resulting layout follows the command):
cd codes && . create_datastore.sh
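
For reference, a kNN-MT datastore in the adaptive kNN-MT convention is a pair of memory-mapped arrays: keys (decoder hidden states) and values (the target tokens that followed them). The sketch below assumes that convention; the file names, size, and dtypes are illustrative, not prescribed by the scripts.

import numpy as np

# Illustrative numbers only: the datastore size depends on the corpus, and
# 1024 is the decoder hidden size of the WMT19 De-En Transformer.
dstore_size = 3613350
key_dim = 1024

# Keys: one decoder hidden state per target token of the parallel corpus.
keys = np.memmap('dstore_keys.npy', dtype=np.float16, mode='r',
                 shape=(dstore_size, key_dim))
# Values: the id of the target token emitted at each of those states.
vals = np.memmap('dstore_vals.npy', dtype=np.int64, mode='r',
                 shape=(dstore_size, 1))
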
  4. [Optional] Modify the script prune_datastore.py to fit your datastore (e.g., datadir, datastore size, etc. in the main() function) and then prune the datastore; an illustrative sketch of the pruning idea follows the command:
python prune_datastore.py
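
The exact pruning criterion lives in prune_datastore.py; purely as a hypothetical illustration of neighborhood-based pruning, an entry can be treated as redundant when its nearest in-datastore neighbors already map to the same target token:

import numpy as np
import faiss

def redundancy_mask(keys, vals, k=8):
    # Brute-force neighbor search purely for illustration; real datastores
    # need an approximate index and batched queries. keys must be float32.
    index = faiss.IndexFlatL2(keys.shape[1])
    index.add(keys)
    _, nbrs = index.search(keys, k + 1)          # column 0 is the query itself
    neighbor_vals = vals[nbrs[:, 1:], 0]         # (N, k) token ids of neighbors
    return (neighbor_vals == vals).all(axis=1)   # True => candidate to prune
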
  5. Train the Compact Network (one possible architecture is sketched after the command):
. knn_align.sh
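
The Compact Network reduces the dimensionality of datastore keys before retrieval. A minimal sketch follows; the actual architecture and training losses are defined by the knn_align.sh entry point, and the 64-d output size is only an example.

import torch
import torch.nn as nn

class CompactNetwork(nn.Module):
    """Projects 1024-d decoder states down to a small retrieval space."""
    def __init__(self, in_dim=1024, out_dim=64):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(in_dim, in_dim // 4),
            nn.Tanh(),
            nn.Linear(in_dim // 4, out_dim),
        )

    def forward(self, h):      # h: (batch, in_dim) decoder hidden states
        return self.proj(h)    # (batch, out_dim) compact keys
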
  6. Reconstruct the compressed datastore of CKMT (sketched after the command):
. create_datastore_knn_align.sh
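
Conceptually, this step pushes every stored key through the trained Compact Network and writes a new, smaller key file. A batched sketch, reusing the illustrative names from the sketches above:

import numpy as np
import torch

net = CompactNetwork().eval()   # load the trained weights in practice
small_keys = np.memmap('dstore_keys_small.npy', dtype=np.float16, mode='w+',
                       shape=(dstore_size, 64))
with torch.no_grad():
    for start in range(0, dstore_size, 4096):
        batch = torch.from_numpy(keys[start:start + 4096].astype(np.float32))
        small_keys[start:start + 4096] = net(batch).numpy().astype(np.float16)
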
  7. Train the quantized index (an illustrative Faiss sketch follows the command):
. build_faiss_index_knn_align.sh
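
"Training" the index means Faiss learns coarse centroids and product-quantization codebooks from a sample of the compressed keys, then encodes the whole datastore. The IVF/PQ settings below are illustrative; the real ones are set in build_faiss_index_knn_align.sh.

import numpy as np
import faiss

index = faiss.index_factory(64, 'IVF4096,PQ32', faiss.METRIC_L2)
sample = np.asarray(small_keys[:1000000], dtype=np.float32)
index.train(sample)                                  # learn centroids + codebooks
index.add(np.asarray(small_keys, dtype=np.float32))  # encode all entries
faiss.write_index(index, 'knn_index.trained')
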
  8. Train the CKMT model.

Run the training on a single GPU:

. train_faiss_knn_align.sh

Or run the training on multiple GPUs when:

  • The training process causes OOM
  • The size of your datastore is too large, e.g. >100M tokens
  • The batch size is too large, e.g. >16 on a P100
. train_faiss_knn_align_ddp.sh

The only difference in the DDP script is one additional parameter:

Options for 'faiss-batch-mode':
    'batch_large_faiss_large'
    'batch_large_faiss_small'
    'batch_small_faiss_small'
    'batch_small_faiss_large'
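
The semantics of the four modes are defined in the training code. Purely as an illustration of how one might map this README's guidance (datastore >100M tokens, batch size >16) onto a mode name, under assumptions that may not match the actual scripts:

def pick_faiss_batch_mode(datastore_tokens: int, batch_size: int) -> str:
    # Hypothetical mapping: 'large' batch handling for big batches, 'large'
    # faiss handling for big datastores. Check the scripts for the real rule.
    batch = 'batch_large' if batch_size > 16 else 'batch_small'
    index = 'faiss_large' if datastore_tokens > 100_000_000 else 'faiss_small'
    return f'{batch}_{index}'

print(pick_faiss_batch_mode(150_000_000, 8))  # -> batch_small_faiss_large
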
  9. Evaluation:
. test_adaptive_knn_mt_knn_align.sh

Updates

  • 2022-05-12: see [issue #1 pckmt], which describes a minimal reproduction via the downloaded checkpoints.

  • 2022-05-22: see [Issue #2 pckmt], which summarizes empirical issues with large-scale datastores.

  • 2022-06-09: added support for Meta-k network DDP training; four options are provided to fit different datastore/batch sizes.

Reference

If you use the source code included here in your work, please cite the following paper:

@misc{wang2022efficient,
  title = {Efficient Cluster-Based k-Nearest-Neighbor Machine Translation},
  author = {Wang, Dexin and Fan, Kai and Chen, Boxing and Xiong, Deyi},
  year = {2022},
  publisher = {arXiv},
  doi = {10.48550/ARXIV.2204.06175},
  url = {https://arxiv.org/abs/2204.06175},
  copyright = {arXiv.org perpetual, non-exclusive license}
}