The implementation of CL-ReLKT (NAACL-2022)

CL-ReLKT (Cross-Lingual Retrieval Language Knowledge Transfer)

CL-ReLKT: Cross-lingual Language Knowledge Transfer for Multilingual Retrieval Question Answering, Findings of NAACL 2022

[Figure: overview screenshot; the multilingual sentence embedding is highlighted in a yellow box]

Motivation

Cross-Lingual Retrieval Question Answering (CL-ReQA) is concerned with retrieving answer documents or passages for a question written in a different language. A common approach to CL-ReQA is to create a multilingual sentence embedding space in which question-answer pairs across different languages are close to each other.

In this paper, our goal is to improve the robustness of the multilingual sentence embedding (yellow box) so that it works with a wide range of languages, including those with a limited amount of training data. Leveraging the generalizability of knowledge distillation, we propose a Cross-Lingual Retrieval Language Knowledge Transfer (CL-ReLKT) framework.

Multilingual Embedding Space Before & After Applying CL-ReLKT

[Figure: multilingual embedding space before and after applying CL-ReLKT]

Paper

Link: https://aclanthology.org/2022.findings-naacl.165/

Citation

@inproceedings{limkonchotiwat-etal-2022-cl-relkt,
    title = "{CL-ReLKT}: Cross-lingual Language Knowledge Transfer for Multilingual Retrieval Question Answering",
    author = "Limkonchotiwat, Peerat  and
      Ponwitayarat, Wuttikorn  and
      Udomcharoenchaikit, Can  and
      Chuangsuwanich, Ekapol  and
      Nutanong, Sarana",
    booktitle = "Findings of the North American Chapter of the Association for Computational Linguistics: NAACL 2022",
    year = "2022",
    publisher = "Association for Computational Linguistics",
}

Model and Datasets

How to train

Requirements

  • TensorFlow >= 2.5.0
  • tensorflow_text >= 2.5.0

Step 1: Triplet loss warmup

  • Run warmup.sh
  • In this step, we fine-tune the mUSE model on our training data (XORQA, MLQA, or XQuAD) with a triplet loss: the anchor is the question, the positive is the answer to that question, and the negative is retrieved with BM25.
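As a rough illustration of the triplet objective described in this step, here is a minimal pure-Python sketch. The Euclidean distance, the margin value of 0.5, and the toy 2-d vectors are assumptions made for the example; they are not taken from warmup.sh, and the real training runs on mUSE embeddings in TensorFlow.

```python
import math

def l2_distance(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=0.5):
    """Hinge-style triplet loss: max(0, d(a, p) - d(a, n) + margin).

    The margin is a hypothetical value chosen for this sketch.
    """
    d_pos = l2_distance(anchor, positive)
    d_neg = l2_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)

# Toy 2-d embeddings (hypothetical values, not model outputs).
question = [1.0, 0.0]       # anchor
answer = [1.0, 0.0]         # positive: the answer to the question
bm25_negative = [0.0, 1.0]  # negative: a passage retrieved with BM25

# The positive is already closer than the negative by more than the margin.
easy = triplet_loss(question, answer, bm25_negative)

# Swapping the roles gives a triplet that violates the margin.
hard = triplet_loss(question, bm25_negative, answer)
```

The loss is zero once the answer is closer to the question than the BM25 negative by at least the margin, so only triplets that still violate the margin contribute to the update.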

Step 2: Triplet loss online training

  • Run teacher.sh
  • In this step, we continue fine-tuning the model from Step 1 with triplet loss, now using online mining (a negative mining technique) to select negatives during training.
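Online mining picks negatives from the current batch instead of from a fixed list. The sketch below shows one common variant, batch-hard mining, in pure Python. Which mining variant teacher.sh uses is not stated here, so the batch-hard choice, the Euclidean distance, and the margin of 0.5 are all assumptions for illustration.

```python
import math

def l2_distance(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def batch_hard_triplet_loss(questions, answers, margin=0.5):
    """Mean triplet loss with negatives mined inside the batch.

    answers[i] is the positive for questions[i]. The negative for
    questions[i] is the closest answer that belongs to a different
    question. The batch must contain at least two pairs.
    """
    losses = []
    for i, q in enumerate(questions):
        d_pos = l2_distance(q, answers[i])
        d_neg = min(
            l2_distance(q, a) for j, a in enumerate(answers) if j != i
        )
        losses.append(max(0.0, d_pos - d_neg + margin))
    return sum(losses) / len(losses)

# Toy batch of two question-answer pairs (hypothetical 2-d embeddings).
questions = [[1.0, 0.0], [0.0, 1.0]]
answers = [[0.9, 0.0], [0.0, 0.9]]

# Each question is close to its own answer and far from the other one.
well_separated = batch_hard_triplet_loss(questions, answers)

# Reversing the answers pairs each question with the wrong passage.
mismatched = batch_hard_triplet_loss(questions, answers[::-1])
```

Because the negatives are recomputed from the current embeddings at every step, the hardest examples change as the model improves, which is the main difference from the fixed BM25 negatives of Step 1.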

Step 3: Language Knowledge Transfer (Distillation)

  • Run distillation.sh
  • In this step, we initialize the model's weights from Step 2 and fine-tune it with the language knowledge transfer technique (Section 2.2 of the paper).
  • We minimize three terms: question (English) to question (non-English), document to document, and document to question (non-English), as shown in the figure:

[Figure: the three terms minimized during language knowledge transfer]
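The three terms listed above can be sketched as a single loss over four embeddings. This is only an illustration: the assignment of the teacher to the English question and the reference document, the squared L2 distance, and the equal weights are assumptions of this example, and the function and parameter names are hypothetical. See Section 2.2 of the paper and distillation.sh for the actual formulation.

```python
def squared_l2(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def lkt_loss(teacher_q_en, teacher_doc, student_q_xx, student_doc,
             weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three distance terms.

    teacher_q_en: teacher embedding of the English question
    teacher_doc:  teacher embedding of the document
    student_q_xx: student embedding of the non-English question
    student_doc:  student embedding of the document
    weights:      hypothetical per-term weights (equal by default)
    """
    w_qq, w_dd, w_dq = weights
    qq = squared_l2(teacher_q_en, student_q_xx)  # question(English)-question(non-English)
    dd = squared_l2(teacher_doc, student_doc)    # document-document
    dq = squared_l2(teacher_doc, student_q_xx)   # document-question(non-English)
    return w_qq * qq + w_dd * dd + w_dq * dq

# Toy 2-d embeddings (hypothetical values, not model outputs).
teacher_q_en = [1.0, 0.0]
teacher_doc = [1.0, 0.0]
student_q_xx = [0.0, 0.0]  # the non-English question has drifted away
student_doc = [1.0, 0.0]   # the document embedding still matches the teacher

loss = lkt_loss(teacher_q_en, teacher_doc, student_q_xx, student_doc)
```

In this toy case only the two terms involving the non-English question are non-zero, so minimizing the loss pulls that question toward both the English question and the document.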

Performance

[Two figures: performance results]