FewDR

This repository accompanies our PRICAI 2024 paper "Are Dense Retrieval Models Few-Shot Learners?".

Results

The results on the FewDR benchmark can be found in FewDR-Results.csv.

The checkpoints and logging files will be available soon.

Requirements

The installation requirements are the same as those for ANCE-Tele.

FewDR Dataset

The files of the FewDR dataset are as follows:

Filename | Description
tot_classes_pattern_60.txt | Query templates for all classes
queries.tsv | All queries
qid2num.json | The mapping file between qid and qid-num
queries_answers.jsonl | All queries and their corresponding answers
tot-qid_query_answer_positive.jsonl | All queries, answers, and positive passages
wikipedia-corpus-index.tar.gz | Wikipedia corpus

tot-qid_query_answer_positive.jsonl contains the complete dataset. Each line holds one sample, formatted as follows:

"class": "P17",
"qid": "P17_97",
"qid-num": "24597",
"question": "which country is sharm el sheikh international airport located in",
"answers": ["egypt"],
"positive_ctxs": [
  {"title": "Sharm El Sheikh International Airport",
  "text": "Sharm El Sheikh International Airport is an international airport located in Sharm El Sheikh, Egypt. It is the third-busiest airport ...",
  "score": 1000,
  "passage_id": "9581501"}
  ],
}
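
For reference, here is a minimal Python sketch for reading the file line by line (field names follow the sample above):

import json

# Read tot-qid_query_answer_positive.jsonl: one JSON object per line,
# each holding a query, its answers, and its positive passages.
with open("tot-qid_query_answer_positive.jsonl", "r", encoding="utf-8") as f:
    for line in f:
        sample = json.loads(line)
        cls, qid, qid_num = sample["class"], sample["qid"], sample["qid-num"]  # e.g. "P17", "P17_97", "24597"
        question = sample["question"]        # natural-language query
        answers = sample["answers"]          # list of answer strings
        positives = sample["positive_ctxs"]  # list of {title, text, score, passage_id}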

FewDR Benchmark

The statistics of the FewDR dataset are as follows:

Split | Classes | Queries | #Train Qry | #Test Qry | Corpus Size
All | 60 | 41,420 | 20,726 | 20,694 | 21,015,324
Base | 30 | 20,668 | 10,341 | 10,327 | 21,015,324
Novel | 30 | 20,752 | 10,385 | 10,367 | 21,015,324

The split strategy is defined in the split-stg.json file:

{

"base":{
    "P20": { ## Base Class 1 ID
            "train": [qid-num_1, qid-num_2, ..., ], ## train qid-num list of P20
            "test":  [qid-num_3, qid-num_4, ..., ], ## test qid-num list of P20
           },
    "P22": { ## Base Class 2 ID
            "train": [qid-num_5, qid-num_6, ..., ], ## train qid-num list of P22
            "test":  [qid-num_7, qid-num_8, ..., ], ## test qid-num list of P22
           },
    ....

       },

"novel":{
    "P17": { ## Novel Class 1 ID
            "train": [qid-num_11, qid-num_12, ..., ], ## train qid-num list of P17
            "test":  [qid-num_13, qid-num_14, ..., ], ## test qid-num list of P17
            },
    "P19": { ## Novel Class 2 ID
            "train": [qid-num_15, qid-num_16, ..., ], ## train qid-num list of P19
            "test":  [qid-num_17, qid-num_18, ..., ], ## test qid-num list of P19
            },
    ......

        },
}
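
As an illustration, the following minimal sketch collects the train/test qid-nums of each split from split-stg.json (file and key names follow the structure above; how these qid-nums are joined back to queries.tsv via qid2num.json is left to the preprocessing scripts):

import json

# Load the split strategy: {"base": {class_id: {"train": [...], "test": [...]}, ...}, "novel": {...}}
with open("split-stg.json", "r", encoding="utf-8") as f:
    split = json.load(f)

def collect(split_name, part):
    # Gather all qid-nums of one split ("base" / "novel") and one part ("train" / "test").
    qid_nums = []
    for class_id, parts in split[split_name].items():
        qid_nums.extend(parts[part])
    return qid_nums

base_train = collect("base", "train")    # 10,341 qid-nums
novel_train = collect("novel", "train")  # 10,385 qid-nums
full_train = base_train + novel_train    # both splits, as used for full-shot training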

Experiments

Training

Before training, the training data should be tokenized. Here is a toy example of tokenized FewDR base-train data: train.json. The format of the tokenized data is as follows:

{
  "qid-num": "query id number in string format",
  "query": [train-query tokenized ids],
  "positives": [[positive-passage-1 tokenized ids], [positive-passage-2 tokenized ids], ...],
  "negatives": [[negative-passage-1 tokenized ids], [negative-passage-2 tokenized ids], ...],
}
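
As a rough illustration, one line of such a file could be produced with a HuggingFace tokenizer as sketched below; the backbone checkpoint, truncation lengths, and the way title and text are concatenated are assumptions here, so please follow the ANCE-Tele preprocessing scripts for the exact settings:

import json
from transformers import AutoTokenizer

# Assumed backbone for illustration; use the tokenizer that matches your ANCE-Tele checkpoint.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def encode(text, max_length):
    # Store raw token ids only; special tokens and padding are added later by the trainer.
    return tokenizer.encode(text, add_special_tokens=False, truncation=True, max_length=max_length)

sample = {
    "qid-num": "24597",
    "query": encode("which country is sharm el sheikh international airport located in", 32),
    "positives": [encode("Sharm El Sheikh International Airport. Sharm El Sheikh International Airport is ...", 128)],
    "negatives": [encode("text of a mined hard-negative passage ...", 128)],
}

with open("train.json", "a", encoding="utf-8") as f:
    f.write(json.dumps(sample) + "\n")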

After data preprocessing, enter the FewDR/shells folder and run the corresponding shell script to train your model.

For zero-shot training, use only the base-train data as training data and run the following command:

bash zero-shot-train.sh

For full-shot training, use both the base-train and novel-train data and run the following command:

bash full-shot-train.sh

For few-shot training, set your zero-shot trained model as the pretrained model, use both the base-train and novel-train data, and run the following command:

bash few-shot-train.sh

Different few-shot seeds can be set in the few-shot-train.sh script. We use 5 different seeds (41, 42, 43, 44, 45).
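
For illustration only, here is a hypothetical sketch of drawing a K-shot subset of the novel-train data with one of these seeds (the shot count K and the sampling routine are assumptions; the actual sampling is handled inside few-shot-train.sh and the training code):

import json
import random

def sample_few_shot(split_file, k, seed):
    # Pick k training qid-nums per novel class, reproducibly for a given seed.
    rng = random.Random(seed)
    with open(split_file, "r", encoding="utf-8") as f:
        split = json.load(f)
    few_shot = {}
    for class_id, parts in split["novel"].items():
        train_ids = list(parts["train"])
        few_shot[class_id] = rng.sample(train_ids, min(k, len(train_ids)))
    return few_shot

# One run per seed, mirroring the 5 seeds above.
for seed in (41, 42, 43, 44, 45):
    subset = sample_few_shot("split-stg.json", k=5, seed=seed)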

Multi-GPU training is supported. Please keep the effective values of the following hyperparameters unchanged and set --negatives_x_device when using a multi-GPU setup; for example, with two GPUs, halve --per_device_train_batch_size to 16 so that the total query batch size stays at 32.

Hyperparameter | Argument | Single GPU | e.g., Two GPUs
Qry Batch Size | --per_device_train_batch_size | 32 | 16
(Positive + Negative) Passages per Qry | --train_n_passages | 2 | 2
Learning Rate | --learning_rate | 5e-6 | 5e-6
Total Training Epochs | --num_train_epochs | 40 | 40

P.S. For more training & inference techniques, please see ANCE-Tele/README.md

Inference and Evaluation

After training, we can use our shell scripts to run inference and evaluation on the checkpoints. The format of the test queries and corpus used in the inference shell scripts is the same as in ANCE-Tele (NQ):

(1) Zero-shot and Full-shot inference:

bash zero-full-shot-inference.sh

(2) Few-shot inference:

bash few-shot-inference.sh
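
For reference, the NQ-style evaluation commonly reports hits@k via answer containment; the tiny sketch below shows the idea (the normalization and the retrieval-output format are simplified assumptions, and the shell scripts above perform the actual evaluation):

def normalize(text):
    # Simplified normalization; the official DPR/NQ matcher is stricter (tokenized matching, unicode handling, etc.).
    return " ".join(text.lower().split())

def hits_at_k(retrieved_passages, answers, k):
    # retrieved_passages: ranked passage texts for one query; answers: gold answer strings.
    top_k = retrieved_passages[:k]
    return any(normalize(ans) in normalize(psg) for ans in answers for psg in top_k)

# Example with the sample query shown earlier.
passages = ["Sharm El Sheikh International Airport is an international airport located in Sharm El Sheikh, Egypt. ..."]
print(hits_at_k(passages, ["egypt"], k=1))  # True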

Contact Us

For any questions, feel free to create an issue and we will try our best to resolve it. If the problem is more urgent, you can also email me directly 🤗.

NAME: Si Sun
EMAIL: sunsi.shining@gmail.com