LeCoRE (WWW 2023)

This is the temporary repository of our WWW 2023 submission: Learning Denoised and Interpretable Session Representation for Conversational Search

Running Environment

Main packages:

python 3.8.13
pytorch 1.10.1
transformers 4.21.2
numpy: 1.22.4

Our implementation is based on the excellent open-source SPLADE repository. Thanks to it!

Running Steps

1. Download and preprocess data.

The four used public datasets can be downloaded from QReCC, TopiOCQA, CAsT-19 and CAsT-20. Refer to the [preprocess folder] for data preprocessing and finally move all preprocessed data into a ''datasets'' folder.

2. Index passages

We use the pre-trained ad-hoc SPLADE model "naver/efficient-splade-V-large-doc", which can be downloaded in huggingface, to generate passage embeddings:

# Replacing $Dataset_name with "QReCC", "TopiOCQA" or "CAsT"
python index.py --dataset=$Dataset_name \
--collection_path=$Collection_path \
--pretrained_doc_encoder_path="naver/efficient-splade-V-large-doc" \
--output_index_dir_path=$output_index_dir_path \
--per_gpu_index_batch_size=256 \
--max_doc_length=256 \
--force_emptying_dir \

3. Train LeCoRE

We provide an example script for training LeCoRE on QReCC. Please run:

bash scripts/train.sh

4. Evaluate LeCoRE

We provide an example script for evaluating LeCoRE on QReCC. Please run:

bash scripts/test.sh 4