TREC CaST 2022 attempt for HEATWAVE-CAM

Set Up

Clone this directory

 git clone https://github.com/usnistgov/trec_eval

Install the relevant dependencies
```
 pip install -r requirements.txt
```
In the data directory, download the relevant datasets (further instructions can be found in the data directory). *Need to add corpora setup (MS MARCO, KILT, WaPo)
In the scripts directory, run the 3-stage pipeline of query rewriting, BM25, and reranking.
*Need to add a BM25 implementation in src and create a script for it
Evaluation of overall system performance can be done in the eval folder. *Not tested yet: need correct Doc-IDs to test the script
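
Since the BM25 stage is not in src yet, here is a minimal sketch of what Okapi BM25 scoring could look like. It is not the project's implementation: the function name `bm25_scores`, the whitespace tokenisation, the toy corpus, and the defaults `k1=1.2`, `b=0.75` are all assumptions for illustration. A real run over MS MARCO, KILT, and WaPo would more likely rely on an indexed retrieval toolkit.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, corpus_tokens, k1=1.2, b=0.75):
    """Return one Okapi BM25 score per document (hypothetical helper).

    corpus_tokens is a non-empty list of tokenised documents.
    """
    n_docs = len(corpus_tokens)
    avg_len = sum(len(doc) for doc in corpus_tokens) / n_docs
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for doc in corpus_tokens for term in set(doc))

    scores = []
    for doc in corpus_tokens:
        term_freq = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in term_freq:
                continue
            df = doc_freq[term]
            # Smoothed IDF; the "1 +" inside the log keeps it non-negative.
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            tf = term_freq[term]
            # Length normalisation: longer-than-average documents are penalised.
            norm = tf + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores


# Toy corpus (made up for this example).
corpus = [
    "heatwaves are becoming more frequent".split(),
    "bm25 is a ranking function used in retrieval".split(),
    "neural rerankers refine bm25 candidates".split(),
]
scores = bm25_scores("bm25 ranking".split(), corpus)
# Document indices sorted from most to least relevant.
ranking = sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)
print(ranking)
```

In the pipeline described above, the top-ranked documents from this stage would be passed to the reranking stage.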

Repository Structure

The main structure of the repository is as follows:

trec_cast_2022
├── src
│    ├── modules
│    └── utils
├── scripts
│    ├── run_rewrite
│    ├── run_bm25
│    └── run_reranking
├── outputs
│    ├── queries
│    ├── bm25
│    └── reranking
├── eval
└── data

src: all the Python code for the 3 stages
scripts: the scripts to process the data for each stage
outputs: where the output files of each stage are saved
data: where all the external data is stored (used by src)
eval: the evaluation scripts, used to measure overall end-to-end performance
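
Because Set Up clones trec_eval, the evaluation scripts will presumably need each stage's output in the standard trec_eval run format (`query_id Q0 doc_id rank score run_tag`). The sketch below shows one way to produce those lines. The helper name `format_trec_run` and the query and document IDs are placeholders, not the real CaST identifiers.

```python
def format_trec_run(results, run_tag):
    """Turn {query_id: [(doc_id, score), ...]} into trec_eval run lines.

    Hypothetical helper: documents are ranked by descending score and
    each line follows "query_id Q0 doc_id rank score run_tag".
    """
    lines = []
    for query_id, scored_docs in results.items():
        ranked = sorted(scored_docs, key=lambda pair: pair[1], reverse=True)
        for rank, (doc_id, score) in enumerate(ranked, start=1):
            lines.append(f"{query_id} Q0 {doc_id} {rank} {score:.4f} {run_tag}")
    return lines


# Placeholder IDs and scores, for illustration only.
results = {"q1": [("doc_b", 1.5), ("doc_a", 3.25)]}
lines = format_trec_run(results, "demo_run")
print("\n".join(lines))
# q1 Q0 doc_a 1 3.2500 demo_run
# q1 Q0 doc_b 2 1.5000 demo_run
```

Writing these lines to a file under outputs would give trec_eval a run file to compare against the qrels, once the correct Doc-IDs are in place.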

Useful HPC Commands (Temp section for this week)

To load python3.8 into your session (to make the virtual environment), use the following command:
```
 module load python3.8
```
To get exclusive access to a GPU machine (documentation)
```
 sintr -A GALES-SL4-GPU -p pascal -t 2:0:0 --exclusive
```
You can also use -p ampere to request exclusive access to the newer A100 GPUs (but a bit amoral!)
Alternatively, you can request exclusive access to a CPU machine:
```
 sintr -A GALES-SL4-CPU -p skylake -t 2:0:0 --exclusive
```
Other partitions can be requested in a similar way (e.g. with -p cclake)
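
If you switch between these requests often, a small helper can assemble the command string for you. This is only a convenience sketch: `build_sintr_command` is a hypothetical name, and it uses only the flags shown above (`-A`, `-p`, `-t`, `--exclusive`).

```python
import shlex


def build_sintr_command(account, partition, hours, exclusive=True):
    """Build a sintr command line from the flags used in this README."""
    args = ["sintr", "-A", account, "-p", partition, "-t", f"{hours}:0:0"]
    if exclusive:
        args.append("--exclusive")
    # shlex.join (Python 3.8+) quotes arguments safely for the shell.
    return shlex.join(args)


command = build_sintr_command("GALES-SL4-GPU", "pascal", 2)
print(command)
# sintr -A GALES-SL4-GPU -p pascal -t 2:0:0 --exclusive
```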
