NVIDIA/RULER

dataset argument for qa.py not specified


The sample command for qa.py doesn't specify the dataset argument (https://github.com/hsiehjackson/RULER/blob/main/scripts/data/synthetic/qa.py#L58), and I am getting the error below. Can you let me know what dataset should be? I suppose you pass it somewhere when you run things end to end?

(long-context) vivekkaul@Viveks-MacBook-Pro synthetic % python qa.py \
    --save_dir=./ \
    --save_name=qa \
    --tokenizer_path=tokenizer.model \
    --tokenizer_type=hf \
    --max_seq_length=4096 \
    --tokens_to_generate=128 \
    --num_samples=10 \
    --template="Answer the question based on the given documents. Only give me the answer and do not output any other words.\n\nThe following are given documents.\n\n{context}\n\nAnswer the question based on the given documents. Only give me the answer and do not output any other words.\n\nQuestion: {query} Answer:"
usage: qa.py [-h] --save_dir SAVE_DIR --save_name SAVE_NAME [--subset SUBSET] --tokenizer_path TOKENIZER_PATH [--tokenizer_type TOKENIZER_TYPE]
             --max_seq_length MAX_SEQ_LENGTH --tokens_to_generate TOKENS_TO_GENERATE --num_samples NUM_SAMPLES [--pre_samples PRE_SAMPLES]
             [--random_seed RANDOM_SEED] --template TEMPLATE [--remove_newline_tab] --dataset DATASET
qa.py: error: the following arguments are required: --dataset

We generate our dataset using prepare.py. If you want to run qa.py directly, you can set --dataset squad or --dataset hotpotqa. We use both for RULER.
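
For anyone hitting the same message: the error is what argparse does when a flag declared as required is left out. The snippet below is only a minimal sketch of that behaviour, not the actual qa.py parser. It models just two of the flags shown in the usage message, and the `build_parser` and `parse` helpers are hypothetical names made up for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical, trimmed-down stand-in for the qa.py parser.
    # Only --save_name and --dataset from the usage message are modelled.
    parser = argparse.ArgumentParser(prog="qa.py")
    parser.add_argument("--save_name", type=str, required=True)
    parser.add_argument("--dataset", type=str, required=True)
    return parser


def parse(argv):
    """Return the parsed namespace, or None if argparse rejects argv."""
    try:
        return build_parser().parse_args(argv)
    except SystemExit:
        # argparse prints the usage and error text to stderr, then exits.
        return None


# Omitting --dataset is rejected, as in the terminal output above.
missing = parse(["--save_name=qa"])
print(missing)

# Supplying one of the two values named in the reply is accepted.
fixed = parse(["--save_name=qa", "--dataset=squad"])
print(fixed.dataset)
```

In the original command, the equivalent fix is to append `--dataset=squad` or `--dataset=hotpotqa` to the other arguments.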

Thanks a lot!