hsiehjackson/RULER

Base vs Chat prompt question.

karansaxena opened this issue · 3 comments

I wanted to confirm my understanding of the setup -

We have this file for the template https://github.com/hsiehjackson/RULER/blob/main/scripts/data/synthetic/constants.py

and this file to control base-vs-chat prompt https://github.com/hsiehjackson/RULER/blob/main/scripts/data/template.py

Is that the right understanding that the base vs the chat mode prompt differs only (relatively) slightly?

Also, do we do everything zero-shot (i.e. no in-context examples)?

Yes, the first file controlls the template for each task while the second file controlls the template for each model (chat or base). Base-vs-chat only differs slightly based on how the model is aligned with their corresponding chat template.

Variable tracking and common words extraction have one demonstration. You can find in the following:
https://github.com/hsiehjackson/RULER/blob/main/scripts/data/synthetic/variable_tracking.py#L194-L198
https://github.com/hsiehjackson/RULER/blob/main/scripts/data/synthetic/common_words_extraction.py#L93-L96

Got it. Along the same lines, I wanted to ask another question and not open another issue.

  • Why do we have validation and test sets? In other words, I understand that we report results on (500?) test-samples. How/where is validation set used?

We report results of 500 samples, and we don't have validation sets and test sets separately. The validation you found in the repo is only used for the naming of your generated dataset.