[Feature Request]: CRITIC, Standardizing Critique Few-Shots
Closed this issue
alckasoc commented
Feature Description
Figure 1. Number of few-shot examples per benchmark.
Benchmark | Number of critique few-shot examples | Matches Figure 1 | Issues |
---|---|---|---|
HotpotQA | 5 | ✅ | different examples used |
FEVER | 5 | ✅ | |
TriviaQA | 5 | ✅ | different examples used, different number of few-shot examples |
AmbigNQ | 5 | ✅ | different examples used, different number of few-shot examples, different order |
GSM8K | 5 | ✅ | different examples used, different number of few-shot examples |
SVAMP | 5 | ✅ | different examples used, different number of few-shot examples |
TabMWP | 5 | ✅ | different examples used, different number of few-shot examples |
MBPP | 5 | ✅ | different examples used, different number of few-shot examples |
HumanEval | 5 | ✅ (HumanEval is 0-shot; the convention for 0-shot benchmarks is 5 examples) | |
Table 1. Number of critique few-shot examples per benchmark.
For CRITIC, we have to write critique prompts. These critique few-shot examples are separate from each benchmark's regular few-shot examples.
Every benchmark must satisfy the following 2 criteria:
- The number of critique few-shot examples equals the number of few-shot examples for that benchmark (Figure 1).
- Each critique few-shot example uses the same question as the corresponding few-shot example, in the same order.
A green checkmark in the "Matches Figure 1" column means both criteria are satisfied. The "Issues" column lists why a benchmark's critique few-shot examples and regular few-shot examples don't match.
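The two criteria can be checked mechanically. Below is a minimal sketch, assuming each example is identified by its question string and that questions are unique within a benchmark; `check_critique_few_shots` is a hypothetical helper, not part of the CRITIC codebase. It reuses the wording of the "Issues" column so its output can be pasted into Table 1.

```python
from typing import List, Tuple


def check_critique_few_shots(
    few_shot_questions: List[str],
    critique_questions: List[str],
) -> Tuple[bool, List[str]]:
    """Compare one benchmark's critique few-shots against its regular few-shots.

    Returns (matches, issues). `matches` is True only when both criteria hold.
    Assumes questions are unique within each list (hypothetical representation).
    """
    issues: List[str] = []
    few_set, critique_set = set(few_shot_questions), set(critique_questions)

    # Criterion 2 (content): both lists must draw on the same questions.
    if few_set != critique_set:
        issues.append("different examples used")

    # Criterion 1: the counts must be equal.
    if len(few_shot_questions) != len(critique_questions):
        issues.append("different number of few-shot examples")

    # Criterion 2 (order): questions present in both lists must appear
    # in the same relative order.
    shared = few_set & critique_set
    few_order = [q for q in few_shot_questions if q in shared]
    critique_order = [q for q in critique_questions if q in shared]
    if few_order != critique_order:
        issues.append("different order")

    return (not issues, issues)
```

For example, `check_critique_few_shots(["q1", "q2", "q3"], ["q3", "q1"])` returns `(False, ['different examples used', 'different number of few-shot examples', 'different order'])`, the same combination reported for AmbigNQ.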
Note: Make sure the critique examples follow the existing prompt formatting.
Ways to Go About This
For any given benchmark:
- num few-shots > num critique few-shots
  - Write more critique examples.
  - Ensure all critique examples match the few-shot examples (same questions, same order).
  - You may have to replace existing critique examples with ones that use the questions from the few-shots.
- num few-shots == num critique few-shots AND no checkmark in Table 1
  - Replace the existing critique examples with ones built from the few-shot questions.
  - Ensure the ordering is the same too.
- num few-shots < num critique few-shots
  - Remove some critique examples.
  - Ensure every remaining critique example uses the same question as its few-shot counterpart and that the ordering matches.
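All three cases above aim at the same end state: exactly one critique example per few-shot question, in the few-shot order. Assuming critique examples can be keyed by their question text (a hypothetical representation; `align_critique_examples` is not part of the CRITIC codebase), the bookkeeping could be sketched as:

```python
from typing import Dict, List, Optional, Tuple


def align_critique_examples(
    few_shot_questions: List[str],
    critique_examples: Dict[str, str],
) -> Tuple[List[Optional[str]], List[str], List[str]]:
    """Plan how to bring one benchmark's critique examples in line with its few-shots.

    Returns (ordered, to_write, to_remove):
      ordered   - one slot per few-shot question, in few-shot order; an existing
                  critique example is reused where its question matches, else None
      to_write  - few-shot questions that still need a new critique example
      to_remove - existing critique questions that are not few-shot questions
    """
    # Reuse existing critiques in few-shot order; None marks a gap to fill.
    ordered = [critique_examples.get(q) for q in few_shot_questions]
    # Questions with no critique yet (the "write more" step).
    to_write = [q for q in few_shot_questions if q not in critique_examples]
    # Critiques for questions outside the few-shot set (the "remove" step).
    used = set(few_shot_questions)
    to_remove = [q for q in critique_examples if q not in used]
    return ordered, to_write, to_remove
```

With unique questions, the > case always yields a non-empty `to_write` and the < case a non-empty `to_remove`. An unmatched == case either yields both or differs only in order, which `ordered` already fixes.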