Benchmark prompt summarizers with frontier LLMs (GPT-4o / Opus)
griff4692 opened this issue · 1 comment
griff4692 commented
See this section of the writeup
Inspiration
The goal is to test different methods for taking an existing prompt and shortening it.
The simplest baseline is a meta-prompt along the lines of "summarize this prompt".
Evaluation metrics:
- Compression Ratio
- Benchmark performance (lm-eval-harness is integrated into gpt-fast).
- This should be tested across settings (RAG, n-shot, short prompt, long prompt, etc.)
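A minimal sketch of the proposed setup: wrap an existing prompt in a "summarize this prompt" meta-prompt and compute a compression ratio. The template wording and the character-length proxy are assumptions (a real run would count tokens with the target model's tokenizer, and the frontier-model call itself is left out):

```python
# Hypothetical meta-prompt template; the exact wording is an assumption,
# not taken from the writeup.
SUMMARIZE_TEMPLATE = (
    "Summarize this prompt as concisely as possible while preserving "
    "all instructions and constraints:\n\n{prompt}"
)


def build_summarize_request(prompt: str) -> str:
    """Build the request sent to the frontier model (e.g. GPT-4o / Opus)."""
    return SUMMARIZE_TEMPLATE.format(prompt=prompt)


def compression_ratio(original: str, summary: str) -> float:
    """Character-length proxy for compression: original / summary.

    A tokenizer-based count would be more faithful to what the
    downstream model actually sees.
    """
    if not summary:
        raise ValueError("summary is empty")
    return len(original) / len(summary)
```

Benchmark performance would then come from running the shortened prompts through lm-eval-harness and comparing scores against the originals.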
griff4692 commented
De-prioritized