Suggestion in the direction of OpenAI integration
UltimatePea opened this issue · 1 comment
Thank you for putting together this work.
OpenAI has announced Evals (https://github.com/openai/evals), where OpenAI will work to improve on user-submitted benchmarks. Perhaps we could submit a benchmark as follows: given an arbitrary grammar (possibly randomly generated), the AI must synthesize sentences in that grammar and judge whether a particular sentence conforms to it.
That would serve as a "progress tracker" for GPT-4's full syntactic reasoning capabilities, in parallel with the official APIs.
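To make the idea concrete, here is a minimal sketch of the two halves such a benchmark would need: a generator that samples sentences from a grammar, and a checker that decides whether a sentence conforms to it. Everything here is an assumption for illustration: the toy grammar (which produces strings of the form a^n b^n), the names `GRAMMAR`, `generate`, and `conforms`, and the restriction to Chomsky normal form so that a plain CYK recognizer can be used. It does not use the Evals framework or any OpenAI API.

```python
import random

# Hypothetical toy grammar in Chomsky normal form (assumed for illustration).
# Each rule is either a pair of nonterminals or a single terminal.
# Assumption: the FIRST rule of every nonterminal leads to termination,
# which lets the generator stop once the depth budget runs out.
GRAMMAR = {
    "S": [("A", "B"), ("A", "T")],
    "T": [("S", "B")],
    "A": [("a",)],
    "B": [("b",)],
}


def generate(symbol, rng, depth=4):
    """Sample a token list derived from `symbol`."""
    if symbol not in GRAMMAR:
        return [symbol]  # terminal symbol
    rules = GRAMMAR[symbol]
    # Pick a random rule while budget remains, otherwise the terminating one.
    rule = rng.choice(rules) if depth > 0 else rules[0]
    tokens = []
    for part in rule:
        tokens.extend(generate(part, rng, depth - 1))
    return tokens


def conforms(tokens, start="S"):
    """CYK recognizer: True if `tokens` can be derived from `start`."""
    n = len(tokens)
    if n == 0:
        return False
    # table[i][k] holds the nonterminals deriving tokens[i : i + k + 1].
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, tok in enumerate(tokens):
        for lhs, rules in GRAMMAR.items():
            if (tok,) in rules:
                table[i][0].add(lhs)
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                left = table[i][split - 1]
                right = table[i + split][length - split - 1]
                for lhs, rules in GRAMMAR.items():
                    for rule in rules:
                        if len(rule) == 2 and rule[0] in left and rule[1] in right:
                            table[i][length - 1].add(lhs)
    return start in table[0][n - 1]


if __name__ == "__main__":
    rng = random.Random(0)
    sentence = generate("S", rng)
    print(" ".join(sentence), conforms(sentence))
```

In an actual eval, `generate` would supply positive examples (and perturbed copies of them would supply negatives), while `conforms` would act as the ground-truth grader for the model's own sentences and judgments.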
Thanks – I agree this could make an interesting evals project; indeed, I've been thinking of submitting some (admittedly different) evaluations to OpenAI.
Feel free to submit a pull request if you have any ideas!