diff-model-eval

A simpler harness for testing models on the CanItEdit dataset.

Primary language: Python

Built to evaluate models' code-editing ability on CanItEdit across different output formats, while tracking token counts and accuracy.

This project has moved into the diff-model repository.