TonicAI/tonic_validate

Add multiple runs per question and report average/stdev

peter-did-mh opened this issue · 0 comments

Hello, please add the ability to run each question a fixed number of times instead of once, and report the average and standard deviation of every metric (perhaps also min/max or some kind of histogram). This would reduce the impact of outliers in the testing process, such as those caused by network connection issues or LLM temperature effects.
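
To make the request concrete, here is a rough sketch of the aggregation I have in mind. It does not use any tonic_validate API: `score_once` is a hypothetical callable standing in for whatever scores a single question and returns a dict of metric name to value, and `aggregate_runs` and `n_runs` are names made up for this illustration.

```python
import statistics
from typing import Callable, Dict, List


def aggregate_runs(
    score_once: Callable[[str], Dict[str, float]],  # hypothetical scorer, not a library API
    question: str,
    n_runs: int = 5,
) -> Dict[str, Dict[str, float]]:
    """Score one question n_runs times and summarize each metric."""
    if n_runs < 1:
        raise ValueError("n_runs must be at least 1")

    # Collect every observed value per metric across the repeated runs.
    samples: Dict[str, List[float]] = {}
    for _ in range(n_runs):
        for metric, value in score_once(question).items():
            samples.setdefault(metric, []).append(value)

    # Reduce each metric's samples to summary statistics.
    summary: Dict[str, Dict[str, float]] = {}
    for metric, values in samples.items():
        summary[metric] = {
            "mean": statistics.mean(values),
            # Sample standard deviation needs at least two values.
            "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
            "min": min(values),
            "max": max(values),
        }
    return summary
```

With something like this, a single bad run (a timeout, or an unlucky sample at high temperature) shows up as a large stdev or an extreme min/max instead of silently skewing the reported score.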