fully implement old evaluation
mwillsey opened this issue · 2 comments
mwillsey commented
fully implement old evaluation
chandrakananandi commented
Benchmarks that fail:
- card_org
chandrakananandi commented
Will open a new issue to investigate the failing tests.