cogtoolslab/physics-benchmarking-neurips2021

Post-initial submission follow-up analyses of model/human behavior


  • Extract those trials where humans: (a) consistently succeed; (b) are close to chance; (c) systematically fail.
    These can then be passed into curiophysics for interestingness annotation (see the first sketch below).
  • More detailed error analysis: on which scenarios / instances did humans and models diverge the most?
    Extract those trials and visualize some model predictions -- can we tell why the vision models fail/succeed on certain trials? (See the second sketch below.)
  • Which models’ behavior was most similar to which other models’? (See the third sketch below.)
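
First sketch: a minimal way to bin trials by mean human accuracy. The long-format DataFrame, the column names `trial_id` / `correct`, and the thresholds are all assumptions for illustration, not values from this repo.

```python
import pandas as pd

def bin_trials_by_human_accuracy(responses: pd.DataFrame,
                                 hi: float = 0.9,
                                 lo: float = 0.25,
                                 chance: float = 0.5,
                                 eps: float = 0.05) -> dict:
    """Split trials into three groups by mean human accuracy:
    consistently solved, near chance, and systematically failed.
    `responses` is assumed to have one row per human response,
    with hypothetical columns `trial_id` and `correct` (0/1)."""
    acc = responses.groupby("trial_id")["correct"].mean()
    return {
        "consistent_success": acc.index[acc >= hi].tolist(),
        "near_chance": acc.index[(acc - chance).abs() <= eps].tolist(),
        "systematic_failure": acc.index[acc <= lo].tolist(),
    }
```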
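
Second sketch: ranking trials by human-model divergence. It assumes per-trial accuracies have already been aggregated into two DataFrames with hypothetical columns `trial_id`, `scenario`, and `acc`.

```python
import pandas as pd

def human_model_divergence(human_acc: pd.DataFrame,
                           model_acc: pd.DataFrame,
                           top_k: int = 20):
    """Rank trials by |human accuracy - model accuracy| and
    summarize the mean divergence per scenario."""
    merged = human_acc.merge(model_acc,
                             on=["trial_id", "scenario"],
                             suffixes=("_human", "_model"))
    merged["divergence"] = (merged["acc_human"] - merged["acc_model"]).abs()
    per_scenario = (merged.groupby("scenario")["divergence"]
                          .mean()
                          .sort_values(ascending=False))
    # The trials to pull out and visualize are the top divergers.
    top_trials = merged.nlargest(top_k, "divergence")
    return per_scenario, top_trials
```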
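
Third sketch: one simple notion of model-model similarity, pairwise Pearson correlation of per-trial predictions. The trials-by-models layout of `preds` is an assumption; other agreement measures (e.g. Cohen's kappa on binarized predictions) would slot in the same way.

```python
import numpy as np
import pandas as pd

def model_similarity(preds: pd.DataFrame, top_k: int = 5):
    """Pairwise Pearson correlation across models; higher values
    mean two models behave more alike. `preds` is assumed to be a
    trials x models DataFrame of per-trial predictions."""
    corr = preds.corr(method="pearson")
    # Keep the strict upper triangle so each model pair appears once.
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).stack().sort_values(ascending=False)
    return corr, pairs.head(top_k)
```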