Reasoning or Reciting? Exploring the Capabilities and Limitations of Language Models Through Counterfactual Evaluations
This repo contains the code, data, and model interactions (prompts and model responses) that we used in our paper, Reasoning or Reciting? Exploring the Capabilities and Limitations of Language Models Through Counterfactual Evaluations. Each task has its own README file with more details. You can install the dependencies via pip install -r requirements.txt (we have noticed that this sometimes fails on macOS; if so, try Linux). We tested with Python 3.11, though other versions will likely also work. Unless otherwise mentioned, all commands in the README files should be run from the root directory of this repo, i.e., the directory containing this README.
Rerunning our experiments requires access to the relevant APIs: export the corresponding API keys to the environment variables OPENAI_API_KEY, ANTHROPIC_API_KEY, and PALM_API_KEY. Alternatively, you can reproduce our experiments without API access by converting our provided model interactions files into a cache file; our query function then automatically reuses the cached content instead of making API calls. This way, you should be able to reproduce all of our experiments except the natural language logic task, for which the dataset author shared with us a non-public version of the dataset (see our paper for more details). To do this conversion, run:
python create_cache.py {arithmetic,programming/execution,programming/generation,syntax,spatial,drawing,music/chords,music/melodies,chess,SET}/model_interactions
Using this cache, you should obtain the exact numbers reported in our paper, except where an individual task README notes that some randomness is involved.
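Conceptually, a cache-first lookup returns a stored response whenever the same request has been seen before, which is why replaying the cache gives identical numbers. The following is a minimal sketch under stated assumptions: a hypothetical in-memory dict keyed by a SHA-256 hash of the model name and prompt. The repo's actual cache format and query function may differ; cache_key, cached_query, and call_api are illustrative names only.

```python
import hashlib
import json
from typing import Callable, Dict


def cache_key(model: str, prompt: str) -> str:
    """Derive a stable key from the model name and prompt (hypothetical scheme)."""
    payload = json.dumps({"model": model, "prompt": prompt}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def cached_query(
    model: str,
    prompt: str,
    cache: Dict[str, str],
    call_api: Callable[[str, str], str],
) -> str:
    """Return the cached response if present; otherwise call the API and store it."""
    key = cache_key(model, prompt)
    if key in cache:
        # Cache hit: no API call is made, so replayed results are deterministic.
        return cache[key]
    # Cache miss: only here would a real API key be needed.
    response = call_api(model, prompt)
    cache[key] = response
    return response
```

With a cache pre-populated from the model interactions files, every request is a hit, so call_api (and therefore any API key) is never used.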
Note that the model interactions for the logic task need to be downloaded separately; see its README.md for details. If you download them, include the logic task in the above command as well.
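The create_cache.py command above relies on bash-style brace expansion, which POSIX sh and Windows cmd do not perform. As a sketch, assuming you want to invoke the same script from Python instead, the helper below expands the task list into the same model_interactions paths and accepts extra task directories (such as the logic task's). The names TASKS, interaction_dirs, build_command, and run_create_cache are hypothetical and not part of this repo.

```python
import subprocess
import sys
from typing import List, Sequence

# Task directories listed in the create_cache.py command above.
TASKS = [
    "arithmetic",
    "programming/execution",
    "programming/generation",
    "syntax",
    "spatial",
    "drawing",
    "music/chords",
    "music/melodies",
    "chess",
    "SET",
]


def interaction_dirs(extra_tasks: Sequence[str] = ()) -> List[str]:
    """Expand each task into its model_interactions path, mirroring the brace expansion."""
    return [f"{task}/model_interactions" for task in [*TASKS, *extra_tasks]]


def build_command(extra_tasks: Sequence[str] = ()) -> List[str]:
    """Build the argument list for create_cache.py using the current Python interpreter."""
    return [sys.executable, "create_cache.py", *interaction_dirs(extra_tasks)]


def run_create_cache(extra_tasks: Sequence[str] = ()) -> None:
    """Run create_cache.py (from the repo root); raises CalledProcessError on failure."""
    subprocess.run(build_command(extra_tasks), check=True)
```

For example, run_create_cache() reproduces the command above, while passing the logic task's directory in extra_tasks would add it too; that directory's name is given in the logic task's README and is not assumed here.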
A general note: you may see mentions of "controls" in our code and file names. You can mentally substitute "CCC" for it; "controls" was an older name for the same thing.