Reasoning about Ambiguous Definite Descriptions

This repository contains the code used for data collection and experimentation.

The benchmark data and answers given by each model can be found under benchmark/.
The JSONL files contain one JSON object per line, in the following format:

{
  "label_nr":      1|2,                  The option corresponding to the correct answer.
  "label_name":    "de dicto"|"de re",   Correct answer class name. 
  "messages":      ["..."],              The list of messages to be sent to the LLM (one in case of direct prompting, two in case of chain-of-thought prompting.)
  "entity":        "...",                The name of the 'main entity'.
  "property":      "...",                The property ascribed to the definite description.
  "prompt_style":  "...",                A key into PROMPT_DICTIONARY in create_fragment.py . 

The fields above constitute the benchmark itself; the fields below record the answers given by the models.

  "responses":     ["..."],              The replies given by the LLMs in response to the messages.
  "results": {
    "choice":      1|2 ,                 The option chosen by the model.
    "explanation": "..."                 The explanation provided by the model for why it chose the option that it did.
  }
}
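
As an illustration of the format, the sketch below loads one of the JSONL files and reports how often a model's chosen option agrees with label_nr. It is not part of the repository, and the file name used is a placeholder.

```python
import json

def score_file(path: str) -> float:
    """Fraction of records where the model's chosen option matches label_nr."""
    correct = 0
    total = 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            record = json.loads(line)
            results = record.get("results", {})
            if "choice" not in results:
                continue  # record has no model answer yet
            total += 1
            if results["choice"] == record["label_nr"]:
                correct += 1
    return correct / total if total else 0.0

if __name__ == "__main__":
    # "benchmark/example.jsonl" is a placeholder name, not an actual file in this repository.
    print(score_file("benchmark/example.jsonl"))
```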