A central, open resource for data and tools related to chain-of-thought reasoning in large language models. Developed by the Samwald research group: https://samwald.info/

Primary language: Jupyter Notebook. License: MIT.

ThoughtSource⚡

A framework for the science of machine thinking

ThoughtSource is a central, open resource and community around data and tools related to chain-of-thought reasoning in large language models (Wei 2022). Our long-term goal is to enable trustworthy and robust reasoning in advanced AI systems for driving scientific research and development.

[Figure: ThoughtSource overview]

Generate interpretable reasoning chains

[Figure: ThoughtSource overview]

(The example shown here was generated with the text-davinci-002 model.)

Annotate, evaluate and improve

[Figure: ThoughtSource overview]

Roadmap

  1. Create a repository of chain-of-thought (CoT) datasets converted to a unified format. ✅
  2. Create a library for generating reasoning chains with a wide variety of large language models. ✅
  3. Create tools for diagnosing, annotating and evaluating CoT data and fostering empirical understanding.
  4. Create a conceptual model of different CoT reasoning styles and errors.
  5. Provide models fine-tuned on high-quality CoT data.
  6. Apply CoT reasoning to high-impact use-cases such as biomedical research or clinical decision making.

Code

Libraries

  • cot:
    • dataloader: Create and process ThoughtSource datasets (based on the Hugging Face 🤗 Datasets library).
    • generate: Generate reasoning chains with a wide variety of language models (currently OpenAI models and models on the Hugging Face Hub).
    • evaluate: Evaluate the performance of predictions extracted from generated reasoning chains.
  • explanatory notebooks: Overview, Datasets, Model, Performance
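The evaluate step depends on pulling a final answer out of free-text reasoning. Below is a minimal sketch, assuming multiple-choice answers labelled A to E and chains that end with a phrase such as "the answer is (B)". The regular expression and the names `extract_choice` and `accuracy` are hypothetical illustrations, not the API of the `cot` library.

```python
import re
from typing import List, Optional

# Assumed output convention: the chain ends with "the answer is (B)" or "Answer: B".
# "answer" / "answer is" match case-insensitively; the choice letter must be an
# upper-case A-E followed by a word boundary (so "answer is Because" does not match).
ANSWER_PATTERN = re.compile(r"(?i:answer(?: is)?):?\s*\(?([A-E])\b")


def extract_choice(reasoning_chain: str) -> Optional[str]:
    """Return the last answer letter found in a generated chain, or None."""
    matches = ANSWER_PATTERN.findall(reasoning_chain)
    return matches[-1] if matches else None


def accuracy(chains: List[str], gold: List[str]) -> float:
    """Share of chains whose extracted letter equals the gold letter."""
    if len(chains) != len(gold):
        raise ValueError("chains and gold must have the same length")
    if not chains:
        return 0.0
    # A chain with no extractable answer (None) simply counts as wrong.
    correct = sum(extract_choice(c) == g for c, g in zip(chains, gold))
    return correct / len(chains)
```

Taking the last match rather than the first is a deliberate choice: models sometimes mention several options while reasoning before committing to one at the end.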

[Figure: Overview notebook]

Applications

  • dataset-viewer: Streamlit application for browsing ThoughtSource datasets
  • annotator: Web-based tool for annotating chain-of-thought data.

[Figure: Demonstration of the annotator tool]

The annotator highlights similarities between different generated reasoning chains, making it easier to spot strengths and weaknesses and to select the best results.
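A rough approximation of this kind of similarity highlighting can be built with Python's standard `difflib`. The sketch below is an assumption about how such highlighting might work, not the annotator's actual implementation; `shared_spans` and its `min_words` threshold are hypothetical.

```python
from difflib import SequenceMatcher
from typing import List


def shared_spans(chain_a: str, chain_b: str, min_words: int = 3) -> List[str]:
    """Word runs of at least `min_words` that difflib matches between two chains.

    Matching is done on whitespace-separated words, and the returned runs are the
    non-crossing matching blocks SequenceMatcher finds, in order of chain_a.
    """
    words_a = chain_a.split()
    words_b = chain_b.split()
    # autojunk=False keeps frequent words such as "the" from being ignored on long chains.
    matcher = SequenceMatcher(a=words_a, b=words_b, autojunk=False)
    spans = []
    for block in matcher.get_matching_blocks():  # the final block always has size 0
        if block.size >= min_words:
            spans.append(" ".join(words_a[block.a : block.a + block.size]))
    return spans
```

For example, comparing "Paris is the capital of France so the answer is A" with "We know Paris is the capital of France hence the answer is A" yields the runs "Paris is the capital of France" and "the answer is A", which are the parts an annotator would see highlighted in both chains.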


Use the web-based annotator 📝


Current datasets

Datasets can be browsed online through the Dataset Viewer 🔎.

Our dataloaders allow you to access the following datasets in a standardized chain-of-thought format. The dataloaders create objects in the Hugging Face 🤗 Datasets format. We (sometimes extensively) post-processed the source datasets in different ways to create more coherent reasoning chains.
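To make the idea of a standardized format concrete, here is a minimal sketch of what a unified record and a converter for one source might look like. The field names of `CoTItem`, the converter `from_multiple_choice`, and the shape of the raw input (including its `explanation` field) are all hypothetical and do not reproduce the actual ThoughtSource schema.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CoTItem:
    """Hypothetical unified record; field names are illustrative only."""
    id: str
    question: str
    choices: List[str]
    cot: List[str]      # reasoning chain, one step per list entry
    answer: List[str]   # gold answer text(s)
    source: str = ""


def from_multiple_choice(raw: Dict, source: str) -> CoTItem:
    """Convert an assumed raw multiple-choice record into the unified shape."""
    labels = [c["label"] for c in raw["question"]["choices"]]
    texts = [c["text"] for c in raw["question"]["choices"]]
    # Map the answer key (e.g. "A") to the text of the corresponding choice.
    answer_text = texts[labels.index(raw["answerKey"])]
    return CoTItem(
        id=raw["id"],
        question=raw["question"]["stem"],
        choices=texts,
        cot=list(raw.get("explanation", [])),  # hypothetical rationale field
        answer=[answer_text],
        source=source,
    )
```

Each converted record can be turned into a plain dict with `dataclasses.asdict`, and a list of such dicts could then be loaded with the Hugging Face Datasets library (for instance via `Dataset.from_list` in recent versions).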


General question answering

  • commonsense_qa: Multiple-choice commonsense knowledge question answering dataset (Talmor 2018, License: MIT). Reasoning chains from three different sources are included:

    • Human-generated reasoning chains derived from the ECQA dataset (Aggarwal 2021). Used as gold standard. License: Community Data License Agreement, Sharing, Version 1.0.
    • AI-generated (few-shot prompting) reasoning chains from Wei 2022. Only available for the validation split. License: Unknown.
    • AI-generated (zero-shot prompting) reasoning chains from Kojima 2022. Only available for the validation split. License: Unknown.
  • strategy_qa: General-domain question-answering data from the StrategyQA dataset (Geva 2021). License: MIT. Reasoning chains from three different sources are included:

    • Human-generated reasoning chains derived from the original dataset. Used as gold standard. License: MIT.
    • AI-generated (few-shot prompting) reasoning chains from Wei 2022. Only available for the train split. License: Unknown.
    • AI-generated (zero-shot prompting) reasoning chains from Kojima 2022. Only available for the train split. License: Unknown.
  • qed: General-domain question-answering data and justifications from the QED dataset (Lamm 2020). License: CC BY-SA 3.0.
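The few-shot and zero-shot sources above differ mainly in how the model was prompted. Below is a minimal sketch of both prompt styles, assuming a "Q: / A:" layout. The trigger phrase "Let's think step by step." is the one reported by Kojima 2022; the function names and the exemplar format are hypothetical and are not the API of the `generate` library.

```python
from typing import List, Tuple

# Zero-shot chain-of-thought trigger phrase reported by Kojima 2022.
ZERO_SHOT_TRIGGER = "Let's think step by step."


def zero_shot_prompt(question: str) -> str:
    """Question followed directly by the reasoning trigger."""
    return f"Q: {question}\nA: {ZERO_SHOT_TRIGGER}"


def few_shot_prompt(question: str, exemplars: List[Tuple[str, str, str]]) -> str:
    """Worked (question, reasoning, answer) exemplars shown before the target question."""
    blocks = [
        f"Q: {q}\nA: {reasoning} The answer is {answer}."
        for q, reasoning, answer in exemplars
    ]
    # The target question is left open so the model continues with its own reasoning.
    blocks.append(f"Q: {question}\nA:")
    return "\n\n".join(blocks)
```

Ending every exemplar with "The answer is ..." also gives the model a fixed phrase to imitate, which makes the final answer easier to extract during evaluation.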

Scientific question answering

  • worldtree: Scientific question-answering data from the WorldTree v2 dataset (Xie 2020). Human-generated reasoning chains derived from the original dataset. License: AI2 Mercury.
  • entailment_bank: Science exam questions with expert-authored explanations from the EntailmentBank dataset (Dalvi 2022). Human-generated reasoning chains derived from the original dataset. License: CC BY 4.0. (Note: significant overlap with WorldTree v2.)
  • open_book_qa: Scientific question-answering data from the OpenBookQA dataset (Mihaylov 2018), modeled after open-book exams for assessing human understanding. Human-generated reasoning chains derived from the original dataset. License: Apache License 2.0.
  • med_qa: Free-form multiple-choice OpenQA dataset containing questions from medical board exams in the US (USMLE), Mainland China and Taiwan (Jin 2020). License: MIT.
    • AI-generated (zero-shot) reasoning chains derived from Liévin 2022. Only available for the test split, only US questions. License: Unknown.
  • medmc_qa: Multiple-choice question-answering dataset containing real-world medical entrance exam questions from the All India Institute of Medical Sciences (AIIMS PG) and the National Eligibility cum Entrance Test (NEET PG) (Pal 2022). Only 1000 samples from the validation split are available. License: MIT.
    • AI-generated (zero-shot) reasoning chains derived from Liévin 2022. License: CC-BY.
  • pubmed_qa: QA dataset containing biomedical questions extracted from PubMed abstracts that can be answered with yes/no/maybe (Jin 2019). License: MIT.
    • AI-generated (zero-shot) reasoning chains derived from Liévin 2022. Only available for the test split. License: CC-BY.
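Because entailment_bank overlaps significantly with WorldTree v2, it may be worth checking for shared questions before combining the two, for example when building an evaluation set. The sketch below assumes overlap can be detected by exact matching on normalized question text, which is a simplifying heuristic; the helpers `normalize` and `overlap` are hypothetical.

```python
import re
from typing import Iterable, Set


def normalize(question: str) -> str:
    """Lowercase, replace punctuation with spaces and collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", question.lower())
    return " ".join(text.split())


def overlap(questions_a: Iterable[str], questions_b: Iterable[str]) -> Set[str]:
    """Normalized questions that occur in both collections."""
    return {normalize(q) for q in questions_a} & {normalize(q) for q in questions_b}
```

Exact matching after normalization only catches near-verbatim duplicates. Paraphrased questions would require a fuzzier comparison, such as token overlap or embedding similarity.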

Math word problems

  • aqua: Math word problems from the AQUA-RAT (Algebra Question Answering with Rationales) dataset (Ling 2017). Reasoning chains derived from the original dataset. License: Apache 2.0.
  • asdiv: Math word problems from the Academia Sinica Diverse MWP dataset (Miao 2020). Reasoning chains derived from the original dataset. License: CC BY-NC 4.0.
  • gsm8k: Math word problems from the GSM8K dataset (Cobbe 2021). Reasoning chains derived from the original dataset. License: MIT.
  • mawps: Math word problems from MAWPS, the Math Word Problem Repository dataset (Koncel-Kedziorski 2016). Reasoning chains derived from the original dataset. License: MIT.
  • svamp: Math word problems from the SVAMP dataset (Patel 2021). Reasoning chains derived from the original dataset. License: MIT.
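In GSM8K, each original answer interleaves the rationale with calculator annotations such as `<<48/2=24>>` and ends with a line of the form `#### <answer>`. Below is a minimal sketch of turning such an answer into a list of reasoning steps plus a final answer. It is one plausible conversion, not ThoughtSource's actual post-processing, and `gsm8k_to_chain` is a hypothetical name.

```python
import re
from typing import List, Tuple

# Calculator annotations embedded in GSM8K rationales, e.g. "<<48/2=24>>".
CALC_ANNOTATION = re.compile(r"<<[^>]*>>")


def gsm8k_to_chain(answer_field: str) -> Tuple[List[str], str]:
    """Split a GSM8K-style answer into cleaned reasoning steps and the final answer."""
    rationale, sep, final = answer_field.rpartition("####")
    if not sep:
        raise ValueError("expected a '#### <answer>' marker")
    steps = [
        CALC_ANNOTATION.sub("", line).strip()  # drop annotations, keep the prose
        for line in rationale.splitlines()
        if line.strip()                          # skip blank lines
    ]
    return steps, final.strip()
```

Treating each non-empty line as one step works here because GSM8K rationales are written one calculation per line. Other math datasets listed above would need their own converters.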

We are working on collecting and generating additional datasets, and on further improving the quality of existing datasets (see dataset issues). We welcome suggestions for the inclusion of other datasets.

We welcome dataset contributions! 👉 Have a look at our contribution guide!