/helm

Holistic Evaluation of Language Models (HELM) is a framework for increasing the transparency of language models (https://arxiv.org/abs/2211.09110).

Primary language: Python. License: Apache License 2.0 (Apache-2.0).

Welcome! This repository contains all the assets for Holistic Evaluation of Language Models, which includes the following features:

  • Collection of datasets in a standard format (e.g., NaturalQuestions)
  • Collection of models accessible via a unified API (e.g., GPT-3, MT-NLG, OPT, BLOOM)
  • Collection of metrics beyond accuracy (efficiency, bias, toxicity, etc.)
  • Collection of perturbations for evaluating robustness and fairness (e.g., typos, dialect)
  • Modular framework for constructing prompts from datasets
  • Proxy server for managing accounts and providing a unified interface for accessing models
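To make a few of the features above concrete, here is a minimal, self-contained sketch of how prompt construction, a typo perturbation, and a simple metric might fit together. It does not use HELM's actual API: `Instance`, `build_prompt`, `typo_perturbation`, and `exact_match` are hypothetical names introduced only for illustration, and the question/answer prompt layout is an assumption.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    """One dataset example in a shared format (hypothetical, not HELM's class)."""
    question: str
    answer: str


def build_prompt(train: list[Instance], test: Instance) -> str:
    """Format in-context examples, then the test question with an empty answer slot."""
    blocks = [f"Question: {ex.question}\nAnswer: {ex.answer}" for ex in train]
    blocks.append(f"Question: {test.question}\nAnswer:")
    return "\n\n".join(blocks)


def typo_perturbation(text: str, rate: float, seed: int = 0) -> str:
    """Swap adjacent letters with probability `rate` to simulate typos (assumed scheme)."""
    rng = random.Random(seed)
    chars = list(text)
    i = 0
    while i < len(chars) - 1:
        if chars[i].isalpha() and chars[i + 1].isalpha() and rng.random() < rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # skip past the swapped pair so a letter moves at most once
        else:
            i += 1
    return "".join(chars)


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if the strings match after trimming and lowercasing, else 0.0."""
    return float(prediction.strip().lower() == reference.strip().lower())
```

In this sketch, robustness would be probed by scoring a model on `build_prompt(train, test)` and again on a prompt whose test question has been passed through `typo_perturbation`, then comparing the two `exact_match` averages. The model call itself is left out because it depends on the API in use.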

To read more: