ilinkaa/rl-llm-calibration-test

An attempt to replicate parts of the paper "Language Models (Mostly) Know What They Know" on open datasets and models.


LLM Calibration Benchmark

This repository runs calibration benchmarks on several popular, openly available language models.
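For readers new to the topic, one common way to quantify calibration is the expected calibration error (ECE): predictions are grouped into confidence bins, and the gap between average confidence and accuracy in each bin is averaged, weighted by bin size. The sketch below is illustrative only and is not taken from this repository; the function name `expected_calibration_error`, its parameters, and the choice of ten equal-width bins are assumptions made for the example.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """Hypothetical helper: ECE with equal-width confidence bins.

    `confidences` are the model's probabilities for its chosen answers
    (assumed to lie in [0, 1]); `correct` marks whether each answer was right.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    if len(confidences) == 0:
        raise ValueError("at least one prediction is required")

    conf_sum = [0.0] * n_bins   # sum of confidences per bin
    hits = [0] * n_bins         # number of correct answers per bin
    counts = [0] * n_bins       # number of predictions per bin

    for conf, ok in zip(confidences, correct):
        # Clamp so that a confidence of exactly 1.0 lands in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        conf_sum[idx] += conf
        hits[idx] += int(ok)
        counts[idx] += 1

    total = len(confidences)
    ece = 0.0
    for s, h, n in zip(conf_sum, hits, counts):
        if n == 0:
            continue  # empty bins contribute nothing
        # Weight each bin's |accuracy - mean confidence| by its share of samples.
        ece += (n / total) * abs(h / n - s / n)
    return ece
```

A perfectly calibrated model scores close to zero: ten answers given at 0.9 confidence, nine of them correct, yield an ECE of approximately 0.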

Installation

Unit Tests

Running the unit tests requires the pytest module, invoked as follows:

    python -m pytest test

Container

The published Docker container can be used as a starting point for model configuration.

https://hub.docker.com/repository/docker/aakarsh/llm_calibration/general