gauntlet-visualizer


Visualize LLM evals

This Streamlit app receives a markdown table (rows: models, columns: evals) and visualizes it with a Plotly radar chart.
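At a high level, the app parses the supplied table into a pandas DataFrame and draws one radar trace per model. The sketch below only illustrates that flow; it is not the repo's app.py, and the helper names (parse_markdown_table, make_radar_chart), the text-area input, and the assumption that the first column holds model names are illustrative choices, not the project's API.

```python
# Illustrative sketch only -- not the repo's app.py.
from io import StringIO

import pandas as pd
import plotly.graph_objects as go
import streamlit as st


def parse_markdown_table(text: str) -> pd.DataFrame:
    """Parse a pipe-delimited markdown table into a DataFrame (hypothetical helper)."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    # Drop the |:-----|----:| separator row that follows the header.
    lines = [line for line in lines if not set(line) <= set("|:- ")]
    df = pd.read_csv(StringIO("\n".join(lines)), sep="|", skipinitialspace=True)
    # The leading/trailing pipes create empty columns; drop them.
    df = df.dropna(axis=1, how="all")
    df.columns = [c.strip() for c in df.columns]
    return df


def make_radar_chart(df: pd.DataFrame) -> go.Figure:
    """Build one Scatterpolar trace per model (assumes first column = model name)."""
    name_col, eval_cols = df.columns[0], df.columns[1:]
    fig = go.Figure()
    for _, row in df.iterrows():
        fig.add_trace(
            go.Scatterpolar(
                r=row[eval_cols].astype(float).values,
                theta=list(eval_cols),
                fill="toself",
                name=str(row[name_col]).strip(),
            )
        )
    fig.update_layout(polar=dict(radialaxis=dict(range=[0, 1])))
    return fig


st.title("LLM eval visualizer (sketch)")
table_text = st.text_area("Paste a markdown table of eval results")
if table_text:
    scores = parse_markdown_table(table_text)
    st.plotly_chart(make_radar_chart(scores))
```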

Installation

In your conda environment, clone the repo, cd into it and install the requirements:

git clone https://github.com/danbider/llm-eval-visualizer.git
cd llm-eval-visualizer
pip install -r requirements.txt
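
The commands above assume an existing conda environment. If you need to create one first, something along these lines should work (the environment name and Python version are arbitrary choices, not taken from the repo):

conda create -n llm-eval-visualizer python=3.10
conda activate llm-eval-visualizer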

Launching the app

streamlit run app.py

Usage

The markdown table should look like this (column and row names are flexible):

| model_name               |   average |   world_knowledge |   commonsense_reasoning |   language_understanding |   symbolic_problem_solving |   reading_comprehension |
|:-------------------------|----------:|------------------:|------------------------:|-------------------------:|---------------------------:|------------------------:|
| llama-30b    |  0.508013 |          0.570561 |                0.521302 |                 0.549439 |                   0.321474 |                0.577292 |
| huggyllama/llama-13b                |  0.428223 |          0.511058 |                0.464285 |                 0.482423 |                  0.23844   |                0.444907 |
| huggyllama/llama-7b |  0.351241 |          0.354118 |                0.396072 |                 0.428827 |                   0.182015 |                0.395171 |
| togethercomputer/RedPajama-INCITE-7B-Instruct |  0.354936 |          0.368793 |                0.367142 |                 0.395898 |                   0.210048 |                0.432801 |
| mosaicml/mpt-7b-instruct |  0.338077 |          0.338253 |                0.416911 |                 0.371509 |                   0.17265  |                0.391062 |
| mosaicml/mpt-7b          |  0.310326 |          0.310191 |                0.384509 |                 0.380392 |                   0.162957 |                0.31358  |
| tiiuae/falcon-7b         |  0.309822 |          0.272142 |                0.419968 |                 0.369998 |                   0.158363 |                0.328637 |
| togethercomputer/RedPajama-INCITE-7B-Base     |  0.29738  |          0.312032 |                0.363261 |                 0.3733   |                   0.126577 |                0.311731 |
| tiiuae/falcon-7b-instruct                     |  0.28197  |          0.260288 |                0.370308 |                 0.332523 |                   0.107958 |                0.338774 |
| EleutherAI/pythia-12b  |  0.274429 |          0.252255 |                0.344973 |                 0.33249  |                   0.136118 |                0.306308 |
| EleutherAI/gpt-j-6b                 |  0.268168 |          0.260849 |                0.330648 |                 0.311813 |                  0.120669  |                0.31686  |
| facebook/opt-6.7b                   |  0.24994  |          0.236678 |                0.326348 |                 0.322621 |                  0.0930295 |                0.271022 |
| EleutherAI/pythia-6.9b              |  0.248811 |          0.218628 |                0.308817 |                 0.304028 |                  0.120792  |                0.291793 |
| stabilityai/stablelm-tuned-alpha-7b |  0.163522 |          0.129503 |                0.198957 |                 0.20249  |                  0.093985  |                0.192676 |
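
If your eval results start out as a pandas DataFrame rather than a markdown table, pandas can emit exactly this format via DataFrame.to_markdown() (which requires the tabulate package). A minimal sketch, using a couple of rows from the table above:

```python
import pandas as pd

# Two rows lifted from the example table above; any model/eval names work,
# since the app treats column and row names as flexible.
results = pd.DataFrame(
    {
        "model_name": ["llama-30b", "huggyllama/llama-13b"],
        "average": [0.508013, 0.428223],
        "world_knowledge": [0.570561, 0.511058],
    }
)
print(results.to_markdown(index=False))  # paste the printed table into the app
```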