This Streamlit app receives a markdown table (rows: models, columns: evals) and visualizes it with a Plotly radar chart.

In your conda environment, clone the repo, cd into it, install the requirements, and run the app:
```bash
git clone https://github.com/danbider/llm-eval-visualizer.git
cd llm-eval-visualizer
pip install -r requirements.txt
streamlit run app.py
```
The markdown table should look like this (column and row names are flexible):
| model_name | average | world_knowledge | commonsense_reasoning | language_understanding | symbolic_problem_solving | reading_comprehension |
|:-------------------------|----------:|------------------:|------------------------:|-------------------------:|---------------------------:|------------------------:|
| llama-30b | 0.508013 | 0.570561 | 0.521302 | 0.549439 | 0.321474 | 0.577292 |
| huggyllama/llama-13b | 0.428223 | 0.511058 | 0.464285 | 0.482423 | 0.23844 | 0.444907 |
| huggyllama/llama-7b | 0.351241 | 0.354118 | 0.396072 | 0.428827 | 0.182015 | 0.395171 |
| togethercomputer/RedPajama-INCITE-7B-Instruct | 0.354936 | 0.368793 | 0.367142 | 0.395898 | 0.210048 | 0.432801 |
| mosaicml/mpt-7b-instruct | 0.338077 | 0.338253 | 0.416911 | 0.371509 | 0.17265 | 0.391062 |
| mosaicml/mpt-7b | 0.310326 | 0.310191 | 0.384509 | 0.380392 | 0.162957 | 0.31358 |
| tiiuae/falcon-7b | 0.309822 | 0.272142 | 0.419968 | 0.369998 | 0.158363 | 0.328637 |
| togethercomputer/RedPajama-INCITE-7B-Base | 0.29738 | 0.312032 | 0.363261 | 0.3733 | 0.126577 | 0.311731 |
| tiiuae/falcon-7b-instruct | 0.28197 | 0.260288 | 0.370308 | 0.332523 | 0.107958 | 0.338774 |
| EleutherAI/pythia-12b | 0.274429 | 0.252255 | 0.344973 | 0.33249 | 0.136118 | 0.306308 |
| EleutherAI/gpt-j-6b | 0.268168 | 0.260849 | 0.330648 | 0.311813 | 0.120669 | 0.31686 |
| facebook/opt-6.7b | 0.24994 | 0.236678 | 0.326348 | 0.322621 | 0.0930295 | 0.271022 |
| EleutherAI/pythia-6.9b | 0.248811 | 0.218628 | 0.308817 | 0.304028 | 0.120792 | 0.291793 |
| stabilityai/stablelm-tuned-alpha-7b | 0.163522 | 0.129503 | 0.198957 | 0.20249 | 0.093985 | 0.192676 |
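
For reference, here is a minimal sketch of how a table like this could be parsed and rendered as a radar chart with Plotly. It is an illustrative example, not the repo's actual `app.py`; the file name `results.md` and the `read_markdown_table` helper are assumptions made for the sketch.

```python
import plotly.graph_objects as go


def read_markdown_table(path):
    """Parse a pipe-delimited markdown table into a header list and data rows."""
    with open(path) as f:
        lines = [line.strip() for line in f if line.strip().startswith("|")]
    header = [cell.strip() for cell in lines[0].strip("|").split("|")]
    rows = []
    for line in lines[2:]:  # skip the |---:|---| alignment row
        rows.append([cell.strip() for cell in line.strip("|").split("|")])
    return header, rows


header, rows = read_markdown_table("results.md")  # hypothetical input file
evals = header[1:]  # eval columns: average, world_knowledge, ...

fig = go.Figure()
for row in rows:
    model, scores = row[0], [float(x) for x in row[1:]]
    # Repeat the first point so each polygon closes on itself.
    fig.add_trace(go.Scatterpolar(
        r=scores + scores[:1],
        theta=evals + evals[:1],
        fill="toself",
        name=model,
    ))

fig.update_layout(polar=dict(radialaxis=dict(range=[0, 1])))
fig.show()
```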