llm-jeopardy

An automated prompting and scoring framework for evaluating LLMs on up-to-date human knowledge prompts.

Install and run:
git clone https://github.com/aigoopy/llm-jeopardy.git
cd llm-jeopardy
npm install
node . --help

The llm-jeopardy framework uses llama.cpp for model execution and GGML models downloaded from Hugging Face. It has been updated to use GGMLv3 models.
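
As a rough illustration of what such a harness does, the sketch below shells out to llama.cpp's main binary and counts a clue as correct when the expected answer appears in the model's output. The clue data, the model path, the prompt wording, and the substring-matching rule are all assumptions for the example, not the project's actual implementation; only the -m, -p, and -n flags are standard llama.cpp options.

// score-sketch.js -- illustrative only; paths, prompts, and the matching rule are assumptions
const { execFileSync } = require("child_process");

// Jeopardy-style clues paired with expected responses (hypothetical example data)
const clues = [
  { clue: "This planet is known as the Red Planet.", answer: "Mars" },
  { clue: "This element has the chemical symbol Au.", answer: "Gold" },
];

let correct = 0;
for (const { clue, answer } of clues) {
  // llama.cpp's main binary: -m selects the GGML model file, -p is the prompt, -n caps generated tokens
  const output = execFileSync("./main", [
    "-m", "models/airoboros-65b.ggmlv3.q8_0.bin", // hypothetical model path
    "-p", `Answer in a few words: ${clue}`,
    "-n", "64",
  ]).toString();

  // Naive scoring rule: the clue counts as correct if the expected answer appears in the output
  if (output.toLowerCase().includes(answer.toLowerCase())) correct++;
}

const percent = ((100 * correct) / clues.length).toFixed(2);
console.log(`correct: ${correct}/${clues.length} (${percent}%)`);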

name  percent  correct  total  elapsed  answerlen  msize  mdate
(percent = 100 * correct / total, worked example after the table; correct and total are clues answered correctly out of clues asked; msize is the model file size in GB)
Airoboros-Gpt4-1.2 65B-8_0 82.52 255 309 36.865 12.50 69.37 2023/06/14 16:35:46
Airoboros-Gpt4-1.2 65B-5_1 81.88 253 309 36.596 12.64 48.97 2023/06/14 15:25:37
Airoboros-Gpt4-1.4 65B-8_0 79.94 247 309 42.750 21.01 69.37 2023/06/29 20:25:57
Airoboros-Gpt4-1.4 33B-8_0 76.70 237 309 22.375 18.16 34.56 2023/06/26 17:53:42
Alpaca-Lora 65B-5_1 74.11 229 309 44.778 35.37 48.97 2023/05/20 12:57:30
WizardLM 30B-8_0 74.11 229 309 53.883 214.90 34.56 2023/06/06 21:08:15
Hippogriff 30B-8_0 73.46 227 309 47.551 156.04 34.56 2023/05/31 09:16:01
Guanaco 65B-8_0 73.46 227 309 99.274 183.93 69.37 2023/05/26 08:46:34
GPT4-Alpaca-Lora 30B-8_0 73.14 226 309 48.790 158.55 34.56 2023/05/20 04:13:39
Wizard-Vicuna-Unc 30B-8_0 72.82 225 309 41.409 124.85 34.56 2023/05/30 04:33:26
WizardLM-Unc 30B-8_0 72.49 224 309 48.975 161.61 34.56 2023/05/22 14:34:25
WizardLM 30B-6_K 72.17 223 309 46.332 222.41 26.69 2023/06/06 19:03:43
Platypus 30B-8_0 71.84 222 309 24.628 22.94 34.56 2023/06/29 01:30:17
Chronoboros 33B-8_0 71.52 221 309 17.228 24.36 34.56 2023/07/10 09:16:27
GPlatty 30B-8_0 71.52 221 309 27.669 35.75 34.56 2023/06/29 00:01:01
Llama-Supercot 30B-8_0 71.52 221 309 35.496 92.76 34.56 2023/05/28 12:22:12
VicUnlocked-Alpaca 65B-8_0 71.52 221 309 99.011 172.73 69.37 2023/05/30 00:09:02
Samantha 33B-8_0 70.87 219 309 51.264 194.35 34.56 2023/05/29 10:18:08
SuperPlatty 30B-8_0 70.55 218 309 18.093 22.98 34.56 2023/07/03 21:07:50
Alpaca-Lora 30B-8_0 70.55 218 309 29.461 58.53 34.56 2023/06/01 07:50:56
Guanaco 65B-5_1 70.23 217 309 84.008 187.11 48.97 2023/05/25 18:58:18
GPT4-Alpaca-Lora-mlp 65B-5_1 69.90 216 309 73.475 149.61 48.97 2023/05/20 17:04:49
Airoboros-Gpt4-1.4 13B-8_0 69.58 215 309 11.106 17.40 13.83 2023/06/22 08:32:58
WizardLM-Unc-Supercot 30B-8_0 69.58 215 309 45.137 145.27 34.56 2023/06/01 11:07:15
Airoboros-Gpt4-1.2 13B-8_0 69.26 214 309 10.076 12.65 13.83 2023/06/16 13:03:21
Airochronos 33B-8_0 68.61 212 309 15.929 17.40 34.56 2023/07/10 22:07:12
Minotaur 13B-8_0 68.61 212 309 19.371 169.47 13.83 2023/06/08 21:45:25
WizardLM-1.0 13B-8_0 68.28 211 309 26.254 230.25 13.83 2023/05/27 16:17:01
Lazarus 30B-8_0 68.28 211 309 44.213 144.72 34.56 2023/06/07 15:58:57
Vicuna 33B-8_0 67.31 208 309 52.331 178.79 34.56 2023/06/30 16:08:40
Chronos-Hermes 13B-8_0 66.99 207 309 22.114 186.44 13.83 2023/06/13 11:02:08
WizardLM-Unc-1.0 13B-8_0 66.99 207 309 32.325 259.04 13.83 2023/06/20 07:44:48
Tulu 30B-8_0 66.67 206 309 21.475 19.09 34.56 2023/06/10 21:47:05
Nous-Hermes 13B-8_0 66.34 205 309 16.128 102.47 13.83 2023/06/03 13:44:45
WizardLM-1.1 13B-8_0 66.34 205 309 34.539 441.67 13.83 2023/07/07 16:35:42
Wizard-Mega 13B-8_0 65.70 203 309 20.975 171.84 13.83 2023/05/20 03:50:25
Vicuna-1.3.0 13B-8_0 65.70 203 309 35.433 313.52 13.83 2023/06/25 11:15:58
Llama 30B-8_0 65.70 203 309 51.008 167.29 34.56 2023/05/20 19:50:17
Gpt4-X-Vicuna 13B-8_0 65.37 202 309 21.407 184.04 13.83 2023/05/20 05:02:06
Chronos-WizardLM-Unc-Sc 13B-8_0 65.37 202 309 23.468 202.76 13.83 2023/06/07 14:08:04
Mythologic 13B-8_0 65.05 201 309 16.863 172.50 13.83 2023/07/17 10:44:11
Baize-v2 13B-8_0 65.05 201 309 22.906 186.67 13.83 2023/05/24 12:00:06
Wizard-Vicuna-Unc 13B-8_0 64.72 200 309 15.160 93.71 13.83 2023/05/20 02:05:09
Airoboros-Gpt4-1.4 7B-8_0 64.40 199 309 6.896 20.88 7.16 2023/06/22 07:53:28
Chimera 13B-8_0 63.75 197 309 16.184 118.74 13.83 2023/06/03 13:08:37
Based 30B-8_0 63.75 197 309 23.381 35.11 34.56 2023/06/03 10:54:07
Guanaco 7B-8_0 63.43 196 309 34.267 701.59 7.16 2023/05/25 20:18:25
Hypermantis 13B-8_0 63.11 195 309 14.144 88.72 13.83 2023/06/03 00:38:54
Llama-2 13B-8_0 63.11 195 309 15.162 117.56 13.83 2023/07/18 17:36:27
Manticore 13B-8_0 63.11 195 309 18.142 133.34 13.83 2023/05/20 14:17:21
Airoboros-Gpt4-1.2 7B-8_0 61.81 191 309 6.580 16.73 7.16 2023/06/16 12:45:31
Selfee 13B-8_0 60.52 187 309 27.142 182.70 13.83 2023/06/06 14:23:41
Godzilla 30B-8_0 60.52 187 309 44.521 219.55 34.56 2023/07/09 12:43:22
Vigogne-Instruct 13B-8_0 60.19 186 309 19.048 126.03 13.83 2023/05/25 21:58:38
Wizard-Vicuna 13B-8_0 58.90 182 309 15.180 99.68 13.83 2023/05/20 02:44:04
WizardLM-Unc-1.0 7B-8_0 58.90 182 309 18.232 280.49 7.16 2023/06/18 12:59:11
Wizard-Vicuna-Unc 7B-8_0 56.31 174 309 8.254 80.20 7.16 2023/05/20 01:07:29
Baize-v2 7B-8_0 55.99 173 309 13.043 181.94 7.16 2023/05/24 11:38:45
Orca-mini-v2 7B-8_0 55.34 171 309 7.448 126.56 7.16 2023/07/04 08:19:54
UltraLM 13B-8_0 54.69 169 309 18.651 133.95 13.83 2023/06/29 21:37:25
Koala 13B-8_0 54.37 168 309 39.154 423.01 13.83 2023/05/20 05:33:31
VicUnlocked-LoRA 30B-8_0 53.40 165 309 80.434 261.73 34.56 2023/05/20 22:52:56
Llama-2 7B-8_0 53.07 164 309 8.793 139.08 7.16 2023/07/18 17:16:54
WizardLM 7B-8_0 52.43 162 309 13.444 191.54 7.16 2023/05/20 00:19:49
Koala 7B-8_0 51.78 160 309 24.340 458.86 9.76 2023/05/20 00:45:54
Open-Llama-Instruct 13B-8_0 50.49 156 309 19.378 124.00 13.83 2023/06/20 14:21:08
Open-Orca-Preview1 13B-8_0 49.51 153 309 7.705 29.33 13.83 2023/07/12 21:32:00
GPT4All-Snoozy 13B-8_0 47.25 146 309 14.646 92.30 13.83 2023/05/20 03:14:27
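
For reference, the percent column above is just 100 * correct / total. Checking the top row in Node.js:

console.log(((255 / 309) * 100).toFixed(2)); // prints "82.52", matching the Airoboros-Gpt4-1.2 65B-8_0 row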