Unless otherwise specified:
- the maximum number of layers is offloaded to the GPU
- local models are run with the llama.cpp server on .gguf quantizations (see the sketch below)
- parameter changes (temperature, penalties, etc.) carry over into subsequent tests
- `*` denotes a non-local model, included for comparison
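As a rough sketch of the test setup (assuming llama.cpp's `llama-server` on its default port 8080, launched along the lines of `./llama-server -m <model>.gguf -ngl 99 -c 4096`; the prompt, timeout, and sampling values below are placeholders, not the exact harness used here):

```python
import requests

# Local llama.cpp server endpoint (native /completion API).
SERVER_URL = "http://127.0.0.1:8080/completion"

def complete(prompt: str, temperature: float = 0.0, n_predict: int = 512) -> str:
    """Request a completion for one benchmark prompt from the local server."""
    resp = requests.post(
        SERVER_URL,
        json={"prompt": prompt, "temperature": temperature, "n_predict": n_predict},
        timeout=600,  # large quants can be slow; see the timing column below
    )
    resp.raise_for_status()
    return resp.json()["content"]

print(complete("def fizzbuzz(n):"))
```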

Model | Configuration | Human Eval | Time taken |
---|---|---|---|
GPT-4* | Instruction-style, temperature=0.2, presence_penalty=0 | 63.4% | |
GPT-4* | Completion-style | 84.1% | |
Mixtral8x7b | mixtral-8x7b-v0.1.Q5_K_M.gguf | 45.7% | |
Mistral-medium* | | 62.2% | |
Llama2* | HF API, CodeLlama-34b-Instruct-hf | 42.1% | |
Mistral-large* | | 73.2% | |
WizardLM2 | WizardLM-2-8x22B.IQ3_XS-00001-of-00005.gguf | 56.7% | |
Wizardcoder | wizardcoder-33b-v1.1.Q4_K_M.gguf, temperature=0.0 | 73.8% | |
Wizardcoder-Python | Q4_K_M quant, modified prompt | 57.3% | |
CodeFuse-Deepseek | CodeFuse-DeepSeek-33B-Q4_K_M.gguf | 68.7% | |
Deepseek | deepseek-coder-33b-instruct.Q4_K_M.gguf | 79.9% | |
OpenCodeInterpreter | ggml-opencodeinterpreter-ds-33b-q8_0.gguf, -ngl 40 | Failed | |
Deepseek | ggml-deepseek-coder-33b-instruct-q4_k_m.gguf | 78.7% | |
Deepseek | deepseek-coder-33b-instruct.Q5_K_M.gguf, -ngl 60, a bit slow | 79.3% | |
Llama3* | together.ai API, Llama-3-70b-chat-hf | 75.6% | |
DBRX* | together.ai API, dbrx-instruct | 48.8% | |
CodeQwen | codeqwen-1_5-7b-chat-q8_0.gguf | 83.5% | |
Llama3-8B | bartowski/Meta-Llama-3-8B-Instruct-GGUF | 52.4% | |
Phi-3-mini | 4k context, 4-bit quantized | 60.4% | |
Phi-3-mini | 4k context, fp16 | 62.2% | |
Hermes-Llama | Hermes-2-Pro-Llama-3-Instruct-Merged-DPO-F16 | 53.7% | |
Codestral Q6_K | Codestral-22B-v0.1-hf.Q6_K.gguf | 81.7% | 812.53s |
Codestral Q8 | Codestral-22B-v0.1-hf.Q8_0.gguf | 79.9% | 2918.51s |
Deepseek V2 | DeepSeek-Coder-V2-Lite-Instruct-Q8_0_L.gguf | 82.9% | 378.86s |
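
The Human Eval column reads as pass@1 over OpenAI's HumanEval problems. A minimal sketch of how such numbers can be produced with the reference `human-eval` harness (assuming that harness was used; the server endpoint and greedy decoding match the sketch above):

```python
import requests
from human_eval.data import read_problems, write_jsonl

def complete(prompt: str) -> str:
    # Greedy completion from the local llama.cpp server (see the sketch above).
    resp = requests.post(
        "http://127.0.0.1:8080/completion",
        json={"prompt": prompt, "temperature": 0.0, "n_predict": 512},
        timeout=600,
    )
    resp.raise_for_status()
    return resp.json()["content"]

# One sample per task is enough for pass@1.
problems = read_problems()
write_jsonl(
    "samples.jsonl",
    [{"task_id": tid, "completion": complete(p["prompt"])} for tid, p in problems.items()],
)
# Then score with the harness CLI: evaluate_functional_correctness samples.jsonl
```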