benchmarking question
lalehsg opened this issue · 1 comments
lalehsg commented
Hi all, I am wondering whether any comparison between Defog and the Mixtral 8x7B Instruct model has been done. Thanks in advance!
rishsriv commented
Hi there! You can run a comparison on sql-eval with the command below. IIRC, mistral-medium
via the Mistral API is the 8x7B model. On SQL-Eval, it is 63% accurate, compared to 90% for sqlcoder-7b-2.
python -W ignore main.py \
  -db postgres \
  -o "results/results.csv" \
  -g mistral \
  -f "prompts/prompt_mistral.md" \
  -m mistral-medium \
  -p 5
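Once a run finishes, a rough way to turn the output file into a single accuracy number is sketched below. This is only a minimal sketch: it assumes the results CSV has one row per question and a column named `correct` holding 1/0 (or true/false). That column name and the helper functions `accuracy` and `accuracy_from_csv` are assumptions for illustration, not something confirmed in this thread, so check the header of your own results file first.

```python
import csv


def accuracy(rows, correct_column="correct"):
    """Return the fraction of rows whose `correct_column` value is truthy.

    `correct_column` is an assumed column name; adjust it to match the
    header of the results file you actually get.
    """
    rows = list(rows)
    if not rows:
        raise ValueError("no rows to score")
    truthy = {"1", "1.0", "true"}
    hits = sum(
        1 for row in rows
        if str(row[correct_column]).strip().lower() in truthy
    )
    return hits / len(rows)


def accuracy_from_csv(path, correct_column="correct"):
    """Read a results CSV and score it with `accuracy`."""
    with open(path, newline="") as handle:
        return accuracy(csv.DictReader(handle), correct_column)
```

Calling `accuracy_from_csv("results/results.csv")` on the output of each model's run gives two fractions that can be compared directly, provided both runs used the same question set.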