llm-evaluation-toolkit
There are 9 repositories under the llm-evaluation-toolkit topic.
JohnSnowLabs/langtest
Deliver safe & effective language models
athina-ai/athina-evals
Python SDK for running evaluations on LLM-generated responses
parea-ai/parea-sdk-py
Python SDK for experimenting, testing, evaluating & monitoring LLM-powered applications - Parea AI (YC S23)
Re-Align/just-eval
A simple GPT-based evaluation tool for multi-aspect, interpretable assessment of LLMs.
zhuohaoyu/KIEval
[ACL'24] A Knowledge-grounded Interactive Evaluation Framework for Large Language Models
scalexi/scalexi
scalexi is a versatile open-source Python library, optimized for Python 3.11+, that facilitates low-code development and fine-tuning of diverse Large Language Models (LLMs).
Agenta-AI/job_extractor_template
Template for an AI application that extracts job information from a job description using OpenAI functions and LangChain
parea-ai/parea-sdk-ts
TypeScript SDK for experimenting, testing, evaluating & monitoring LLM-powered applications - Parea AI (YC S23)
EricLiclair/prayog-IndicInstruct
Indic evals for quantised models (AWQ / GPTQ / EXL2)