llm-evaluation-toolkit

There are 8 repositories under the llm-evaluation-toolkit topic.

  • athina-ai/athina-evals

Python SDK for running evaluations on LLM-generated responses

Language: Python
  • Re-Align/just-eval

A simple GPT-based evaluation tool for multi-aspect, interpretable assessment of LLMs (a generic judging sketch in this style appears after this list).

Language: Python
  • parea-ai/parea-sdk-py

    Python SDK for experimenting, testing, evaluating & monitoring LLM-powered applications - Parea AI (YC S23)

Language: Python
  • zhuohaoyu/KIEval

    [ACL'24] A Knowledge-grounded Interactive Evaluation Framework for Large Language Models

Language: Python
  • scalexi/scalexi

scalexi is a versatile open-source Python library, optimized for Python 3.11+, that focuses on low-code development and fine-tuning of diverse Large Language Models (LLMs).

Language: Python
  • parea-ai/parea-sdk-ts

    TypeScript SDK for experimenting, testing, evaluating & monitoring LLM-powered applications - Parea AI (YC S23)

Language: TypeScript
  • Agenta-AI/job_extractor_template

Template for an AI application that extracts job information from a job description using OpenAI functions and LangChain (see the extraction sketch after this list).

Language: Python
  • EricLiclair/prayog-IndicInstruct

Indic evals for quantised models (AWQ / GPTQ / EXL2)

Language: Python
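
As a rough illustration of the GPT-based, multi-aspect judging style that just-eval describes, the sketch below scores a response on several aspects in a single judge call. It uses the plain openai Python client rather than the just-eval API; the aspect list, model name, and prompt wording are assumptions made for illustration only.

```python
# Hypothetical multi-aspect GPT-as-judge sketch; NOT the just-eval API.
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

ASPECTS = ["helpfulness", "clarity", "factuality", "depth", "safety"]

def judge(instruction: str, response: str) -> dict:
    """Ask a GPT judge to score a response 1-5 on each aspect, with reasons."""
    prompt = (
        "Rate the response to the instruction on each aspect from 1 (worst) "
        "to 5 (best) and give a one-sentence reason per aspect.\n"
        f"Aspects: {', '.join(ASPECTS)}\n\n"
        f"Instruction:\n{instruction}\n\nResponse:\n{response}\n\n"
        'Answer as JSON: {"aspect": {"score": int, "reason": str}, ...}'
    )
    completion = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},
        temperature=0,
    )
    return json.loads(completion.choices[0].message.content)

if __name__ == "__main__":
    scores = judge(
        "Explain what an LLM evaluation toolkit does.",
        "It automates scoring of model outputs against defined criteria.",
    )
    print(json.dumps(scores, indent=2))
```

Requesting JSON output and scoring all aspects in one call keeps the judgment interpretable (each score comes with a reason) while staying cheap; per-aspect calls are an alternative when aspects need different prompts.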
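In the same spirit, here is a minimal sketch of the extraction pattern the Agenta-AI job_extractor_template describes: pulling structured job fields out of a free-text posting via OpenAI function calling through LangChain. The field names, model choice, and sample posting are illustrative assumptions, not taken from the template itself.

```python
# Hypothetical job-description extraction sketch using LangChain + OpenAI
# function calling; field names and model are assumptions, not the template's.
from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field

class JobInfo(BaseModel):
    """Structured fields to pull out of a free-text job posting."""
    title: str = Field(description="Job title")
    company: str = Field(description="Hiring company")
    location: str = Field(description="Job location, or 'remote'")
    skills: list[str] = Field(description="Required skills")

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
# with_structured_output drives OpenAI function/tool calling under the hood
extractor = llm.with_structured_output(JobInfo)

posting = (
    "Acme Corp is hiring a Senior Python Engineer in Berlin. "
    "Must know FastAPI, PostgreSQL, and Docker."
)
print(extractor.invoke(posting))  # -> JobInfo(title=..., company=..., ...)
```

Binding a Pydantic schema to the model this way delegates field parsing and validation to the function-calling layer, so the application code only ever sees typed objects.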