evals

There are 15 repositories under the evals topic.

  • AgentOps-AI/agentops

    Python SDK for AI agent monitoring, LLM cost tracking, benchmarking, and more. Integrates with most LLMs and agent frameworks, such as CrewAI, LangChain, and AutoGen.

    Language: Python
  • METR/vivaria

    Vivaria is METR's tool for running evaluations and conducting agent elicitation research.

    Language: TypeScript
  • superlinear-ai/raglite

    🥤 RAGLite is a Python toolkit for Retrieval-Augmented Generation (RAG) with PostgreSQL or SQLite

    Language: Python
  • AIAnytime/rag-evaluator

    A library for evaluating Retrieval-Augmented Generation (RAG) systems (the traditional way).

    Language: Python
  • NirantK/rag-to-riches

    Language: Jupyter Notebook
  • dustalov/evalica

    Evalica, your favourite evaluation toolkit

    Language: Python
  • openlayer-ai/templates

    Our curated collection of templates. Use these patterns to set up your AI projects for evaluation with Openlayer.

    Language: Python
  • nstankov-bg/oaievals-collector

    The OAIEvals Collector: A robust, Go-based metric collector for EVALS data. Supports Kafka, Elastic, Loki, InfluxDB, and TimescaleDB integrations, as well as containerized deployment with Docker. Streamlines OAI-Evals data management with a low barrier to entry!

    Language: Go
  • zeus-fyi/mockingbird

    Mockingbird Front End Code | Zeus + SciFi = Power of the gods (cloud + ai | Zeus) Meets the power of SciFi (human ingenuity | SfYi) At the intersection of intelligent design (systems engineering excellence) For your intelligence —ZeusFYI.

    Language: TypeScript
  • gokayfem/dspy-ollama-colab

    DSPy with Ollama and llama.cpp on Google Colab

    Language: Jupyter Notebook
  • lennart-finke/picturebooks

    Which objects are visible through the holes in a picture book? This visual task is easy for adults, doable for primary schoolers, but hard for vision transformers.

    Language: Jupyter Notebook
  • modelmetry/modelmetry-sdk-js

    The Modelmetry JS/TS SDK allows developers to easily integrate Modelmetry’s advanced guardrails and monitoring capabilities into their LLM-powered applications.

    Language: TypeScript
  • modelmetry/modelmetry-sdk-python

    The Modelmetry Python SDK allows developers to easily integrate Modelmetry’s advanced guardrails and monitoring capabilities into their LLM-powered applications.

    Language: Python
  • noah-art3mis/crucible

    Develop better LLM apps by testing different models and prompts in bulk.

    Language: Python
  • camronh/ContextLength-Experiment

    Gemini 1.5 Million Token Context Experiment

    Language: Jupyter Notebook