evals
There are 20 repositories under the evals topic.
AgentOps-AI/agentops
Python SDK for AI agent monitoring, LLM cost tracking, benchmarking, and more. Integrates with most LLMs and agent frameworks like CrewAI, LangChain, and AutoGen.
lmnr-ai/lmnr
Laminar - open-source all-in-one platform for engineering AI products. Create a data flywheel for your AI app. Traces, Evals, Datasets, Labels. YC S24.
superlinear-ai/raglite
🥤 RAGLite is a Python toolkit for Retrieval-Augmented Generation (RAG) with PostgreSQL or SQLite
METR/vivaria
Vivaria is METR's tool for running evaluations and conducting agent elicitation research.
keshik6/HourVideo
[NeurIPS 2024] Official code for HourVideo: 1-Hour Video Language Understanding
AIAnytime/rag-evaluator
A library for evaluating Retrieval-Augmented Generation (RAG) systems (the traditional way).
dustalov/evalica
Evalica, your favourite evaluation toolkit
flexpa/llm-fhir-eval
Benchmarking Large Language Models for FHIR
openlayer-ai/templates
Our curated collection of templates. Use these patterns to set up your AI projects for evaluation with Openlayer.
The-Swarm-Corporation/StatisticalModelEvaluator
An implementation of Anthropic's paper and essay on "A statistical approach to model evaluations"
nstankov-bg/oaievals-collector
The OAIEvals Collector: A robust, Go-based metric collector for EVALS data. Supports Kafka, Elastic, Loki, InfluxDB, TimescaleDB integrations, and containerized deployment with Docker. Streamlines OAI-Evals data management with a low barrier to entry!
noah-art3mis/crucible
Develop better LLM apps by testing different models and prompts in bulk.
VikramxD/pixelupbench
Benchmarking pixel-based AI upscaling models for video upscaling
zeus-fyi/mockingbird
Mockingbird front-end code | Zeus + SciFi = power of the gods (cloud + AI | Zeus) meets the power of SciFi (human ingenuity | SfYi) at the intersection of intelligent design (systems engineering excellence), for your intelligence. ZeusFYI.
gokayfem/dspy-ollama-colab
DSPy with Ollama and llama.cpp on Google Colab
lennart-finke/picturebooks
Which objects are visible through the holes in a picture book? This visual task is easy for adults, doable for primary schoolers, but hard for vision transformers.
modelmetry/modelmetry-sdk-js
The Modelmetry JS/TS SDK allows developers to easily integrate Modelmetry’s advanced guardrails and monitoring capabilities into their LLM-powered applications.
modelmetry/modelmetry-sdk-python
The Modelmetry Python SDK allows developers to easily integrate Modelmetry’s advanced guardrails and monitoring capabilities into their LLM-powered applications.
camronh/ContextLength-Experiment
Gemini 1.5 Million Token Context Experiment