/agentops-node

AgentOps SDK for Node (JS/TS)


AgentOps BETA 🕵️

AI agents suck. We're fixing that.

Build your next agent with evals, observability, and replay analytics. AgentOps is the toolkit for evaluating and developing robust and reliable AI agents.

License: MIT

Quick Start

Install AgentOps:

```shell
npm install agentops
```

Add AgentOps to your code. Check out an example.

```typescript
import OpenAI from "openai";
import { Client } from "agentops";

const openai = new OpenAI();                        // Add your API key here or in the .env

const agentops = new Client({
    apiKey: "<Insert AgentOps API Key>",            // Add your API key here or in the .env
    tags: ["abc", "success"],                       // Optionally add tags to your run
    patchApi: [openai]                              // Record LLM calls automatically (only OpenAI is currently supported)
});

// agentops.patchApi(openai)                        // Alternatively, you can patch API calls later

// Sample OpenAI call (automatically recorded because openai was passed to patchApi)
async function chat() {
    const completion = await openai.chat.completions.create({
        model: "gpt-3.5-turbo",
        messages: [
            { role: "system", content: "You are a helpful assistant." },
            { role: "user", content: "Who won the world series in 2020?" },
            { role: "assistant", content: "The Los Angeles Dodgers won the World Series in 2020." },
            { role: "user", content: "Where was it played?" }
        ],
    });

    return completion;
}

// Sample other function
function original(x: string) {
    console.log(x);
    return 5;
}

// You can track other functions by wrapping them.
const wrapped = agentops.wrap(original);
wrapped("hello");

chat().then(() => {
    agentops.endSession("Success"); // Make sure you end your session when your agent is done.
});
```
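The Quick Start above ends the session only on the success path; in practice you will also want to close the session when your agent throws. Below is a minimal sketch of that lifecycle. A stand-in client object replaces the real agentops `Client` so the snippet is self-contained, and the `"Fail"` end state is an assumption mirroring the `"Success"` string shown above (check the SDK for the exact states it accepts).

```typescript
// Hypothetical end states; "Success" comes from the Quick Start,
// "Fail" is assumed for illustration.
type EndState = "Success" | "Fail";

// Stand-in for the agentops Client, so this sketch runs on its own.
const client = {
    endedWith: null as EndState | null,
    endSession(state: EndState) {
        this.endedWith = state;
    },
};

// Run an agent task and always end the session, whatever the outcome.
async function runAgent(task: () => Promise<void>): Promise<void> {
    try {
        await task();
        client.endSession("Success");
    } catch (err) {
        client.endSession("Fail");
        throw err; // Re-throw so the caller still sees the failure.
    }
}
```

With the real SDK, you would replace the stand-in `client` with your `new Client({...})` instance and pass something like `() => chat().then(() => undefined)` as the task.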

Time travel debugging 🔮

(coming soon!)

Agent Arena 🥊

(coming soon!)

Evaluations Roadmap 🧭

| Platform | Dashboard | Evals |
| --- | --- | --- |
| ✅ Python SDK | ✅ Multi-session and cross-session metrics | 🚧 Evaluation playground + leaderboard |
| 🚧 Evaluation builder API | ✅ Custom event tag tracking | 🔜 Agent scorecards |
| ✅ Javascript/Typescript SDK | 🚧 Session replays | 🔜 Custom eval metrics |

Debugging Roadmap 🧭

| Performance testing | Environments | LAA (LLM augmented agents) specific tests | Reasoning and execution testing |
| --- | --- | --- | --- |
| ✅ Event latency analysis | 🔜 Non-stationary environment testing | 🔜 LLM non-deterministic function detection | 🚧 Infinite loops and recursive thought detection |
| ✅ Agent workflow execution pricing | 🔜 Multi-modal environments | 🔜 Token limit overflow flags | 🔜 Faulty reasoning detection |
| 🔜 Success validators (external) | 🔜 Execution containers | 🔜 Context limit overflow flags | 🔜 Generative code validators |
| 🔜 Agent controllers/skill tests | 🔜 Honeypot and prompt injection evaluation | 🔜 API bill tracking | 🔜 Error breakpoint analysis |
| 🔜 Information context constraint testing | 🔜 Anti-agent roadblocks (i.e. Captchas) | | |
| 🔜 Regression testing | | | |

Why AgentOps? 🤔

Our mission is to make sure your agents are ready for production.

Agent developers often work with little to no visibility into how their agents perform in testing. This means their agents never leave the lab. We're changing that.

AgentOps is the easiest way to evaluate, grade, and test agents. Is there a feature you'd like to see AgentOps cover? Just raise it in the issues tab, and we'll work on adding it to the roadmap.