/langfuse

Open source LLM observability, analytics, prompt management, evaluations, tests, monitoring, logging, tracing, LLMOps. Langfuse: the LLM engineering platform. Debug, analyze and iterate together. YC W23. Stable SDKs + integrations for TypeScript, Python, OpenAI, Langchain, Litellm, Flowise, Superagent and Langflow

Open source observability & analytics for LLM-based applications

Observability: Explore and debug complex logs & traces in a visual UI
Analytics: Measure & improve costs, latency and response quality

Join the Langfuse Discord »
langfuse.com · Docs · Report Bug · Feature Request


What is Langfuse?

Langfuse is an open source observability & analytics solution for LLM-based applications. It is mostly geared towards production usage but some users also use it for local development of their LLM applications.

Langfuse is focused on applications built on top of LLMs. Many new abstractions and best practices have emerged recently, e.g. agents, chained prompts, embedding-based retrieval, and LLM access to REPLs & APIs. These make applications more powerful but also less predictable: developers cannot fully anticipate how a change affects the quality, cost and overall latency of their application. Langfuse helps to monitor and debug these applications.

Demo (2 min)

Explore demo project in Langfuse here (free account required): https://langfuse.com/demo

Observability

Langfuse offers an admin UI to explore the ingested data.

  • Nested view of LLM app executions; detailed information along the traces on: latency, cost, scores
  • Segment execution traces by user feedback, e.g. to identify production issues

Analytics

Reporting on

  • Token usage by model
  • Volume of traces
  • Scores/evals

Broken down by

  • Users
  • Releases
  • Prompt/chain versions
  • Prompt/chain types
  • Time

→ Expect releases with more ways to analyze the data over the next weeks.
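To make the breakdowns above concrete, here is a minimal sketch of "token usage by model, broken down by user or release" computed over a handful of records. The record layout and field names (`model`, `user`, `release`, `total_tokens`) and the helper `token_usage_by` are hypothetical and chosen for illustration only; they are not the Langfuse data model or API.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Mapping, Tuple


def token_usage_by(
    records: Iterable[Mapping[str, Any]],
    dimension: str,
) -> Dict[Tuple[str, str], int]:
    """Sum total tokens per (model, dimension value) pair."""
    totals: Dict[Tuple[str, str], int] = defaultdict(int)
    for record in records:
        key = (str(record["model"]), str(record[dimension]))
        totals[key] += int(record["total_tokens"])
    return dict(totals)


# Hypothetical records; the field names are illustrative, not the Langfuse schema.
records = [
    {"model": "model-a", "user": "alice", "release": "v1", "total_tokens": 120},
    {"model": "model-a", "user": "bob", "release": "v1", "total_tokens": 80},
    {"model": "model-b", "user": "alice", "release": "v2", "total_tokens": 300},
    {"model": "model-a", "user": "alice", "release": "v2", "total_tokens": 50},
]

# Token usage by model, broken down by user.
usage_by_user = token_usage_by(records, "user")
```

Passing `"release"` instead of `"user"` gives the same report broken down by release.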

Get started

Step 1: Run Server

Langfuse Cloud

Managed deployment by the Langfuse team. A generous free tier (hobby plan) is available, no credit card required.

Links: Create account, learn more

Localhost

Requirements: docker, docker compose (e.g. using Docker Desktop)

# Clone repository
git clone https://github.com/langfuse/langfuse.git
cd langfuse

# Run server and database
docker compose up -d
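After `docker compose up -d` returns, the containers may still need a moment before the server answers requests. The following is a minimal sketch of a readiness check using only the Python standard library. The address `http://localhost:3000` is an assumption (check the port mapping in your compose file), and `http_ok` and `wait_for_server` are hypothetical helper names, not part of Langfuse.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 2.0) -> bool:
    """Return True if a GET request to `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status all count as "not ready".
        return False


def wait_for_server(
    url: str = "http://localhost:3000",  # assumed address, adjust to your setup
    attempts: int = 30,
    delay_seconds: float = 2.0,
    probe: Callable[[str], bool] = http_ok,
) -> bool:
    """Poll `url` until `probe` succeeds or `attempts` is exhausted."""
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            time.sleep(delay_seconds)
    return False
```

Calling `wait_for_server()` returns `True` once the server responds and `False` if it never does within the given number of attempts. The `probe` parameter exists so the polling logic can be exercised without a running server.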

Self-host (Docker)

→ Instructions

Deploy on Railway

Step 2: Data ingestion

SDKs to instrument application

Fully async, typed SDKs to instrument any LLM application. Currently available for Python & JS/TS.

→ Guide with an example of how the SDK can be used

Package        Description                        Links
PyPI Version   Python                             docs, repo
npm Version    JS/TS: Node >= 18, Edge runtimes   docs, repo
npm package    JS/TS: Node <18                    docs, repo

Langchain applications

The Langfuse callback handler automatically instruments Langchain applications. Currently available for Python and JS/TS.

Python

pip install langfuse
# Initialize Langfuse handler
from langfuse.callback import CallbackHandler
handler = CallbackHandler(PUBLIC_KEY, SECRET_KEY)

# Setup Langchain
from langchain.chains import LLMChain
...
chain = LLMChain(llm=llm, prompt=prompt)

# Add Langfuse handler as callback
chain.run(input="<user_input>", callbacks=[handler])
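The snippet above leaves `PUBLIC_KEY` and `SECRET_KEY` undefined. One common way to supply them without hard-coding credentials is to read them from environment variables. This is a minimal sketch, not the SDK's own mechanism: `load_langfuse_keys` is a hypothetical helper, `LANGFUSE_PUBLIC_KEY` is the variable name used elsewhere in this README, and `LANGFUSE_SECRET_KEY` is an assumed name.

```python
import os
from typing import Mapping, Tuple


def load_langfuse_keys(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Return (public_key, secret_key) from the environment, or raise if one is missing."""
    # LANGFUSE_SECRET_KEY is an assumed variable name; rename it to match your setup.
    names = ("LANGFUSE_PUBLIC_KEY", "LANGFUSE_SECRET_KEY")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return env[names[0]], env[names[1]]
```

With this in place, `PUBLIC_KEY, SECRET_KEY = load_langfuse_keys()` can precede the `CallbackHandler(PUBLIC_KEY, SECRET_KEY)` call, and a missing key fails early with a clear message.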

→ Langchain integration docs for Python

JS/TS

→ Langchain integration docs for JS/TS

Add scores/evaluations to traces (optional)

Quality/evaluation of traces is tracked via scores (docs). Scores are related to traces and optionally to observations. Scores can be added via:

  • Backend SDKs (see docs above): {trace, event, span, generation}.score()

  • API (see docs below): POST /api/public/scores

  • Client-side using Web SDK, e.g. to capture user feedback or other user-based quality metrics:

    npm install langfuse
    // Client-side (browser)
    
    import { LangfuseWeb } from "langfuse";
    
    const langfuseWeb = new LangfuseWeb({
      publicKey: process.env.LANGFUSE_PUBLIC_KEY,
    });
    
    // frontend handler (example: React)
    export function UserFeedbackComponent(props: { traceId: string }) {
      const handleUserFeedback = async (value: number) => {
        await langfuseWeb.score({
          traceId: props.traceId,
          name: "user_feedback",
          value,
        });
      };
      return (
        <div>
          <button onClick={() => handleUserFeedback(1)}>👍</button>
          <button onClick={() => handleUserFeedback(-1)}>👎</button>
        </div>
      );
    }
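For the API option listed above, the following sketch builds a `POST /api/public/scores` request with the Python standard library. It is written under stated assumptions: the JSON body reuses the `traceId`, `name` and `value` fields from the Web SDK example, and authentication is assumed to be HTTP Basic auth with the public key as username and the secret key as password. Check the API reference before relying on either. `build_score_request` is a hypothetical helper name.

```python
import base64
import json
import urllib.request


def build_score_request(
    base_url: str,
    public_key: str,
    secret_key: str,
    trace_id: str,
    name: str,
    value: float,
) -> urllib.request.Request:
    """Build (but do not send) a POST request for /api/public/scores."""
    # Body fields mirror the Web SDK call: traceId, name, value.
    body = json.dumps({"traceId": trace_id, "name": name, "value": value}).encode("utf-8")
    # Assumption: HTTP Basic auth, public key as username, secret key as password.
    token = base64.b64encode(f"{public_key}:{secret_key}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/public/scores",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )
```

The request object can be sent with `urllib.request.urlopen(request)`. Building it separately from sending keeps the payload easy to inspect and test.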

API

API reference

  • POST/PATCH routes to ingest data
  • GET routes to use data in downstream applications (e.g. embedded analytics)

Questions / Feedback

The maintainers are very active in the Langfuse Discord and are happy to answer questions or discuss feedback/ideas regarding the future of the project.

Contributing to Langfuse

Join the community on Discord.

To contribute, send us a PR, raise a GitHub issue, or email us at contributing@langfuse.com

Development setup

See CONTRIBUTING.md for details on how to set up a development environment.

License

Langfuse is MIT licensed, except for the ee/ folder. See LICENSE and docs for more details.

Misc

Upgrade Langfuse (localhost)

# Stop server and db
docker compose down

# Pull latest changes
git pull
docker compose pull

# Run server and db
docker compose up -d

Run Langfuse in CI for integration tests

Check out the GitHub Actions workflows of the Python SDK and JS/TS SDK.

Telemetry

By default, Langfuse automatically reports basic usage statistics to a centralized server (PostHog).

This helps us to:

  1. Understand how Langfuse is used and improve the most relevant features.
  2. Track overall usage for internal and external (e.g. fundraising) reporting.

The data is not shared with third parties and does not include any sensitive information. We want to be fully transparent about this, and you can find the exact data we collect here.

You can opt out by setting TELEMETRY_ENABLED=false.