LangSmith Cookbook

Welcome to the LangSmith Cookbook — your practical guide to mastering LangSmith. While our standard documentation covers the basics, this repository delves into common patterns and some real-world use-cases, empowering you to optimize your LLM applications further.

This repository is your practical guide to maximizing LangSmith. As a tool, LangSmith empowers you to debug, evaluate, test, and improve your LLM applications continuously. These recipes dive deeper than the , presenting real-world scenarios for you to adapt and implement.

Your Input Matters

Help us make the cookbook better! If there's a use-case we missed, or if you have insights to share, please raise a GitHub issue (feel free to tag Will) or contact the LangChain development team. Your expertise shapes this community.

Tracing your code

Tracing allows for seamless debugging and improvement of your LLM applications. Here's how:

Tracing without LangChain: learn to trace applications independent of LangChain using the Python SDK's @traceable decorator.
REST API: get acquainted with the REST API's features for logging LLM and chat model runs, and understand nested runs. The run logging spec can be found in the LangSmith SDK repository.
Customing Run Names: improve UI clarity by assigning bespoke names to LangSmith chain runs—includes examples for chains, lambda functions, and agents.

LangChain Hub

Efficiently manage your LLM components with the LangChain Hub. For dedicated documenation, please see the hub docs.

RetrievalQA Chain: use prompts from the hub in an exampe RAG pipeline.
Prompt Versioning: ensure deployment stability by selecting specific prompt versions over the 'latest'.
Runnable PromptTemplate: streamline the process of saving prompts to the hub from the playground and integrating them into runnable chains.

Testing & Evaluation

Test and benchmark your LLM systems using methods in these evaluation recipes:

Python Examples

Q&A System Correctness: evaluate your retrieval-augmented Q&A pipeline on a dataset. Iterate, improve, and keep testing.
Evaluating Q&A Systems with Dynamic Data: use evaluators that dereference a labels to handle data that changes over time.
Comparison Evals: use labeled preference scoring to contrast system versions and determine the most optimal outputs.
You can incorporate LangSmith in your existing testing framework:
- LangSmith in Pytest benchmark your chain in pytest and assert aggregate metrics meet the quality bar.
- Unit Testing with Pytest: write individual unit tests and log assertions as feedback.
Evaluating Existing Runs: add ai-assisted feedback and evaluation metrics to existing run traces.
Naming Test Projects: manually name your tests with run_on_dataset(..., project_name='my-project-name')

TypeScript / JavaScript Testing Examples

Incorporate LangSmith into your TS/JS testing and evaluation workflow:

Evaluating JS Chains in Python: evaluate JS chains using custom python evalators, adapting methods from the "Evaluating Existing Runs" guide.
Logging Assertions as Feedback: convert CI test assertions into LangSmith feedback, enhancing trace visibility with minimal modifications.

Using Feedback

Harness user feedback and other signals to improve, monitor, and personalize your applications:

Streamlit Chat App: a minimal chat app that captures user feedback and shares traces of the chat application.
- The vanilla_chain.py contains an LLMChain that powers the chat application.
- The expression_chain.py contains an equivalent chat chain defined exclusively with LangChain expressions.
Next.js Chat App: explore a simple TypeScript chat app demonstrating tracing and feedback capture.
- You can check out a deployed demo version here.
Building an Algorithmic Feedback Pipeline: automate feedback metrics for advanced monitoring and performance tuning.

Exploratory Data Analysis

Turn your trace data into actionable insights:

Exporting LLM Runs and Feedback: extract and interpret LangSmith LLM run data, making them ready for various analytical platforms.
Lilac: enrich datasets using the open-source analytics tool, Lilac, to detect near-duplicates, check for PII, and more.

Exporting data for fine-tuning

Fine-tune an LLM on collected run data using these recipes:

OpenAI Fine-Tuning: list LLM runs and convert them to OpenAI's fine-tuning format efficiently.

CharlyWargnier/langsmith-cookbook