OpenPO simplifies building synthetic datasets for preference tuning from 200+ LLMs.
| Resources | Notebooks |
|---|---|
| Building dataset with OpenPO and PairRM | 📔 Notebook |
| Using OpenPO with Prometheus 2 | 📔 Notebook |
| Evaluating with LLM-as-a-Judge | 📔 Notebook |
OpenPO is an open source library that simplifies the process of building synthetic datasets for LLM preference tuning. By collecting outputs from 200+ LLMs and synthesizing them using research-proven methodologies, OpenPO helps developers build better, more fine-tuned language models with minimal effort.
- 🤖 Multiple LLM Support: Collect a diverse set of outputs from 200+ LLMs
- 📊 Research-Backed Evaluation Methods: Support for state-of-the-art evaluation methods for data synthesis
- 💾 Flexible Storage: Out-of-the-box storage providers for HuggingFace and S3
OpenPO uses pip for installation. Run the following command in the terminal to install OpenPO:
pip install openpo
To install from source, clone the repository first, then run the following commands:
cd openpo
poetry install
Set your environment variables first:
# for completions
export HF_API_KEY=<your-api-key>
export OPENROUTER_API_KEY=<your-api-key>
# for evaluations
export OPENAI_API_KEY=<your-openai-api-key>
export ANTHROPIC_API_KEY=<your-anthropic-api-key>
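If you want to verify the keys from Python before making requests, here is a minimal sketch using only the standard library (the variable names match the exports above):

import os

# Fail early if the key needed for completions is missing.
if not os.getenv("HF_API_KEY"):
    raise RuntimeError("HF_API_KEY is not set")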
To get started with collecting LLM responses, pass in a list of model names of your choice:
Note
OpenPO requires the provider name to be prepended to the model identifier (e.g. huggingface/ or openrouter/).
import os
from openpo import OpenPO
client = OpenPO()
response = client.completions(
    models=[
        "huggingface/Qwen/Qwen2.5-Coder-32B-Instruct",
        "huggingface/mistralai/Mistral-7B-Instruct-v0.3",
        "huggingface/microsoft/Phi-3.5-mini-instruct",
    ],
    messages=[
        {"role": "system", "content": PROMPT},
        {"role": "user", "content": MESSAGE},
    ],
)
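PROMPT and MESSAGE above are placeholders for your own system prompt and user input, for example:

# Illustrative values only - substitute your own prompt and user message.
PROMPT = "You are a helpful coding assistant."
MESSAGE = "Write a Python function that reverses a string."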
You can also call models with OpenRouter.
# make request to OpenRouter
client = OpenPO()
response = client.completions(
    models=[
        "openrouter/qwen/qwen-2.5-coder-32b-instruct",
        "openrouter/mistralai/mistral-7b-instruct-v0.3",
        "openrouter/microsoft/phi-3.5-mini-128k-instruct",
    ],
    messages=[
        {"role": "system", "content": PROMPT},
        {"role": "user", "content": MESSAGE},
    ],
)
OpenPO accepts model parameters as a dictionary via the params argument. Take a look at the documentation for more detail.
response = client.completions(
    models=[
        "huggingface/Qwen/Qwen2.5-Coder-32B-Instruct",
        "huggingface/mistralai/Mistral-7B-Instruct-v0.3",
        "huggingface/microsoft/Phi-3.5-mini-instruct",
    ],
    messages=[
        {"role": "system", "content": PROMPT},
        {"role": "user", "content": MESSAGE},
    ],
    params={
        "max_tokens": 500,
        "temperature": 1.0,
    },
)
OpenPO offers various ways to synthesize your dataset. To run evaluation, first install the extra dependencies:
pip install openpo[eval]
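Some shells (for example zsh) treat square brackets specially, so you may need to quote the extra:

pip install "openpo[eval]"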
To use a single judge to evaluate your response data, use evaluate.eval:
client = OpenPO()

res = client.evaluate.eval(
    models=["openai/gpt-4o"],
    questions=questions,
    responses=responses,
)
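Here, questions and responses are plain Python lists. Below is a minimal sketch of one plausible shape, assuming one entry per question and one list of candidate responses (for example, collected via client.completions) per entry; check the documentation for the exact format expected:

# Assumed shape, for illustration only.
questions = ["What is the capital of France?"]
responses = [
    ["Paris is the capital of France.", "I think it might be Lyon."],
]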
To use multiple judges, pass in multiple judge models:
res_a, res_b = client.evaluate.eval(
    models=["openai/gpt-4o", "anthropic/claude-sonnet-3-5-latest"],
    questions=questions,
    responses=responses,
)

# get consensus for multi judge responses
result = client.evaluate.get_consensus(
    eval_A=res_a,
    eval_B=res_b,
)
OpenPO supports batch processing for evaluating large datasets in a cost-effective way.
Note
Batch processing is an asynchronous operation and could take up to 24 hours (usually completes much faster).
info = client.batch.eval(
    models=["openai/gpt-4o", "anthropic/claude-sonnet-3-5-latest"],
    questions=questions,
    responses=responses,
)

# check status
status = client.batch.check_status(batch_id=info.id)
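Since the job is asynchronous, you will typically poll check_status until it finishes. A minimal sketch; the exact return value and terminal status names are not shown above, so this simply prints the status periodically:

import time

# Poll every 5 minutes; batch jobs can take up to 24 hours to complete.
for _ in range(12):
    status = client.batch.check_status(batch_id=info.id)
    print(status)  # inspect the returned status value
    time.sleep(300)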
For multi-judge with batch processing:
batch_a, batch_b = client.batch.eval(
    models=["openai/gpt-4o", "anthropic/claude-sonnet-3-5-latest"],
    questions=questions,
    responses=responses,
)

result = client.batch.get_consensus(
    batch_A=batch_a,
    batch_B=batch_b,
)
You can also use pre-trained open source evaluation models. OpenPO currently supports two types of models: PairRM and Prometheus2.
Note
A GPU with sufficient memory is required to run inference with the pre-trained models.
To use PairRM to rank responses:
from openpo import PairRM
pairrm = PairRM()
res = pairrm.eval(prompts, responses)
To use Prometheus2:
from openpo import Prometheus2
from openpo.resources.provider import VLLM

model = VLLM(model="prometheus-eval/prometheus-7b-v2.0")
pm = Prometheus2(model=model)

feedback = pm.eval_relative(
    instructions=instructions,
    responses_A=response_A,
    responses_B=response_B,
    rubric='reasoning',
)
Use the out-of-the-box storage classes to easily upload and download data.
from openpo.storage import HuggingFaceStorage
hf_storage = HuggingFaceStorage()
# push data to repo
preference = {"prompt": "text", "preferred": "response1", "rejected": "response2"}
hf_storage.push_to_repo(repo_id="my-hf-repo", data=preference)
# Load data from repo
data = hf_storage.load_from_repo(path="my-hf-repo")
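An S3 provider is also listed under Flexible Storage above. A sketch of what the equivalent flow might look like; the class and method names below simply mirror the HuggingFace example and are illustrative, not confirmed, so check the storage documentation for the actual API:

from openpo.storage import S3Storage  # assumed import path

s3_storage = S3Storage()

# Hypothetical method names mirroring push_to_repo/load_from_repo above.
s3_storage.push_to_s3(data=preference, bucket="my-bucket", key="preference.json")
data = s3_storage.load_from_s3(bucket="my-bucket", key="preference.json")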
Contributions are what make open source amazing! Here's how you can help:
- Clone the repository
git clone https://github.com/yourusername/openpo.git
cd openpo
- Install Poetry (dependency management tool)
curl -sSL https://install.python-poetry.org | python3 -
- Install dependencies
poetry install
- Create a new branch for your feature
git checkout -b feature-name
- Submit a Pull Request
- Write a clear description of your changes
- Reference any related issues