OpenPipe/ART

Add @art.rollout decorator to gather trajectories

Proposal

  • Create an @art.rollout decorator which wraps a rollout function and constructs a trajectory (potentially with multiple histories) automatically, similar to how @weave.op automatically wraps an LLM-enabled function, records all function calls, and then reports a trace.
  • Allow rollout functions to access the current trajectory through some kind of get_current_trajectory() helper function.
  • Store completion ids on messages to make it possible to access and manipulate a specific history via trajectory.get_history(completion_id)
    • Useful when adding tool messages after executing a tool
  • Also create a gather_trajectory helper function that calls a rollout function decorated with @art.rollout and returns the generated trajectory.
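
Taken together, the proposed pieces could be sketched roughly like this. This is a hypothetical, minimal stand-in, not the real art package: Trajectory, rollout, get_current_trajectory, and gather_trajectory here are toy versions of the names the proposal introduces, and the "LLM call" is faked.

```python
import asyncio
import contextvars
import functools

class Trajectory:
    """Minimal stand-in for art.Trajectory."""
    def __init__(self):
        self.messages_and_choices = []
        self.reward = 0.0

# A context variable keeps each concurrent rollout's trajectory separate.
_current: contextvars.ContextVar = contextvars.ContextVar("trajectory", default=None)

def get_current_trajectory():
    """Proposed helper: the trajectory of the rollout currently executing."""
    return _current.get()

def rollout(fn):
    """Sketch of @art.rollout: ensure a trajectory exists for the duration of
    the call, then return the function's own (processed) result."""
    @functools.wraps(fn)
    async def wrapper(*args, **kwargs):
        token = _current.set(Trajectory()) if _current.get() is None else None
        try:
            return await fn(*args, **kwargs)
        finally:
            if token is not None:
                _current.reset(token)
    return wrapper

async def gather_trajectory(coro):
    """Sketch of gather_trajectory: run a decorated rollout, return the trajectory."""
    token = _current.set(Trajectory())
    try:
        await coro
        return _current.get()
    finally:
        _current.reset(token)

@rollout
async def get_summary(text):
    # In place of a real LLM call, record a fake message on the current trajectory.
    get_current_trajectory().messages_and_choices.append(
        {"role": "system", "content": f"Summarize: {text}"}
    )
    return "a short summary"

summary = asyncio.run(get_summary("some text"))                        # production flow
trajectory = asyncio.run(gather_trajectory(get_summary("some text")))  # training flow
print(summary)                                # a short summary
print(len(trajectory.messages_and_choices))   # 1
```

The context variable is what would let instrumented LLM clients record completions without the rollout function ever passing the trajectory around explicitly, while leaving the production call's return type untouched.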

It's worth taking a good look at @weave.op, and we may even want to integrate with them or wrap their decorator, since they've already done the integration work to read completions from many LLM clients.

Messy ideas in proposal doc.

Caveats

This @art.rollout decorator will need to automatically determine when LLM completions are part of the same history or separate histories.

Example

Our current rollout functions require the user to initialize and add messages to an art.Trajectory object, like so:

async def get_summary(model: art.Model, scenario: Scenario) -> art.Trajectory:
    traj = art.Trajectory(
        messages_and_choices=[
            {
                "role": "system",
                "content": f"Summarize: {scenario.text}"
            },
        ]
    )

    completion = await client.chat.completions.create(
        model=model.name,
        messages=traj.messages()
    )

    traj.messages_and_choices.append(completion.choices[0])

    return traj

However, this makes our rollout functions verbose (because they have to initialize and update the trajectories) and difficult to use elsewhere in the codebase (because they don't return the processed type that the rollout function was meant to generate).

By decorating our function with @art.rollout and returning the summary as a string, our code becomes much cleaner:

@art.rollout
async def get_summary(model: art.Model, scenario: Scenario) -> str:
    completion = await client.chat.completions.create(
        model=model.name,
        messages=[
            {
                "role": "system",
                "content": f"Summarize: {scenario.text}"
            },
        ]
    )

    return completion.choices[0].message.content

Used in production flow:

async def caller():
    summary = await get_summary(model, scenario)
    print(summary)

Used in training flow:

trajectory = await gather_trajectory(get_summary(model, scenario))

@giladfrid009 any feedback? Noticed you downvoted the proposal.

Hey,

For me personally, even if it would work, it feels "too magical", and in some cases I believe verbosity is actually good. Since proper trajectory construction is at the core of basically everything (proper RL training, or even a proper conversation with the model), explicit construction is, in my opinion, warranted.

As far as I can tell, the biggest strength of ART right now is the support of multi-step rollouts with RL. And in that case, I think using this decorator only complicates the workflow:

  • First, if multiple histories are part of a multi-step rollout, I'm not sure how this decorator would determine the correct trajectory.

  • Second, in multi-step rollouts while using this decorator, the programmer would have to use yet another "magic" method, get_current_trajectory(), in order to insert tool responses or other messages into the trajectory.

  • A very important aspect, in my view, is the ability to debug and validate the proper construction of trajectories while using this functionality.

  • Lastly, I personally think that returning the trajectory from the rollout function is actually fine, as it allows separating the rollout logic from the reward-calculation logic by processing the returned trajectory.

To conclude, I would hesitate to use this functionality, since the control of trajectory construction and structure is no longer visible, and as a result not clear.

(Maybe not related, but note that completion.choices[0].message.content might actually be None in some cases, even if no tools were used [sometimes Qwen just outputs EOT immediately, which I've encountered several times and which results in empty content], while trajectory.messages()[-1]["content"] is patched to always be a string; see commit 696c230. This again requires an explicit call to get_current_trajectory().)

First, if multiple histories are part of a multi-step rollout, I'm not sure how this decorator would determine the correct trajectory.

Haven't built it out yet, but I think we can do this robustly with history prefix matching.
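
For concreteness, here's one way prefix matching could work (purely a hypothetical sketch, since this isn't built yet, and assign_to_history is a made-up name): a completion whose request messages begin with an existing history's messages is treated as a continuation of that history; anything else starts a new history.

```python
def assign_to_history(histories, new_messages):
    """Hypothetical sketch of history prefix matching.

    `histories` is a list of message lists already seen in this rollout;
    `new_messages` is the message list of an incoming completion request.
    Attach the request to the longest existing history it extends,
    otherwise start a new history.
    """
    best = None
    for history in histories:
        # A continuation repeats all of an existing history as its prefix.
        if len(new_messages) >= len(history) and new_messages[: len(history)] == history:
            if best is None or len(history) > len(best):
                best = history
    if best is not None:
        best[:] = new_messages  # extend that history in place
        return best
    histories.append(list(new_messages))
    return histories[-1]

# Two calls in the same conversation merge into one history;
# an unrelated call starts a second history.
histories = []
assign_to_history(histories, [{"role": "user", "content": "hi"}])
assign_to_history(histories, [{"role": "user", "content": "hi"},
                              {"role": "assistant", "content": "hello"},
                              {"role": "user", "content": "more"}])
assign_to_history(histories, [{"role": "user", "content": "unrelated"}])
print(len(histories))  # 2
```

This naive version would misfire if two separate histories happen to share an identical prefix, which is exactly the kind of ambiguity the caveat above is about.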

Second, in multi-step rollouts while using this decorator, the programmer would have to use yet another "magic" method, get_current_trajectory(), in order to insert tool responses or other messages into the trajectory.

The programmer could do that, or maintain their code in its present state without the decorator.

A very important aspect, in my view, is the ability to debug and validate the proper construction of trajectories while using this functionality.

I think the decorator shouldn't affect this. You should be able to print out (or debug on a dashboard) any trajectory your rollout function constructs. If there ends up being a bug in @art.rollout, that's obviously bad, but the idea is to ensure we don't have any.

To conclude, I would hesitate to use this functionality, since the control of trajectory construction and structure is no longer visible, and as a result not clear.

Makes sense; in some cases it will probably be easier to construct a Trajectory and return it manually. The plan is to continue supporting the current flow for developers who prefer it! @art.rollout will just be an option.

As referenced in the initial proposal, the goal here is to reduce the overhead of integrating ART into a production codebase, where it's often inconvenient to extract a processed type from a returned trajectory. For example, calling code should never have to do this:

async def caller():
    movie_script = "EXT. SPACE\nA vast sea of stars serves as a backdrop..."

    trajectory = await rollout(model=model, scenario=MovieScenario(script=movie_script))

    last_message = trajectory.messages()[-1]

    tool_call = last_message.tool_calls[0]

    try:
        tool_args = json.loads(tool_call.function.arguments)
        assert isinstance(tool_args, dict)
    except Exception:
        ...  # retry logic

    is_good = tool_args["class"] == "good"

    print(is_good)
    # False

In a training-only environment, where we don't actually care about the generated output of a rollout function beyond the signal it gives of the model's performance on our task, we don't need to deal with the problem of actually using the model's output. But in an actual production codebase, it's a lot more convenient to have our LLM-enabled function return the processed type, so that we can do this simpler flow instead.

async def caller():
    movie_script = "EXT. SPACE\nA vast sea of stars serves as a backdrop..."

    is_good = await rollout(model=model, scenario=MovieScenario(script=movie_script))

    print(is_good)
    # False

In short, this decorator is intended to help clean up production codebases that use a single rollout function for both training and the deployed project.

@giladfrid009 btw really appreciate the feedback, it makes the project much better for everyone!

I'm starting to explore the API shape & feasibility in #351

I've added experimental auto trajectory capture support. Works like this:

trajectory = await art.capture_auto_trajectory(do_something())

Tests are passing for straightforward usage of the openai and litellm libs. Since we patch httpx, direct usage of that library should also work. Where possible, we merge successive API calls into a single history; otherwise, we create new histories.

Open questions include whether to support other high-level or low-level libraries, such as requests, and how to disambiguate different model calls (e.g. calls to a different model as a judge) that we don't want to include in an automatically captured trajectory.

To interact with the auto trajectory in your function (to assign a reward, add a tool call result, etc.), you can call art.auto_trajectory():

if trajectory := art.auto_trajectory():
    trajectory.reward = 1.0

I've added this to main so we can start experimenting with/changing it, but I probably wouldn't suggest advertising the feature until we've hammered it out more.