End event contains wrong data when streaming structured output
Checked other resources
- I added a very descriptive title to this issue.
- I searched the LangChain.js documentation with the integrated search.
- I used the GitHub search to find a similar question and didn't find it.
- I am sure that this is a bug in LangChain.js rather than my code.
- The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).
Example Code
import { ChatOpenAI } from "@langchain/openai";

const model = new ChatOpenAI({
  modelName: "gpt-4o",
  streaming: true,
  streamUsage: true,
})
  .withStructuredOutput(
    {
      title: "Joke",
      description: "Joke to tell user.",
      type: "object",
      properties: {
        setup: {
          type: "string",
          description: "The setup for the joke",
        },
        punchline: {
          type: "string",
          description: "The joke's punchline",
        },
      },
      required: ["setup", "punchline"],
      additionalProperties: false,
    },
    {
      strict: true,
      method: "jsonSchema",
    },
  )
  .withConfig({ runName: "joke" });

const eventStream = model.streamEvents(
  "Tell me a joke about cats",
  { version: "v2" },
  { includeNames: ["joke"] },
);

for await (const event of eventStream) {
  console.log(event);
}
Error Message and Stack Trace (if applicable)
{
"event": "on_chain_end",
"data": {
"output": {
"setup": "WhyWhy wasWhy was theWhy was the catWhy was the cat sittingWhy was the cat sitting onWhy was the cat sitting on theWhy was the cat sitting on the computerWhy was the cat sitting on the computer?Why was the cat sitting on the computer?Why was the cat sitting on the computer?Why was the cat sitting on the computer?Why was the cat sitting on the computer?Why was the cat sitting on the computer?Why was the cat sitting on the computer?Why was the cat sitting on the computer?Why was the cat sitting on the computer?Why was the cat sitting on the computer?Why was the cat sitting on the computer?Why was the cat sitting on the computer?Why was the cat sitting on the computer?",
"punchline": "BecauseBecause itBecause it wantedBecause it wanted toBecause it wanted to keepBecause it wanted to keep anBecause it wanted to keep an eyeBecause it wanted to keep an eye onBecause it wanted to keep an eye on theBecause it wanted to keep an eye on the mouseBecause it wanted to keep an eye on the mouse!"
}
},
"run_id": "6b7b38ba-d191-4278-a407-ce2287683205",
"name": "joke",
"tags": [],
"metadata": {}
}
Description
The end event is a concatenation of all the intermediary messages, and not just the final result. If, for example, the streamed tokens are "Why", " was", " the", the end event contains "WhyWhy wasWhy was the" instead of "Why was the".
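For illustration, here is a minimal sketch (with hypothetical token values) of why this happens, assuming each streamed chunk is the cumulative output-so-far rather than a delta:

// Minimal sketch, hypothetical values: each streamed chunk is the
// cumulative output-so-far, not a delta, so reducing the chunks with
// string concatenation repeats every prefix.
const cumulativeChunks = ["Why", "Why was", "Why was the"];

// What the buggy end event effectively does:
const concatenated = cumulativeChunks.reduce((acc, chunk) => acc + chunk, "");
console.log(concatenated); // "WhyWhy wasWhy was the"

// What it should contain: only the latest cumulative value.
console.log(cumulativeChunks[cumulativeChunks.length - 1]); // "Why was the"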
System Info
- node: v20.18.0
- @langchain/community: 0.3.11
- @langchain/core: 0.3.16
- @langchain/openai: 0.3.11
The issue you're experiencing with the on_chain_end event containing concatenated intermediary tokens instead of the final assembled output seems to be related to how streaming and event handling are implemented in LangChain.js. The on_chain_end event is supposed to capture only the final output after all intermediary processing steps are completed [1][2].
To address this, the event handling logic needs to correctly accumulate the streamed tokens before emitting the on_chain_end event: buffer the tokens and assemble them into the final output before triggering the end event. The current behavior suggests that the tokens are being concatenated without proper handling of the final assembly.
If this is indeed a bug in the library, it will require a fix in the event handling logic to ensure that only the final, correctly assembled output is included in the on_chain_end event.
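As a possible workaround sketch until this is fixed (not a library fix), you can consume the stream yourself and keep only the most recent chunk, assuming, as observed above, that each chunk yielded by withStructuredOutput is the cumulative parsed object. jokeSchema below is a condensed version of the schema from the example code:

import { ChatOpenAI } from "@langchain/openai";

const jokeSchema = {
  title: "Joke",
  type: "object",
  properties: {
    setup: { type: "string" },
    punchline: { type: "string" },
  },
  required: ["setup", "punchline"],
  additionalProperties: false,
};

const model = new ChatOpenAI({ modelName: "gpt-4o", streaming: true })
  .withStructuredOutput(jokeSchema, { strict: true, method: "jsonSchema" });

let finalOutput: unknown;
for await (const chunk of await model.stream("Tell me a joke about cats")) {
  finalOutput = chunk; // each chunk supersedes the previous cumulative parse
}
console.log(finalOutput); // the fully assembled { setup, punchline } object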
Hello, we are a group of students at the University of Toronto looking into fixing this issue.
Amazing, thanks!
I have identified this to be an issue with the _streamIterator method of the RunnableSequence class within /src/runnables/base.ts. It looks like concatSupported is hardcoded to true and only the true case is implemented. I am almost finished adding support for concatSupported = false, which the ChatOpenAI model should be run with.
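For reference, a hypothetical sketch of the fix shape described above (names are illustrative, not the actual patch): when the chunk type does not support concatenation, the latest chunk should simply replace the previous one, since cumulative outputs like parsed JSON supersede earlier chunks.

// Hypothetical sketch of the fix shape, not the actual patch.
// When the chunk type supports concat(), accumulate as before; otherwise
// keep only the latest chunk, which is already the cumulative value.
async function* streamTrackingFinal<T>(
  source: AsyncIterable<T>,
  concatSupported: boolean,
  onFinal: (output: T | undefined) => void,
): AsyncGenerator<T> {
  let finalOutput: T | undefined;
  for await (const chunk of source) {
    if (concatSupported && finalOutput !== undefined) {
      finalOutput = (finalOutput as any).concat(chunk);
    } else {
      finalOutput = chunk;
    }
    yield chunk;
  }
  onFinal(finalOutput); // this value would feed the on_chain_end event
}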