openai/openai-node

Constrained decoding with Extended Backus-Naur Form (EBNF)

Closed · 1 comment

Confirm this is a feature request for the Node library and not the underlying OpenAI API.

  • This is a feature request for the Node library

Describe the feature or improvement you're requesting

Add an ebnfResponseFormat helper, similar to the current zodResponseFormat, that lets developers define output structures with an EBNF grammar instead of a Zod schema.

Does OpenAI use a CFG internally? The Structured Outputs announcement suggests it does:
https://openai.com/index/introducing-structured-outputs-in-the-api/

"To do this, we convert the supplied JSON Schema into a context-free grammar (CFG)."
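
To make the idea of turning a JSON Schema into a grammar concrete, here is a minimal sketch that handles only a flat object with string and integer properties and emits rules in the notation used in this issue. The `FlatSchema` type and the `schemaToEbnf` function are hypothetical names for illustration; the announcement does not describe how the real conversion works, so this is not OpenAI's implementation.

```typescript
// Hypothetical sketch: converts a flat object schema into EBNF-style rules.
// Not OpenAI's implementation; only string and integer properties are handled.
type FlatSchema = {
  type: "object";
  properties: Record<string, { type: "string" | "integer" }>;
};

function schemaToEbnf(schema: FlatSchema): string {
  // One quoted-key terminal plus a value rule per property, in declaration order.
  const members = Object.entries(schema.properties).map(
    ([key, prop]) => `'"${key}":' ${prop.type === "string" ? "string" : "number"}`
  );
  return [
    `json ::= '{' ${members.join(" ',' ")} '}'`,
    `string ::= '"' [a-zA-Z0-9_ ]+ '"'`,
    `number ::= [0-9]+`,
  ].join("\n");
}

const grammar = schemaToEbnf({
  type: "object",
  properties: { name: { type: "string" }, age: { type: "integer" } },
});
console.log(grammar);
```

A real converter would also need to cover nesting, arrays, enums, optional properties, whitespace, and string escapes.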

Implementing this feature would allow output to be constrained to formats such as JSON, SVG, HTML, Git diff patches, PostScript, and CSV.

Here's an example using the OpenAI npm library with openai.beta.chat.completions.parse() and a new ebnfResponseFormat.

JSON EBNF

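import OpenAI from "openai";
// ebnfResponseFormat is the helper proposed in this issue; it does not exist in the SDK.

const openai = new OpenAI();

// Simplified JSON grammar: no whitespace, string escapes, or negative/fractional numbers.
// Objects and arrays may be empty.
const jsonEbnf = `
json ::= object | array
object ::= '{' (pair (',' pair)*)? '}'
pair ::= string ':' value
array ::= '[' (value (',' value)*)? ']'
value ::= string | number | object | array | 'true' | 'false' | 'null'
string ::= '"' [a-zA-Z0-9_]+ '"'
number ::= [0-9]+
`;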

const completion = await openai.beta.chat.completions.parse({
  model: "gpt-4o-mini",
  messages: [
    {
      role: "system",
      content: "Generate a valid JSON object with name and age.",
    },
    { role: "user", content: "Create an example." },
  ],
  response_format: ebnfResponseFormat(jsonEbnf),
});

const result = completion.choices[0].message.parsed;
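
Hand-written grammars like the one above are easy to get subtly wrong, for example by referencing a rule that is never defined. As a rough aid, the sketch below scans a grammar in this `::=` notation and lists rule names that are used but not defined. `undefinedRules` is a hypothetical helper, not part of the SDK, and it assumes one rule per line with single-quoted literals and bracketed character classes.

```typescript
// Hypothetical lint helper: returns rule names that are referenced but never defined.
// Assumes one rule per line, single-quoted literals, and [..] character classes.
function undefinedRules(grammar: string): string[] {
  const rules = new Map<string, string>();
  for (const line of grammar.split("\n")) {
    const idx = line.indexOf("::=");
    if (idx === -1) continue; // skip blank or non-rule lines
    rules.set(line.slice(0, idx).trim(), line.slice(idx + 3));
  }

  const missing = new Set<string>();
  rules.forEach((body) => {
    // Drop quoted literals and character classes so that only rule names remain.
    const stripped = body.replace(/'[^']*'|\[[^\]]*\]/g, " ");
    for (const name of stripped.match(/[A-Za-z_][A-Za-z0-9_]*/g) ?? []) {
      if (!rules.has(name)) missing.add(name);
    }
  });
  return Array.from(missing);
}

// 'content' and 'value' are used here but never defined.
const incompleteGrammar = `
svg ::= '<svg' attribute* '>' content '</svg>'
attribute ::= ' ' [a-zA-Z]+ '="' value '"'
`;
console.log(undefinedRules(incompleteGrammar));
```

This only checks that names resolve; it does not verify that the grammar accepts the intended language.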

JSON EBNF with Specific Schema

// Keys are double-quoted so that matching output is valid JSON.
const specificJsonEbnf = `
json ::= object
object ::= '{' '"name":' string ',' '"age":' number '}'
string ::= '"' [a-zA-Z0-9_ ]+ '"'
number ::= [0-9]+
`;

const specificCompletion = await openai.beta.chat.completions.parse({
  model: "gpt-4o-mini",
  messages: [
    {
      role: "system",
      content:
        "Generate a JSON object with the specific schema {name:string, age:number}.",
    },
    { role: "user", content: "Create an example." },
  ],
  response_format: ebnfResponseFormat(specificJsonEbnf),
});

const specificResult = specificCompletion.choices[0].message.parsed;
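
Since `ebnfResponseFormat` is only a proposal, the same constraint can currently be checked only after generation. The specific grammar has no recursion, so it is equivalent to a regular expression. The sketch below assumes the keys are double-quoted (so that matching output is also valid JSON); `parseSpecific` is a hypothetical helper, and it validates a finished string rather than constraining decoding.

```typescript
// Regular expression equivalent of the specific grammar, assuming quoted keys:
//   object ::= '{' '"name":' string ',' '"age":' number '}'
const specificPattern = /^\{"name":"[a-zA-Z0-9_ ]+","age":[0-9]+\}$/;

// Hypothetical helper: returns the parsed object, or null if the text
// does not match the grammar exactly.
function parseSpecific(output: string): { name: string; age: number } | null {
  if (!specificPattern.test(output)) return null;
  // Anything that matches the pattern is valid JSON, so this cannot throw.
  return JSON.parse(output) as { name: string; age: number };
}

console.log(parseSpecific('{"name":"Ada Lovelace","age":36}'));
console.log(parseSpecific('{"name":"Ada","age":"36"}')); // age is a string, so no match
```

Post-hoc validation can reject bad output but cannot prevent it, which is the gap that grammar-constrained decoding would close.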

written with gpt-4o

Additional context

SVG EBNF

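// Each attribute carries its own leading space so that '<svg' and the attribute name do not run together.
const svgEbnf = `
svg ::= '<svg' attribute* '>' content '</svg>'
attribute ::= ' ' [a-zA-Z]+ '="' [a-zA-Z0-9]+ '"'
content ::= '<circle' attribute* '/>' | '<rect' attribute* '/>'
`;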

HTML EBNF

// Text nodes are unquoted, and <div> nests body content rather than a whole head/body pair.
const htmlEbnf = `
html ::= '<html>' content '</html>'
content ::= '<head>' headContent '</head>' '<body>' bodyContent '</body>'
headContent ::= '<title>' text '</title>'
bodyContent ::= element*
element ::= '<div>' bodyContent '</div>' | '<p>' text '</p>'
text ::= [a-zA-Z0-9_ ]+
`;

Git Diff EBNF

// String.raw keeps '\n' as a two-character escape for the grammar instead of a literal newline.
// Simplified: omits the index, '---', and '+++' header lines of real git output.
const gitDiffEbnf = String.raw`
diff ::= 'diff --git a/' path ' b/' path '\n' chunk+
path ::= [a-zA-Z0-9_./-]+
chunk ::= '@@ -' range ' +' range ' @@\n' changes
range ::= [0-9]+ ',' [0-9]+
changes ::= (addition | deletion | context)*
addition ::= '+' [a-zA-Z0-9_ ]+ '\n'
deletion ::= '-' [a-zA-Z0-9_ ]+ '\n'
context ::= ' ' [a-zA-Z0-9_ ]+ '\n'
`;

PostScript EBNF

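// PostScript is postfix: operands come before the operator, and '/name' is a literal name operand.
const postscriptEbnf = String.raw`
postscript ::= '%!PS\n' command*
command ::= (operand ' ')* operator '\n'
operator ::= [a-zA-Z]+
operand ::= number | name | string | array
name ::= '/' [a-zA-Z]+
array ::= '[' (operand ' ')* ']'
number ::= [0-9]+ ('.' [0-9]+)?
string ::= '(' [a-zA-Z0-9 ]+ ')'
`;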
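
CSV is listed among the target formats but has no grammar in this issue. A minimal sketch in the same notation follows, assuming unquoted fields only (no embedded commas, quotes, or newlines) and that the grammar notation treats '\n' as a newline escape. The names `csvEbnf` and `matchesCsvSketch` are hypothetical; the regular expression is an equivalent of the grammar, included only so the sketch can be tried out.

```typescript
// Hypothetical CSV grammar in the notation used above; unquoted fields only.
// String.raw keeps '\n' as an escape for the grammar rather than a literal newline.
const csvEbnf = String.raw`
csv ::= record ('\n' record)*
record ::= field (',' field)*
field ::= [a-zA-Z0-9_ ]*
`;

// Regular-expression equivalent of the grammar, built from the same pieces.
const fieldRe = "[a-zA-Z0-9_ ]*";
const recordRe = `${fieldRe}(,${fieldRe})*`;
const csvPattern = new RegExp(`^${recordRe}(\\n${recordRe})*$`);

function matchesCsvSketch(text: string): boolean {
  return csvPattern.test(text);
}

console.log(matchesCsvSketch("name,age\nAda,36"));
console.log(matchesCsvSketch('name,"quoted"')); // quotes are outside the field character class
```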

Thanks for reporting!

This sounds like a feature request for the underlying OpenAI API and not the SDK, so I'm going to go ahead and close this issue.

Would you mind reposting at community.openai.com?