alexrudall/ruby-openai

Native SDK support for Structured Outputs

Opened this issue · 15 comments

The Problem

We all hate trying to coax ChatGPT into adhering to a JSON schema. OpenAI decided to make that easier for us.

The flow:

  • Declare the data transfer objects (DTOs) that you want from the model.
  • Demand the response exactly in your format.
  • Parse the response with code instead of godawful regex/string extraction.

It would be really nice to have native support within this ruby gem!

Prior Art

In Python, a nice example with a math Q&A:

from pydantic import BaseModel

from openai import OpenAI


class Step(BaseModel):
    explanation: str
    output: str


class MathResponse(BaseModel):
    steps: list[Step]
    final_answer: str


client = OpenAI()

completion = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "system", "content": "You are a helpful math tutor."},
        {"role": "user", "content": "solve 8x + 31 = 2"},
    ],
    response_format=MathResponse,
)

message = completion.choices[0].message
if message.parsed:
    print(message.parsed.steps)
    print(message.parsed.final_answer)
else:
    print(message.refusal)

Potential Solve

Not sure what the ideal pydantic replacement would be, but perhaps dry-struct? DTO declarations could look like this:

require 'dry-struct'
require 'dry-types'

module Types
  include Dry.Types()
end

class Step < Dry::Struct
  attribute :explanation, Types::String
  attribute :output, Types::String
end

class MathResponse < Dry::Struct
  attribute :steps, Types::Array.of(Step)
  attribute :final_answer, Types::String
end

(Not sure if the gem should handle any schema validations, since that's purportedly OpenAI's job, but there's dry-validation if so; a rough sketch follows the example below.)

The rest of OpenAI's math tutor example might look like

client = OpenAI::Client.new

completion = client.chat(
  parameters: {
    model: "gpt-4o-2024-08-06",
    messages: [
      { role: "system", content: "You are a helpful math tutor."},
      { role: "user", content: "solve 8x + 31 = 2"},
    ],
    response_format: MathResponse
  }
)

message = completion.dig("choices", 0, "message")
if message.parsed
  message.parsed.steps.each do |step|
    puts step.explanation
    puts step.output
  end
  puts message.parsed.final_answer
else
  puts message.refusal
end
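
If the gem (or the caller) did want to double-check responses client-side, as the parenthetical above mentions, dry-validation could handle it. A rough sketch, assuming the model's content has already been JSON-parsed into a hash (MathResponseContract and parsed_response are hypothetical names, not part of any gem):

require 'dry/validation'

# Hypothetical belt-and-braces check; OpenAI's strict mode should already
# guarantee this shape, so this is purely optional.
class MathResponseContract < Dry::Validation::Contract
  params do
    required(:steps).array(:hash) do
      required(:explanation).filled(:string)
      required(:output).filled(:string)
    end
    required(:final_answer).filled(:string)
  end
end

result = MathResponseContract.new.call(parsed_response) # parsed_response is a Hash
puts result.errors.to_h unless result.success?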
bastos commented

I like dry-rb and use it daily, but I think it should be bare-bones and let the developer create abstractions on top of it using dry-rb, Sorbet, Active Record, etc. At least make passing and returning dry-rb objects optional.

Example:

client = OpenAI::Client.new

completion = client.chat(
  parameters: {
    model: "gpt-4o-2024-08-06",
    messages: [
      { role: "system", content: "You are a helpful math tutor." },
      { role: "user", content: "solve 8x + 31 = 2" },
    ],
    response_format: {
      type: "json_schema",
      json_schema: {
        name: "math_response",
        strict: true,
        schema: {
          type: "object",
          properties: {
            steps: {
              type: "array",
              items: {
                type: "object",
                properties: {
                  explanation: {
                    type: "string",
                  },
                  output: {
                    type: "string",
                  },
                },
                required: ["explanation", "output"],
                additionalProperties: false,
              },
            },
            final_answer: {
              type: "string",
            },
          },
          required: ["steps", "final_answer"],
          additionalProperties: false,
        },
      },
    },
  }
)

And if a json_schema response_format is passed, the library should run JSON.parse on the output.
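
A rough sketch of what that would save callers from writing by hand today, using the completion from the call above (the key names mirror the math_response schema):

require 'json'

# Hand-rolled today: pull out the message and JSON.parse its content.
# A native integration could do this automatically when type == "json_schema".
message = completion.dig("choices", 0, "message")

if message["refusal"]
  puts message["refusal"]
else
  parsed = JSON.parse(message["content"])
  parsed["steps"].each { |step| puts step["explanation"], step["output"] }
  puts parsed["final_answer"]
end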

The groq-ruby gem has an optional dry-schema integration: https://github.com/drnic/groq-ruby#using-dry-schema-with-json-mode

bastos commented

The groq-ruby gem has an optional dry-schema integration: https://github.com/drnic/groq-ruby#using-dry-schema-with-json-mode

And the official Node library has support for Zod. So, arguing against my previous comment, maybe it should support dry-rb's Dry::Schema.JSON (optionally, I hope).
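
For illustration, here's a sketch of what that optional integration could look like, assuming dry-schema's json_schema extension (nothing below is in this gem today):

require 'dry/schema'

# dry-schema ships a json_schema extension that exports a JSON Schema hash.
Dry::Schema.load_extensions(:json_schema)

MathSchema = Dry::Schema.JSON do
  required(:steps).array(:hash) do
    required(:explanation).filled(:string)
    required(:output).filled(:string)
  end
  required(:final_answer).filled(:string)
end

# The gem could accept the schema object directly; today you build the wrapper
# yourself. Note that OpenAI's strict mode also wants additionalProperties: false
# on every object, which may need to be layered on top of the exported hash.
response_format = {
  type: "json_schema",
  json_schema: {
    name: "math_response",
    strict: true,
    schema: MathSchema.json_schema
  }
}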

Here's a Ruby script implementing StructuredOutputs::Schema that provides the functionality OpenAI added to their Python SDK for super-simple structured output definitions. It replicates their cookbook example perfectly.

The key thing is this simplicity:

class MathReasoning < StructuredOutputs::Schema
  def initialize
    super do
      define :step do
        string :explanation
        string :output
      end
      array :steps, items: ref(:step)
      string :final_answer
    end
  end
end

schema = MathReasoning.new
  
result = client.parse(
  model: "gpt-4o-2024-08-06",
  messages: [
    { role: "system", content: "You are a helpful math tutor. Guide the user through the solution step by step." },
    { role: "user", content: "how can I solve 8x + 7 = -23" }
  ],
  response_format: schema
)

To get this:

{
  "steps": [
    {
      "expression": "8x + 7 = -23",
      "explanation": "This is your starting equation."
    },
    {
      "expression": "8x = -23 - 7",
      "explanation": "Subtract 7 from both sides to isolate the term with 'x' on the left side."
    },
    {
      "expression": "8x = -30",
      "explanation": "Simplify the right side by calculating -23 - 7, which is -30."
    },
    {
      "expression": "x = -30 / 8",
      "explanation": "Divide both sides by 8 to solve for 'x'."
    },
    {
      "expression": "x = -3.75",
      "explanation": "Simplify the division to get the final value of 'x'. Alternatively, divide both sides of the equation 30 by 2 to simplify it down to 15, then divide again to get 15 divided by 4, which is -3.75."
    }
  ],
  "final_answer": "x = -3.75"
}

Support for response_format is very much needed. This script is excellent, and I hope it gets implemented in the project because it's extremely useful!
@alexrudall Do you think it could be included?

Strongly endorse @bastos's approach.

Hoping this gets included soon!

adenta commented

I support this. Is there a bounty?

adenta commented

I adapted @jeremedia's script to work with Rails. I renamed Schema to BaseSchema so as not to conflict with the existing schema.rb.

class BaseSchema
  MAX_OBJECT_PROPERTIES = 100
  MAX_NESTING_DEPTH = 5

  def initialize(name = nil, &block)
    # Use the provided name or derive from class name
    @name = name || self.class.name.split('::').last.downcase
    # Initialize the base schema structure
    @schema = {
      type: 'object',
      properties: {},
      required: [],
      additionalProperties: false
    }
    @definitions = {}
    # Execute the provided block to define the schema
    instance_eval(&block) if block_given?
    validate_schema
  end

  # Convert the schema to the json_schema wrapper hash OpenAI expects
  def to_hash
    {
      name: @name,
      description: 'Schema for the structured response',
      strict: true,
      schema: @schema.merge({ '$defs' => @definitions })
    }
  end

  private

  # Define a string property
  def string(name, enum: nil, description: nil)
    add_property(name, { type: 'string', enum:, description: }.compact)
  end

  # Define a number property
  def number(name)
    add_property(name, { type: 'number' })
  end

  # Define a boolean property
  def boolean(name)
    add_property(name, { type: 'boolean' })
  end

  # Define an object property
  def object(name, &block)
    properties = {}
    required = []
    BaseSchema.new.tap do |s|
      s.instance_eval(&block)
      properties = s.instance_variable_get(:@schema)[:properties]
      required = s.instance_variable_get(:@schema)[:required]
    end
    add_property(name, { type: 'object', properties:, required:, additionalProperties: false })
  end

  # Define an array property
  def array(name, items:)
    add_property(name, { type: 'array', items: })
  end

  # Define an anyOf property
  def any_of(name, schemas)
    add_property(name, { anyOf: schemas })
  end

  # Define a reusable schema component
  def define(name, &block)
    @definitions[name] = BaseSchema.new(&block).instance_variable_get(:@schema)
  end

  # Reference a defined schema component
  def ref(name)
    { '$ref' => "#/$defs/#{name}" }
  end

  # Add a property to the schema
  def add_property(name, definition)
    @schema[:properties][name] = definition
    @schema[:required] << name
  end

  # Validate the schema against defined limits
  def validate_schema
    properties_count = count_properties(@schema)
    raise 'Exceeded maximum number of object properties' if properties_count > MAX_OBJECT_PROPERTIES

    max_depth = calculate_max_depth(@schema)
    raise 'Exceeded maximum nesting depth' if max_depth > MAX_NESTING_DEPTH
  end

  # Count the total number of properties in the schema
  def count_properties(schema)
    return 0 unless schema.is_a?(Hash) && schema[:properties]

    count = schema[:properties].size
    schema[:properties].each_value do |prop|
      count += count_properties(prop)
    end
    count
  end

  # Calculate the maximum nesting depth of the schema
  def calculate_max_depth(schema, current_depth = 1)
    return current_depth unless schema.is_a?(Hash) && schema[:properties]

    max_child_depth = schema[:properties].values.map do |prop|
      calculate_max_depth(prop, current_depth + 1)
    end.max
    # compact: an empty properties hash leaves max_child_depth nil
    [current_depth, max_child_depth].compact.max
  end
end

require 'json'
require 'openai'
require 'ostruct'

# Client class for interacting with OpenAI API
class OpenAISchemaClient
  def initialize
    OpenAI.configure do |config|
      config.access_token = ENV['OPENPIPE_ACCESS_TOKEN']
      config.uri_base = 'https://app.openpipe.ai/api/v1'
      config.log_errors = true
    end
    @client = OpenAI::Client.new
  end

  # Send a request to OpenAI API and parse the response
  def parse(model:, messages:, response_format:)
    response = @client.chat(
      parameters: {
        model:,
        messages:,
        response_format: {
          type: 'json_schema',
          json_schema: response_format.to_hash
        }
      }
    )

    message = response['choices'][0]['message']

    if message['refusal']
      # Refusals come back with nil content, so don't try to parse it
      OpenStruct.new(refusal: message['refusal'], parsed: nil)
    else
      OpenStruct.new(refusal: nil, parsed: JSON.parse(message['content']))
    end
  end
end

# example usage:

# begin
#   # Create an OpenAI client
#   client = OpenAISchemaClient.new
#   # Create an instance of the MathReasoning schema
#   schema = MathReasoning.new

#   # Send a request to OpenAI API
#   result = client.parse(
#     model: 'gpt-4o-2024-08-06',
#     messages: [
#       { role: 'system', content: 'You are a helpful math tutor. Guide the user through the solution step by step.' },
#       { role: 'user', content: 'how can I solve 8x + 7 = -23' }
#     ],
#     response_format: schema
#   )

#   # Handle the response
#   if result.refusal
#     puts "The model refused to respond: #{result.refusal}"

#   else
#     puts JSON.pretty_generate(result.parsed)

#   end
# rescue StandardError => e
#   puts "Error: #{e}"
# end

class MathReasoning < BaseSchema
  def initialize
    super do
      define :step do
        string :explanation
        string :output
      end
      array :steps, items: ref(:step)
      string :final_answer
    end
  end
end

Any updates on this? It would be a great feature 🚀!

Based on https://gist.github.com/jeremedia/7e874bc6283a10ce8b4d2746413d3ce4#file-ruby-structured-outputs-v4-rb, I got this working. The API call is a bit different, but my example is below:

require 'openai'
require_relative 'structured_outputs'

class WorkbankSchema < StructuredOutputs::Schema
  def initialize
    super do
      define :word do
        string :japanese
        string :romaji
        string :english
      end
      array :nouns, items: ref(:word)
      array :verbs, items: ref(:word)
      array :adjectives, items: ref(:word)
      array :adverbs, items: ref(:word)
    end
  end
end

class OpenAIService
  def self.client
    @client ||= OpenAI::Client.new(access_token: ENV['OPENAI_API_KEY'])
  end

  def self.generate_vocabulary(domain, word_count_per_category: 5)
    prompt = <<~PROMPT
      Generate a Japanese vocabulary wordbank for the domain: #{domain}
      Format the response as a JSON array of objects with the following structure:
      [
        {
          "word": "Japanese word",
          "category": "noun|verb|adjective|adverb"
        }
      ]
      Include #{word_count_per_category} words for each category (noun, verb, adjective, adverb) that are commonly used in #{domain}.
      Make sure the words are appropriate for the domain and useful for constructing basic sentences.
    PROMPT

    schema = WorkbankSchema.new
    response = client.chat(
      parameters: {
        model: "gpt-4o-2024-08-06",
        messages: [
          { role: "system", content: "You are a Japanese language teacher creating vocabulary lists." },
          { role: "user", content: prompt }
        ],
        response_format: { type: 'json_schema', json_schema: schema},
        temperature: 0.7
      }
    )

    begin
      JSON.parse(response.dig("choices", 0, "message", "content"))
    rescue JSON::ParserError => e
      puts "Error parsing OpenAI response: #{e.message}"
      nil
    rescue StandardError => e
      puts "Error in OpenAI request: #{e.message}"
      nil
    end
  end
end
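
For context, invoking the service might look like this (just a sketch; the domain string is only an example):

wordbank = OpenAIService.generate_vocabulary("ordering food at a restaurant")
puts JSON.pretty_generate(wordbank) if wordbank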

This would be a really cool feature.

@alexrudall any updates here?

Responses API (with tools):

response = client.responses.create(
  parameters: {
    model: "gpt-4o",
    input: "solve 8x + 31 = 2",
    tools: [{ type: "mcp", server_url: "..." }],
    text: {
      format: {
        type: "json_schema",
        name: "math_response",
        schema: your_schema,
        strict: true
      }
    }
  }
)

The gem passes parameters directly to OpenAI's REST API, so structured outputs work out of the box.

https://platform.openai.com/docs/guides/structured-outputs#examples
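
And to consume the structured result, something like this works today (a sketch, assuming the usual Responses API output shape of message items containing output_text content):

require 'json'

# Find the assistant message among the output items (tool calls may precede it),
# grab its output_text content, and parse the JSON that the schema enforced.
message = response["output"].find { |item| item["type"] == "message" }
text    = message&.dig("content")&.find { |c| c["type"] == "output_text" }

if text
  math = JSON.parse(text["text"])
  puts math["final_answer"]
end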