pb33f/libopenapi

Validation and Bundling problems

emetsger opened this issue ยท 9 comments

Hi Team,

First, thanks for this library. ๐Ÿ™ It seems pretty mature, and one of the few supporting OpenAPI 3.1. ๐Ÿ™Œ

I have run across some problems performing validation and bundling. I'm happy to provide formal bug reports but am not quite sure where to start. I thought I'd introduce what I've observed using this library, get some feedback on the triage process, then take next steps (e.g. open individual bug reports, contribute test cases, suss out PRs for fixing the issues, or whatever).

Unfortunately I can't provide a copy of my OpenAPI doc due to NDA.

My use case is that I have a modest, monolithic OpenAPI 3.1 doc (~67KiB) serialized as JSON (not YAML) that I want to decompose into individual files in the filesystem, using $ref to link the parts to the whole. Then I want to bundle the decomposed doc for distribution.

  • โœ… I've successfully decomposed the schema into individual files

Environment

go version go1.21.4 darwin/arm64
github.com/pb33f/libopenapi v0.15.14

First Issue: successful validation of syntactically invalid JSON

libopenapi will successfully validate the decomposed schema, even when some of the files contain invalid JSON.

I use the following document configuration when creating the doc being validated:

&datamodel.DocumentConfiguration{
		AllowFileReferences: true,
	}

Example: I have a JSON document that ends with an unmatched curly brace }:

{
  "description": "Product",
  "properties": {
    "data": {
      "allOf": [
        {
          "properties": {
            "attributes": {
              "$ref": "base_attributes.json"
            }
          }
        },
        {
          "$ref": "../jsonapi/resource_post.json"
        }
      ]
    }
  }
}
} <---- offending brace

libopenapi successfully validates this file.

If I munge one of the refs, e.g. "$ref": "base_attributes.json" => "$ref": "asdf.json", where asdf.json does not exist on the filesystem, then re-run validation, I get an error:

cannot resolve reference `adsf_attributes.json`

So I know libopenapi validation is reading the file and resolving references, but I don't understand why it doesn't raise an error when the JSON is syntactically invalid.

Redocly catches this syntax error:

npx @redocly/cli lint \
		--skip-rule=no-unused-components \
		--skip-rule=info-license \
		--skip-rule=security-defined \
		--skip-rule=no-server-example.com \
		--skip-rule=operation-summary \
		openapi/openapi.json

...

Failed to parse: end of the stream or a document separator is expected in "/Users/<redacted>/ws/<redacted>/openapi/components/schemas/products/create.json" (20:1)

18 |   }
19 | }
20 | }

Second Issue: Inlining refs when bundling

In-lining refs seems to have some issues (or I misunderstand the feature).

EDIT: this issue is caused by a circular ref.

I'm using the following Go snippet to bundle:

bytes, err := bundler.BundleBytes(inputBytes, &datamodel.DocumentConfiguration{
		AllowFileReferences:     true,
		ExtractRefsSequentially: true,
		BundleInlineRefs:        true,
	})

Compare a snippet of original JSON to the bundled YAML below, and note that neither schema is inlined.

Original JSON:

"responses": {
      "200": {
        "description": "A successful response.",
        "content": {
          "application/vnd.api+json": {
            "schema": {
              "$ref": "../../components/schemas/applications/list.json"
            }
          }
        }
      },
      "404": {
        "$ref": "../../components/responses/notfound.json"
      }
    }

Bundled YAML:

            responses:
                "200":
                    description: "A successful response."
                    content:
                        "application/vnd.api+json":
                            schema: {"$ref": "../../components/schemas/applications/list.json"}
                "404": {"$ref": "../../components/responses/notfound.json"}

Running redocly on the generated yaml fails because the references in the bundled file cannot be resolved (the bundle is generated in a separate directory from the original document).

Doing a bit more digging, I think the root cause of Issue #2 is a bug in the Rolodex.

  1. When I run my bundling tool, the Rolodex emits an error: unable to locate reference anywhere in the rolodex","reference":"paths/applications/applications.json
  2. The un-resolved refs in Issue 2 are limited to paths/applications/applications.json. Refs in other files are inlined as expected.
  3. Redocly lints my decomposed OpenAPI JSON successfully.
  4. The reference is valid
    1. I can copy and paste the file reference and ls the file from the command line (from the same directory as the OpenAPI JSON)
    2. Goland can navigate the reference (no squiggly red underline, command-b navigates to the file)

Interestingly, if I change the order of the paths, and re-run my bundling utility, the error "sticks" to the first path.

For example, ๐Ÿ‘‡ results in ... "reference":"paths/applications/applications.json

  "paths": {
    "/applications": {
      "$ref": "paths/applications/applications.json"
    },
    "/applications/{id}": {
      "$ref": "paths/applications/applications_by_id.json"
    },

while ๐Ÿ‘‡ results in ... "reference":"paths/applications/applications_by_id.json"

  "paths": {
    "/applications/{id}": {
      "$ref": "paths/applications/applications_by_id.json"
    },
    "/applications": {
      "$ref": "paths/applications/applications.json"
    },

Stack at breakpoint line 217 search_index.go (the point where unable to locate reference anywhere in the rolodex","reference":"paths/applications/applications.json is emitted):

index.(*SpecIndex).SearchIndexForReferenceByReferenceWithContext (search_index.go:217) github.com/pb33f/libopenapi/index
index.(*SpecIndex).SearchIndexForReferenceByReference (search_index.go:22) github.com/pb33f/libopenapi/index
index.(*SpecIndex).SearchIndexForReference (search_index.go:30) github.com/pb33f/libopenapi/index
index.seekRefEnd (utility_methods.go:382) github.com/pb33f/libopenapi/index
index.(*SpecIndex).GetOperationsParameterCount (spec_index.go:1090) github.com/pb33f/libopenapi/index
<autogenerated>:2
index.runIndexFunction.func1 (utility_methods.go:516) github.com/pb33f/libopenapi/index
index.runIndexFunction.func2 (utility_methods.go:518) github.com/pb33f/libopenapi/index
runtime.goexit (asm_arm64.s:1197) runtime
 - Async Stack Trace
index.runIndexFunction (utility_methods.go:515) github.com/pb33f/libopenapi/index

Goroutine dump:

Goroutine 1 sync.runtime_Semaquire:
runtime.gopark(proc.go:399)
runtime.goparkunlock(proc.go:404)
runtime.semacquire1(sema.go:160)
sync.runtime_Semacquire(sema.go:62)
sync.(*WaitGroup).Wait(waitgroup.go:116)
github.com/pb33f/libopenapi/index.(*SpecIndex).BuildIndex(spec_index.go:158)
github.com/pb33f/libopenapi/index.(*LocalFS).Open(rolodex_file_loader.go:131)
github.com/pb33f/libopenapi/index.(*Rolodex).Open(rolodex.go:476)
github.com/pb33f/libopenapi/index.(*SpecIndex).lookupRolodex(find_component.go:125)
github.com/pb33f/libopenapi/index.(*SpecIndex).FindComponent(find_component.go:31)
github.com/pb33f/libopenapi/index.(*SpecIndex).ExtractComponentsFromRefs.func1(extract_refs.go:632)
github.com/pb33f/libopenapi/index.(*SpecIndex).ExtractComponentsFromRefs(extract_refs.go:679)
github.com/pb33f/libopenapi/index.createNewIndex(spec_index.go:100)
github.com/pb33f/libopenapi/index.NewSpecIndexWithConfig(spec_index.go:50)
github.com/pb33f/libopenapi/index.(*LocalFile).Index(rolodex_file_loader.go:193)
github.com/pb33f/libopenapi/index.(*LocalFS).Open(rolodex_file_loader.go:118)
github.com/pb33f/libopenapi/index.(*Rolodex).Open(rolodex.go:476)
github.com/pb33f/libopenapi/index.(*SpecIndex).lookupRolodex(find_component.go:125)
github.com/pb33f/libopenapi/index.(*SpecIndex).FindComponent(find_component.go:41)
github.com/pb33f/libopenapi/index.(*SpecIndex).ExtractComponentsFromRefs.func1(extract_refs.go:632)
github.com/pb33f/libopenapi/index.(*SpecIndex).ExtractComponentsFromRefs(extract_refs.go:679)
github.com/pb33f/libopenapi/index.createNewIndex(spec_index.go:100)
github.com/pb33f/libopenapi/index.NewSpecIndexWithConfig(spec_index.go:50)
github.com/pb33f/libopenapi/index.(*Rolodex).IndexTheRolodex(rolodex.go:334)
github.com/pb33f/libopenapi/datamodel/low/v3.createDocument(create_document.go:104)
github.com/pb33f/libopenapi/datamodel/low/v3.CreateDocumentFromConfig(create_document.go:28)
github.com/pb33f/libopenapi.(*document).BuildV3Model(document.go:316)
github.com/pb33f/libopenapi/bundler.BundleBytes(bundler.go:34)
main.main(openapi-bundle.go:53)
runtime.main(proc.go:267)
runtime.goexit(asm_arm64.s:1197)
runtime.newproc(<autogenerated>:1)
github.com/pb33f/libopenapi/index.(*SpecIndex).MapNodes:
runtime.gopark(proc.go:399)
runtime.chansend(chan.go:259)
runtime.chansend1(chan.go:145)
github.com/pb33f/libopenapi/index.(*SpecIndex).MapNodes(map_index_nodes.go:69)
github.com/pb33f/libopenapi/index.createNewIndex.func1(spec_index.go:74)
runtime.goexit(asm_arm64.s:1197)
github.com/pb33f/libopenapi/index.createNewIndex(spec_index.go:74)
Goroutine 1105 sync.(*Map).Swap:
runtime.(*spanSet).pop(mspanset.go:189)
runtime.(*mcentral).cacheSpan(mcentral.go:111)
runtime.(*mcache).refill(mcache.go:182)
runtime.(*mcache).nextFree(malloc.go:925)
runtime.mallocgc(malloc.go:1112)
runtime.newarray(malloc.go:1346)
runtime.makeBucketArray(map.go:364)
runtime.hashGrow(map.go:1068)
runtime.mapassign(map.go:659)
sync.(*Map).Swap(map.go:365)
sync.(*Map).Store(map.go:155)
github.com/pb33f/libopenapi/index.(*SpecIndex).extractDefinitionsAndSchemas(utility_methods.go:39)
github.com/pb33f/libopenapi/index.(*SpecIndex).GetComponentSchemaCount(spec_index.go:850)
github.com/pb33f/libopenapi/index.(*SpecIndex).GetComponentSchemaCount-fm(<autogenerated>:1)
github.com/pb33f/libopenapi/index.runIndexFunction.func1(utility_methods.go:516)
github.com/pb33f/libopenapi/index.runIndexFunction.func2(utility_methods.go:518)
runtime.goexit(asm_arm64.s:1197)
github.com/pb33f/libopenapi/index.runIndexFunction(utility_methods.go:515)
Goroutine 1108 github.com/pb33f/libopenai/index.(*SpecIndesx).SearchIndexForReferenceByReferenceWithContext
github.com/pb33f/libopenapi/index.(*SpecIndex).SearchIndexForReferenceByReferenceWithContext(search_index.go:217)
github.com/pb33f/libopenapi/index.(*SpecIndex).SearchIndexForReferenceByReference(search_index.go:22)
github.com/pb33f/libopenapi/index.(*SpecIndex).SearchIndexForReference(search_index.go:30)
github.com/pb33f/libopenapi/index.seekRefEnd(utility_methods.go:382)
github.com/pb33f/libopenapi/index.(*SpecIndex).GetOperationsParameterCount(spec_index.go:1090)
github.com/pb33f/libopenapi/index.(*SpecIndex).GetOperationsParameterCount-fm(<autogenerated>:1)
github.com/pb33f/libopenapi/index.runIndexFunction.func1(utility_methods.go:516)
github.com/pb33f/libopenapi/index.runIndexFunction.func2(utility_methods.go:518)
runtime.goexit(asm_arm64.s:1197)
github.com/pb33f/libopenapi/index.runIndexFunction(utility_methods.go:515)

All other routines were in runtime.gopark

The error seems to be emitted by line 118 of rolodex_file_loader.go (extractedFile.Index(&copiedCfg)), for the very first $ref in the OpenAPI document (in this example, paths/applications/applications_by_id.json):

image

idx is nil for the first (outermost thing to be indexed?) $ref on line 168 (search_index.go):
image

Issue 2 is caused by a circular ref. Here's a pathological document that illustrates. I should be able to work around it by restructuring my doc and eliminating the problematic circular refs.

diff --git a/test_specs/nested_files/openapi.yaml b/test_specs/nested_files/openapi.yaml
index 6c14e1e..8eb0f8b 100644
--- a/test_specs/nested_files/openapi.yaml
+++ b/test_specs/nested_files/openapi.yaml
@@ -19,3 +19,12 @@ servers:
 paths:
   /api/v1/Accounts:
     $ref: "paths/v1_Accounts.yaml"
+components:
+  parameters:
+    id:
+      name: id
+      in: path
+      description: id
+      required: true
+      schema:
+        type: string
diff --git a/test_specs/nested_files/paths/v1_Accounts.yaml b/test_specs/nested_files/paths/v1_Accounts.yaml
index dc5c396..66a79a9 100644
--- a/test_specs/nested_files/paths/v1_Accounts.yaml
+++ b/test_specs/nested_files/paths/v1_Accounts.yaml
@@ -9,6 +9,7 @@ get:
   parameters:
     - $ref: ../components/parameters/query/$select.yaml
     - $ref: ../components/parameters/header/page-size.yaml
+    - $ref: ../openapi.yaml#/components/parameters/id
   responses:
     "200":
       $ref: ../components/responses/Unspecified200.yaml

Bundle the modified test_specs/nested_files/openapi.yaml using the util pasted below, and bundling fails (output below):

openapi: 3.0.0
info:
    description: Example API spec
    version: v1
    title: Example
    contact:
        name: Example
        email: example@example.com
        url: www.example.com
    license:
        name: Example
        url: www.example.com
tags:
    - name: Account
      description: Account
servers:
    - url: https://<hidden>
paths:
    /api/v1/Accounts:
        get:
            summary: TODO
            description: TODO
            security:
                - BearerAuth: []
            tags:
                - Account
            operationId: getAccounts
            parameters:
                - $ref: ../components/parameters/query/$select.yaml
                - $ref: ../components/parameters/header/page-size.yaml
                - $ref: ../openapi.yaml#/components/parameters/id
            responses:
                "200":
                    $ref: ../components/responses/Unspecified200.yaml
        post:
            summary: TODO
            description: TODO
            security:
                - BearerAuth: []
            tags:
                - Account
            operationId: createAccounts
            requestBody:
                $ref: ../components/requestBodies/AccountModel.yaml
            responses:
                "200":
                    $ref: ../components/responses/Unspecified200.yaml
        put:
            summary: TODO
            description: TODO
            security:
                - BearerAuth: []
            tags:
                - Account
            operationId: updateAccounts
            requestBody:
                $ref: ../components/requestBodies/AccountModel.yaml
            responses:
                "200":
                    $ref: ../components/responses/Unspecified200.yaml
components:
    parameters:
        id:
            name: id
            in: path
            description: id
            required: true
            schema:
                type: string

Bundle util:

package main

import (
	"flag"
	"fmt"
	"github.com/pb33f/libopenapi/bundler"
	"github.com/pb33f/libopenapi/datamodel"
	"log/slog"
	"os"
)

func main() {
	inputFile := flag.String("input", "openapi.yaml", "relative or absolute path to the OpenAPI document to be bundled")
	outputFile := flag.String("output", "openapi-bundled.yaml", "relative or absolute path to the bundled OpenAPI document")
	verbose := flag.Bool("verbose", false, "enable log output")
	//logLevel := flag.Int("logLevel", 0, "valid options are -4 to 8.  smaller numbers are more verbose")

	flag.Parse()

	inputBytes, err := os.ReadFile(*inputFile)
	if err != nil {
		fmt.Printf("error reading input file '%s': %s\n", *inputFile, err.Error())
		os.Exit(1)
	}

	docConfig := &datamodel.DocumentConfiguration{
		AllowFileReferences: true,
	}

	if *verbose {
		docConfig.Logger = slog.New(slog.NewTextHandler(os.Stdout, &slog.HandlerOptions{
			Level: slog.LevelDebug,
		}))
	}

	bundleConf := &datamodel.DocumentConfiguration{
		AllowFileReferences:     true,
		ExtractRefsSequentially: true,
		BundleInlineRefs:        true,
	}

	if *verbose {
		bundleConf.Logger = slog.New(slog.NewTextHandler(os.Stdout, &slog.HandlerOptions{
			Level: slog.LevelDebug,
		}))
	}

	bytes, err := bundler.BundleBytes(inputBytes, bundleConf)

	if err != nil {
		fmt.Printf("error bundling OpenAPI doc %s: %s\n", *inputFile, err.Error())
	}

	err = os.WriteFile(*outputFile, bytes, 0644)
	if err != nil {
		fmt.Printf("error writing bundled OpenAPI document to %s: %s\n", err.Error())
	}

	fmt.Fprintf(os.Stderr, "successfully bundled %s to %s\n", *inputFile, *outputFile)
}

Hi Team,

First, thanks for this library. ๐Ÿ™ It seems pretty mature, and one of the few supporting OpenAPI 3.1. ๐Ÿ™Œ

Hello, Glad you appreciate it!

&datamodel.DocumentConfiguration{
		AllowFileReferences: true,
	}

If you want to use exploded files, you need to set the BasePath so libopenapi knows where to look.

https://pb33f.io/libopenapi/rolodex/
https://pb33f.io/libopenapi/the-index/#setting-a-base-path

Example: I have a JSON document that ends with an unmatched curly brace }:

The syntax of the document is correct, the extra brace is something the library just copes with, it's not something we're going to be able to handle it as it's parsed by https://github.com/go-yaml/yaml (this is the absolute core of libopenapi).

If you create un-parseable JSON or YAML, the lib will complain, otherwise, if it can be parsed and there is junk before or after it, the library will just crank on with it.

So I know libopenapi validation is reading the file and resolving references, but I don't understand why it doesn't raise an error when the JSON is syntactically invalid.

See above.

Redocly catches this syntax error:

Looks like they need a hardier parser :)

Second Issue: Inlining refs when bundling

In-lining refs seems to have some issues (or I misunderstand the feature).

EDIT: this issue is caused by a circular ref.

Yep, circular references can never be inlined, in fact you can't do much with them, which is why there is a large emphasis on making sure you're aware of what you're doing.

For more control over circular refs, please see here;

https://pb33f.io/libopenapi/circular-references/
https://pb33f.io/libopenapi/resolver/#detecting-circular-references

Doing a bit more digging, I think the root cause of Issue #2 is a bug in the Rolodex.

Happy to debug it, but I really really need a reproducible example. I have hundreds and hundreds of specs that I see every month and I simply don't have time to debug an issue, without a spec sample I can throw into my testrig and find the root cause. The logic in the rolodex, index and resolver is highly complex, due to the recursive async nature of the code, I need something to look at that causes the exact issue you're seeing.

pb33f is just me, I am a one man shop.

Doing a bit more digging, I think the root cause of Issue #2 is a bug in the Rolodex.

โ˜๏ธ that comment was premature, prior to discovering/understanding that that the real root cause was a circular reference.

Happy to debug it, but I really really need a reproducible example.

Totally understand, no worries. The diff in the above comment introduces a circular reference into one of your test cases, and replicated my problem.

But as you note, circular references aren't allowed, so the test case isn't "valid", in the sense that libopenapi is working as intended.

pb33f is just me, I am a one man shop.

Impressive :) Thanks again for all the hard work on writing and maintaining this library!

Redocly catches this syntax error:

Looks like they need a hardier parser :)

Fair enough :)

Though I will say that most tooling around openAPI is not that hardy, so having well-formed JSON is considered "ground state" for most toolchains. libopenapi seems to be the exception (not complaining!)

Closing this as not a bug/issue, libopenapi is working as intended.