usnistgov/metaschema

JSON Schema Generation Yields Undesired Allowed Values

aj-stein-nist opened this issue · 16 comments

Describe the bug

Currently, the JSON Schema generator, for metaschema definitions with allowed-values where allow-other="no", where its target does not have a predicate (target=".") or the target is not the default OSCAL namespace (for any target that is not target=".[has-oscal-namespace('http://csrc.nist.gov/ns/oscal')]"), the JSON generator should not emit the allowed values as part of the JSON Schema. If allow-other="yes", then the current enum style of JSON Schema validation, when needed in such cases like it, is not correctly implemented.

It currently gets emitted like this:

              { "type" : "string" } },
            "required" : 
            [ "type" ],
            "additionalProperties" : false,
            "enum" : 
            [ "equivalent-to",
              "equal-to",
              "subset-of",
              "superset-of",
              "intersects-with" ] }

We need it to be adapted to emit JSON Schema like this:

              "type" : 
              { "anyOf":
              [
                { "type": "string"},
                { 
                  "enum" : [
                    "equivalent-to",
                    "equal-to",
                    "subset-of",
                    "superset-of",
                    "intersects-with" 
                  ]
                }
              ] } }

Who is the bug affecting?

OSCAL tool developers want to convert mappings from OSCAL XML to OSCAL JSON and must have the latter validated with JSON Schema after the fact.

What is affected by this bug?

Schema validation with JSON schemas generated from the scheme generator(s) in this repo.

When does this occur?

Consistently.

How do we replicate the issue?

When debugging issues with use of NIST OSCAL automation from community stakeholders in CIS, we were asked to troubleshoot and debug issues validating the emitted JSON file via ajv -c ajv-formats -s oscal_mapping_schema.json -d CCM_CISControlsv8_Mapping-min.json after converting the mapping in OSCAL XML to OSCAL JSON in CI/CD with GitHub Actions. An example run can be found here with current commit aj-stein-nist/CISControls_OSCAL@cd22c23#diff-b7c9b89f382154d95ec71f7a04dd1b62a11804cd8a3ed0c2d908ad97fba7e66e.

The following errors for this seemingly valid OSCAL JSON document instance are below, despite all the correct values being defined for the given fields and assemblies in the document.

/home/runner/work/CISControls_OSCAL/CISControls_OSCAL/git-content/mappings/json/CCM_CISControlsv8_Mapping-min.json invalid
[
  {
    keyword: 'enum',
    dataPath: '/mapping-collection/mappings/maps/0/relationship',
    schemaPath: '#/properties/relationship/enum',
    params: { allowedValues: [Array] },
    message: 'should be equal to one of the allowed values'
  },
  {
    keyword: 'type',
    dataPath: '/mapping-collection/mappings',
    schemaPath: '#/properties/mappings/anyOf/1/type',
    params: { type: 'array' },
    message: 'should be array'
  },
  {
    keyword: 'anyOf',
    dataPath: '/mapping-collection/mappings',
    schemaPath: '#/properties/mappings/anyOf',
    params: {},
    message: 'should match some schema in anyOf'
  }
]
/home/runner/work/CISControls_OSCAL/CISControls_OSCAL/git-content/mappings/json/CCM_CISControlsv8_Mapping-min.json is invalid.

Expected behavior (i.e. solution)

JSON Schema validates properly.

Other Comments

@wendellpiez asked that @aj-stein-nist and others help with testing a potential fix after .5 (approximated 4 hours) of developing a bug to mitigate the fix.

Similar behavior in this GHA job run when validating and converting content and this Gitter chat indicate there is a high likelihood this reflects the same bug. I am looking into this to potentially unblock myself of usnistgov/oscal-content#139.

I likely will have to sync up with Wendell as it turns out some of the <json-value-key> stuff is more difficult than I thought and this isn't going to be a small cosmetic change.

@wendellpiez, the serialized XML form of the example as it should look based on transforming with the basic JSON->XML serializer in XPath 3.0 (as we discussed on the call):

                            <map key="type">
                                <array key="anyOf">
                                    <map>
                                        <string key="type">string</string>
                                    </map>
                                    <map>
                                        <array key="enum">
                                            <string>equivalent-to</string>
                                            <string>equal-to</string>
                                            <string>subset-of</string>
                                            <string>superset-of</string>
                                            <string>intersects-with</string>
                                        </array>
                                    </map>
                                </array>
                            </map>

Noting that even a value key is not required for data types allowing an empty string, the XML representation is underspecified: <relationship type="x"/> is schema valid since the token` data type supports an empty string. (In 2.0 we can make token require a character but that's not backward compatible for 1.x.)

At a minimum we need to relax the requirement for the nominal field value property in the JSON to avoid properties with no values (or empty string values).

Wendell and I paired for this afternoon. We agreed that I would follow up with a minimal Metaschema definition and help stub out tests outside of the more complex definition examples that make it hard to exercise the possible bug, different solutions, and how certain JSON Schema fields should and should not be handled (additionalProperties and required, specifically).

Wendell and I had met a few times and paired and he agreed it is best he work on moving forward the Metaschema "mini" (minimally viable model) using the current version to exposed different allowed values sets and write relevant XSpec tests. I got a little ahead of myself and attempted last night to figure out how I can revive the Metaschema test-suite but it is pretty badly broken for now, as I learned in aj-stein-nist@2e67d27 last night. I will unassign myself for now, but sync up and assist accordingly and support Wendell as-is.

Now in a working branch we have the beginnings of XSpec, with analysis:

https://github.com/wendellpiez/metaschema/tree/issue240-jsonschemaBugA/test-suite/metaschema-xspec/json-schema-gen

The schema marked CORRECTED3 is both correct and complete, testing both the token datatype and applicable enumerations (i.e., any enumerations not to be suppressed by virtue of a narrowing @target or by allow-other='yes'). Instance documents marked 'valid' or 'invalid' demonstrate validations over data for testing. The draft XSpec casts two of the four examples into their intended form. Running the XSpec shows the bugs in the current process.

Next up:

  • discuss: do we have the correct target model for JSON Schema (does the corrected 3 version look okay);
  • complete XSpec; repair bugs;
  • test out
  • return to functional validations for generated schemas (do they validate the correct cross sections of instances)

Looking good so far, just some quick feedback because it is a branch and not a PR yet (note to self: this was reviewed in the above branch at commit wendellpiez@d045cc8):

  • Most look good. I call out the ones that could use improvements below.
  • Sub-schema with "$id" of "#field_oscal-value-testing-mini_constrained-closed" is close but likely needs a slight modification: allOf should be switched to oneOf or you would need to a token-value to match all four enum values of "one", "two", "three", "four" at the same time (because of presumed allow-other being set to no in Metaschema).
  • Sub-schema for "#field_oscal-value-testing-mini_constrained-open" can work with anyOf so long as the source Metaschema implicitly or explicitly resolves to true() for has-oscal-namespace('http://csrc.nist.gov/ns/oscal') does seem to work. I took some time to re-read the JSON Schema spec this evening and the MVP does appear to work correctly.

image

https://gist.github.com/aj-stein-nist/cff37e5580da4cce0d8b689ddd28f486
https://gist.github.com/aj-stein-nist/1b9a14c074980128b9bcb26dc8e070df

I will review again in the morning to make sure I didn't miss anything.

Looking good so far, just some quick feedback because it is a branch and not a PR yet (note to self: this was reviewed in the above branch at commit wendellpiez@d045cc8):

* Most look good. I call out the ones that could use improvements below.

* Sub-schema with `"$id"` of `"#field_oscal-value-testing-mini_constrained-closed"` is close but likely needs a slight modification: `allOf` should be switched to `oneOf` or you would need to a `token-value` to match all four `enum` values of `"one"`, `"two"`, `"three"`, `"four"` at the same time (because of presumed `allow-other` being set to `no` in Metaschema).

* Sub-schema for `"#field_oscal-value-testing-mini_constrained-open"` can work with `anyOf` so long as the source Metaschema implicitly or explicitly resolves to `true()` for `has-oscal-namespace('http://csrc.nist.gov/ns/oscal')` does seem to work. I took some time to re-read the JSON Schema spec this evening and the MVP does appear to work correctly.

I will review again in the morning to make sure I didn't miss anything.

@wendellpiez, ping ping. I know we discussed via Gitter some limited conversation with Dave, but had you looked at my feedback? It appears the branch (at least remote in github.com/wendellpiez) is still as-is, and I want to make sure we get to this eventually. Lemme know!

Talked with Dave and Wendell. Will move forward with stabilizing the unit tests and then push the fix soonish.

* Sub-schema with `"$id"` of `"#field_oscal-value-testing-mini_constrained-closed"` is close but likely needs a slight modification: `allOf` should be switched to `oneOf` or you would need to a `token-value` to match all four `enum` values of `"one"`, `"two"`, `"three"`, `"four"` at the same time (because of presumed `allow-other` being set to `no` in Metaschema).

This should be fixed in Wendell's branch code as discussed.

* Sub-schema for `"#field_oscal-value-testing-mini_constrained-open"` can work with `anyOf` so long as the source Metaschema implicitly or explicitly resolves to `true()` for `has-oscal-namespace('http://csrc.nist.gov/ns/oscal')` does seem to work. I took some time to re-read the JSON Schema spec this evening and the MVP does appear to work correctly.

We will drop constraints in this case, as agreed. (To A.J.: the thing you were proposing you will not do, just drop them, the GitHub Gists are illustrative but it ends here.)

Talked with Dave and Wendell. Will move forward with stabilizing the unit tests and then push the fix soonish.

* Sub-schema with `"$id"` of `"#field_oscal-value-testing-mini_constrained-closed"` is close but likely needs a slight modification: `allOf` should be switched to `oneOf` or you would need to a `token-value` to match all four `enum` values of `"one"`, `"two"`, `"three"`, `"four"` at the same time (because of presumed `allow-other` being set to `no` in Metaschema).

This should be fixed in Wendell's branch code as discussed.

So this turns out not be the case and a minor detail but caused a good deal of miscommunication on my part. I think I more fully understand the schemas in question now. I will finish double-checking the schemas and my take, but @wendellpiez, the example model, the valid JSON examples conforming to those schemas, and the intentionally invalid examples that will not conform do look correct.

I will continue working on aligning the transform and the tests to send to you for review.

I talked with @wendellpiez yesterday and we agreed he can focus on the actual fixes once he works down his work queue and will sync up with me, if not today, when we are back to work (so Monday). I will continue on Friday to attempt a fix and structure the tests, but I will keep him the primary assignee as he will lead the fixes on the transform, unless I can present a solution that works before he comes back to me on this bit.

Moving this to Sprint 61, still there is some more work to do. :-(

Based on our conversation today, the following should guide the use of anyOf or allOf.

  • Use anyOf if the allowed values are open. i.e. allow-other=yes
  • Use allOf if the allowed values are closed. i.e. allow-other=no

Hopefully wrapping this up before the end of the week.