Need to tighten up the schema

Question

davaya opened this issue 9 years ago · 12 comments

PROBLEM

The "Conceptual Level Abstract Syntax" for OpenC2 is ambiguous, permitting multiple non-interoperable message formats.

POTENTIAL SOLUTION

The JSON Syntax Alternatives document under Working Documents contains several JSON message formats that could be considered representative of the conceptual syntax, and that set is not exhaustive.

The Forum should review the alternatives, propose additional alternatives if I've missed something important, and select one of the existing or proposed formats as the JSON message format. At that point a JSON Schema can be used to validate OpenC2 messages, enhancing both interoperability and security.

Answer 1 · 2016-05-11T02:57:36.000Z

I agree that the abstract syntax has a potential interoperability problem. You've definitely made the case with all of the different JSON "interpretations". There definitely needs to be a tightened schema once you pick your message format. You may want to use XML, JSON, CSV, or LMNOP. And that's fine. But once you pick one, there needs to be only one correct interpretation.

I would suggest a hybrid approach. (Won't we all...?)

First of all, I like the modifiers of 3B that you've moved to top-level keys. The modifiers don't need to be nested inside a modifier key. Bringing them to the top level just gets to the point quicker: "here's the delay", "please respond". The flatness, especially for commands, is desirable.

But, I do like the nested approach, 4, especially in some broadcast architectures. The very first key to every command is the action. If an actuator doesn't know how to perform that action, it should ignore the rest of the message. If it does, it digs in and finds out what to do.

So the hybrid would look like this:

{
  "SCAN": {
    "TARGET": ["cybox:Device",{"cybox:DeviceObjectType:SerialNumber":"34XR05289"}],
    "ACTUATOR": ["network.sensor"],
    "delay": "1h",
    "response": "ack"
  }
}

I have a preference toward named objects vs. positional data. I think that the extra "baggage" that it adds is more than offset by the understanding of the semantics of the message for both human and machine alike. Especially, since we're only talking about a couple of tags, it seems like a minimal price.
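
A minimal Python sketch of how an actuator might consume the hybrid format above: it reads the single top-level key as the action and ignores the message when it has no handler for it. The handler table and function names are hypothetical and not part of any OpenC2 document.

```python
import json

def handle_scan(body):
    # Hypothetical handler: report what was asked for instead of scanning.
    return {"action": "SCAN", "target": body["TARGET"], "delay": body.get("delay")}

# Actions this (hypothetical) actuator knows how to perform.
HANDLERS = {"SCAN": handle_scan}

def dispatch_nested(raw):
    """Return the handler result, or None when the action is not supported."""
    message = json.loads(raw)
    if not isinstance(message, dict) or len(message) != 1:
        raise ValueError("expected an object with exactly one action key")
    action, body = next(iter(message.items()))
    handler = HANDLERS.get(action)
    if handler is None:
        return None  # unknown action: ignore the rest of the message
    return handler(body)

raw = '''{"SCAN": {
  "TARGET": ["cybox:Device", {"cybox:DeviceObjectType:SerialNumber": "34XR05289"}],
  "ACTUATOR": ["network.sensor"],
  "delay": "1h",
  "response": "ack"}}'''
result = dispatch_nested(raw)
```

The lookup on the first key is the only step that runs for actions the actuator does not support, which is the property argued for in broadcast architectures.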

Answer 2 · 2016-05-11T03:42:29.000Z

In defense of format 1:

#define ACTION 0
#define TARGET 1
#define ACTUATOR 2
#define MODIFIERS 3

With those well-known constants defined, every program can receive a message and then

  if message[ACTION] == "SCAN":
      process_scan(message)

and if it does not recognize the action the message can be ignored.

Nesting the on-the-wire data does not make it any easier for programs to identify messages of interest and discard the rest. And with symbolic constants the semantics of the fields are just as easy to understand as if the strings had been transmitted. The strings really are baggage - from an information theory perspective they contain zero bits of information because the meaning of each of the fields is already known to the receiver.
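
For comparison, a runnable Python version of the positional dispatch described above. This is a sketch under assumptions: the constants mirror the well-known positions from the post, while `process_scan` and `receive` are placeholder names.

```python
import json

# Well-known field positions, mirroring the constants defined above.
ACTION, TARGET, ACTUATOR, MODIFIERS = range(4)

def process_scan(message):
    # Hypothetical handler: report the target that would be scanned.
    return ("scan", message[TARGET])

def receive(raw):
    message = json.loads(raw)
    if message[ACTION] == "SCAN":
        return process_scan(message)
    return None  # unrecognized action: the message is ignored

msg = '["SCAN", ["cybox:Device", {"cybox:DeviceObjectType:SerialNumber": "34XR05289"}], null, {}]'
outcome = receive(msg)
```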

At the abstract level, the difference between positional and keyword elements is just the difference between ordered and unordered collections. The four OpenC2 fields are inherently ordered; IMO the syntax should reflect the message semantics as closely as possible.

OpenC2Message ::= SEQUENCE {             -- Ordered, encoded as array
    action    Action;
    target    Target;
    actuator  Actuator OPTIONAL;
    modifiers SET OF Modifier OPTIONAL; }

OpenC2Message ::= SET {                    -- Unordered, encoded as object
    action    Action;
    target    Target;
    actuator  Actuator OPTIONAL;
    modifiers SET OF Modifier OPTIONAL; }

If the fields were unordered, the following definition would be semantically identical to the unordered version above; I don't think that's what we mean ...

OpenC2Message ::= SET {                    -- Unordered, encoded as object
    modifiers SET OF Modifier OPTIONAL;
    target    Target;
    action    Action;
    actuator  Actuator OPTIONAL;
 }
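
The ordered/unordered distinction carries over directly to JSON as decoded by Python's standard `json` module: arrays become lists, where position matters, and objects become dicts, where it does not. The field names in the object form are illustrative only.

```python
import json

# Same two values, different positions: arrays are ordered.
seq_a = json.loads('["SCAN", ["cybox:Device"]]')
seq_b = json.loads('[["cybox:Device"], "SCAN"]')

# Same two members, different positions: objects are unordered.
obj_a = json.loads('{"action": "SCAN", "target": ["cybox:Device"]}')
obj_b = json.loads('{"target": ["cybox:Device"], "action": "SCAN"}')

sequences_equal = seq_a == seq_b  # False: reordering changes the message
objects_equal = obj_a == obj_b    # True: reordering is not observable
```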

Answer 3 · 2016-05-11T13:17:10.000Z

Comment regarding flat vs. hierarchical data structures:
I advise that we go with a flat, ordered data structure. A hierarchical data structure adds a layer of complexity, imposes an additional processing burden on the receiving entity, and is going to complicate the binary encoding when we TLV this and move towards a wire-level protocol.
My 'vote' is an ordered collection and a flat data structure.

Answer 4 · 2016-05-19T20:47:47.000Z

Done. The next decision is choosing whether a command that has no modifiers is represented as an empty modifiers object or as undefined/null. I'll assume empty because the code is slightly easier, but if reference implementations use optional and null instead, the schema is easy to switch.

Syntax:

OpenC2Message ::= SEQUENCE {
    action    Action;
    target    Target;
    actuator  Actuator OPTIONAL;
    modifiers SET OF Modifier; }

Target ::= SEQUENCE {
    type = TargetType;
    specifier = SET SIZE(0..1) of Specifier; }

JSON example:

[ "SCAN",
  ["cybox:Device",{"cybox:DeviceObjectType:SerialNumber":"34XR05289"}],
  null,
  {} ]

At today's meeting I learned that cybox objects are data structures, not just identifier strings, so the syntax of Target will change from this simplistic example.
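
A rough Python sketch of a receiver for the array encoding in the JSON example above, including the "null actuator, empty modifiers" convention. The class and function names are hypothetical, and accepting a target with only a type (no specifier element) is an assumption, since the example shows only the two-element form.

```python
import json
from typing import NamedTuple, Optional

class OpenC2Message(NamedTuple):
    action: str
    target: list
    actuator: Optional[list]   # None when the JSON value is null
    modifiers: dict            # empty dict when there are no modifiers

def parse_message(raw: str) -> OpenC2Message:
    fields = json.loads(raw)
    if not isinstance(fields, list) or len(fields) != 4:
        raise ValueError("expected a 4-element array")
    action, target, actuator, modifiers = fields
    if not isinstance(action, str):
        raise ValueError("action must be a string")
    # Assumption: target is [type] or [type, specifier].
    if not isinstance(target, list) or not 1 <= len(target) <= 2:
        raise ValueError("target must be [type] or [type, specifier]")
    if actuator is not None and not isinstance(actuator, list):
        raise ValueError("actuator must be null or an array")
    if not isinstance(modifiers, dict):
        raise ValueError("modifiers must be an object, possibly empty")
    return OpenC2Message(action, target, actuator, modifiers)

parsed = parse_message(
    '["SCAN", ["cybox:Device", {"cybox:DeviceObjectType:SerialNumber": "34XR05289"}], null, {}]'
)
```

If reference implementations settle on optional/null modifiers instead, only the last check needs to change.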

Answer 5 · 2016-05-20T17:24:57.000Z

@davaya Couple of comments:

If I read the spec correctly, a target consists of a mandatory type field and zero or more specifiers
Likewise, an actuator (if present) consists of a mandatory type field and zero or more specifiers
Question: Why sequences instead of objects with key-value pairs?
- I know sequences are less verbose, but is it really saving us much?
- Tradeoffs are human readability (e.g. in a wireshark capture) and extensibility
  - I know it looks straightforward for simple actions, but I'm worried about people getting lost in long lists of target and actuator specifiers.
  - Sometimes the data type will be the same for both actuator-specifier and target-specifier.
  - Example: DENY TARGET(url, specifier=http://www.google.com) ACTUATOR(DEVICE, specifier=https://192.168.1.2:8080)
  - FWIW, Floodlight actuator specifiers currently are URLs. I can't deny URL targets yet, though.

Also, example of a (complex) target-specifier we'll need to be able to express:
http://cybox.mitre.org/language/version2.1/xsddocs/objects/Network_Connection_Object.html

Answer 6 · 2016-05-20T18:07:54.000Z

On the STIX telecon, Sean raised the issue of the Target abstract syntax vs. JSON example:

Target ::= SEQUENCE {
    type = TargetType;
    specifier = SET SIZE(0..1) of Specifier; }

["cybox:Device",{"cybox:DeviceObjectType:SerialNumber":"34XR05289"}]

These agree, but the structure of Specifier is not specified in the abstract syntax. Currently there can be either 0 or 1 specifiers, meaning that the specifier is optional, but if it exists it can be of only one type (serial number in this case).

Question to the WG:

for specifiers of a single type (serial number), can multiple devices be targets? (is JSON #1 legal?)
for a single target, can multiple specifier types exist (serial number, IP address)? (is JSON #2 legal?)

#0:  ["cybox:Device",{"cybox:DeviceObjectType:SerialNumber":"34XR05289"}]

#1:  ["cybox:Device",{"cybox:DeviceObjectType:SerialNumber":["34XR05289", "19XR02345", "72MT84723"]}]

#2:  ["cybox:Device",{"cybox:DeviceObjectType:SerialNumber":"34XR05289",
                      "cybox:DeviceObjectType:IPAddress":"123.14.76.194"}]

Answer 7 · 2016-05-20T20:15:01.000Z

I agree with David's post above that the following two agree:

Target ::= SEQUENCE {
    type = TargetType;
    specifier = SET SIZE(0..1) of Specifier; }

["cybox:Device",{"cybox:DeviceObjectType:SerialNumber":"34XR05289"}]

Though I would state my strong preference for explicit key value pairs over implicit sequences of values.
I think that

["cybox:Device",{"cybox:DeviceObjectType:SerialNumber":"34XR05289"}]

is much too confusing for many people and that the following would be preferable

"target": {"type": "cybox:Device", "specifier": {"cybox:DeviceObjectType:SerialNumber":"34XR05289"}}

showing the optional specifier present or

"target": {"type": "cybox:Device"}

showing the optional specifier not present.

I think anyone looking at the above would understand the structure and the overhead for such clarity is minimal and worth it.
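
The two target forms carry the same information, so converting between them is mechanical. A small Python sketch, with hypothetical function names and the "type"/"specifier" keys taken from the example above:

```python
def target_to_keyed(seq):
    """["type", {spec}] or ["type"]  ->  {"type": ..., "specifier": ...}"""
    keyed = {"type": seq[0]}
    if len(seq) > 1:
        keyed["specifier"] = seq[1]
    return keyed

def target_to_sequence(keyed):
    """{"type": ..., "specifier": ...}  ->  ["type", {spec}] or ["type"]"""
    seq = [keyed["type"]]
    if "specifier" in keyed:
        seq.append(keyed["specifier"])
    return seq

positional = ["cybox:Device", {"cybox:DeviceObjectType:SerialNumber": "34XR05289"}]
keyed = target_to_keyed(positional)
round_trip = target_to_sequence(keyed)
```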

For David's question 1, I would say the answer appears to be a clear yes if you read the target-specifier field description at Abstract Syntax: "The specifier further describes a specific target, a list of targets, or a class of targets."

For question 2, looking at the page and description quoted above I think the answer is less clear but I would read it as no.
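
To make the two questions concrete, the Python sketch below inspects a target in the sequence encoding and reports how many specifier types it carries and whether any of them holds a list of values. It only describes the shape; whether shapes #1 and #2 are legal is the open question. The function name is hypothetical.

```python
def describe_specifier(target):
    """Return (number_of_specifier_types, has_multi_valued_specifier)."""
    spec = target[1] if len(target) > 1 else {}
    multi_valued = any(isinstance(value, list) for value in spec.values())
    return len(spec), multi_valued

json_0 = ["cybox:Device", {"cybox:DeviceObjectType:SerialNumber": "34XR05289"}]
json_1 = ["cybox:Device", {"cybox:DeviceObjectType:SerialNumber":
                           ["34XR05289", "19XR02345", "72MT84723"]}]
json_2 = ["cybox:Device", {"cybox:DeviceObjectType:SerialNumber": "34XR05289",
                           "cybox:DeviceObjectType:IPAddress": "123.14.76.194"}]

shapes = [describe_specifier(t) for t in (json_0, json_1, json_2)]
# shapes == [(1, False), (1, True), (2, False)]
```

Under the reading given here (yes to question 1, no to question 2), a validator would accept the first two shapes and reject the third.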

Answer 8 · 2016-05-21T14:21:35.000Z

Rather than "if we look at the language specification, the answer is ...", my question was really: "If we are designing a C2 language, what should the language specification say in order to accommodate all COA use cases?" I agree that a list of targets of a single address type is desired. I think that multiple address types are also desired, but don't have enough COA domain-specific knowledge to be sure. Einstein is reputed to have said "Everything should be made as simple as possible, but no simpler". What he actually said was "It can scarcely be denied that the supreme goal of all theory is to make the irreducible basic elements as simple and as few as possible without having to surrender the adequate representation of a single datum of experience". How simple can OpenC2 be made while still being able to adequately represent every action that needs to be taken?

Answer 9 · 2016-05-21T21:23:40.000Z

Proposal for a schema compromise:
Define the OpenC2 schema using Avro (http://avro.apache.org/docs/current/spec.html), which supports both binary and JSON encodings. An Avro record is a complex type consisting of an ordered list of fields. The binary encoding of a record transmits only values, but the JSON encoding of a record is the same as the JSON encoding of a map type, with key-value pairs transmitted on the wire. As long as the schema preserves the distinction between positional and keyword types I'm less concerned about the serialization. Longer-term, Avro might define both a JSON-DEBUG encoding that converts records to maps for transmission, and a more concise JSON encoding that transmits records in native (values-only) format.

Avro is used in the Kafka messaging fabric (http://docs.confluent.io/, see Data Serialization and Evolution), which appears to be well-developed and industrial strength. It may be worth investigating Kafka for OpenC2 development and demos if we aren't already doing so.

Answer 10 · 2016-05-23T14:36:44.000Z

I would like to close this topic. I suspect that we have enough to build a construct, but I want to make sure that everyone has their 'final' comments. Please provide additional comments by COB 5/25/2016. I will take an action to build a construct.
I believe that we have converged on a JSON representation via an ordered list vice key value pairs? Ordered lists will save the overhead and will enable us to take full advantage of CBOR (RFC 7049).
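
The overhead in question can be measured for a single command with a short Python sketch. It compares compact JSON text only (no CBOR library is used here), and the key names in the key-value form are assumptions for illustration.

```python
import json

specifier = {"cybox:DeviceObjectType:SerialNumber": "34XR05289"}

# Ordered-list form: values only.
as_list = ["SCAN", ["cybox:Device", specifier], None, {}]

# Key-value form: the same values with (assumed) field names attached.
as_pairs = {
    "action": "SCAN",
    "target": {"type": "cybox:Device", "specifier": specifier},
    "actuator": None,
    "modifiers": {},
}

def wire_size(value):
    # Compact separators so whitespace does not distort the comparison.
    return len(json.dumps(value, separators=(",", ":")).encode("utf-8"))

list_bytes = wire_size(as_list)
pairs_bytes = wire_size(as_pairs)
overhead = pairs_bytes - list_bytes  # bytes spent on field names and braces
```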

Answer 11 · 2016-06-01T12:42:52.000Z

I posted a JSON-syntax document in the openc2-working-group repo that resolves this issue without imposing a particular encoding on all applications. The abstract syntax is regarded as the authoritative definition of OpenC2 commands, and encoding details can be tailored to the environment. Three example JSON encodings are included in the paper: verbose with key-value pairs for everything, concise with records encoded as ordered lists, and minimized with both ordered lists and indexed keys as is done for Javascript minimization.

IMHO the concise representation is easier to read because verbose is too cluttered, but to each his own. Producers can generate either verbose or concise commands and decoders can accept either one based on the first character of the JSON command: "[" or "{". I don't expect the minimized encoding to be used for JSON commands, but it could be used as a readable representation of binary-encoded commands for debugging.
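
A minimal Python sketch of a decoder that accepts either form by looking at the first character, as described above. The top-level key names are assumptions rather than quotes from the JSON-syntax document, and only the top level is normalized; nested records such as the target are passed through unchanged.

```python
import json

# Assumed top-level field names, in their positional order.
FIELDS = ("action", "target", "actuator", "modifiers")

def decode_command(raw):
    """Decode a concise ("[") or verbose ("{") command into one dict shape."""
    text = raw.lstrip()
    if text.startswith("["):
        values = json.loads(text)
        if len(values) != len(FIELDS):
            raise ValueError("concise command must have %d elements" % len(FIELDS))
        return dict(zip(FIELDS, values))
    if text.startswith("{"):
        obj = json.loads(text)
        # Absent optional fields come back as None.
        return {name: obj.get(name) for name in FIELDS}
    raise ValueError("command must start with '[' or '{'")

concise = decode_command('["SCAN", ["cybox:Device"], null, {}]')
verbose = decode_command('{"action": "SCAN", "target": ["cybox:Device"], "modifiers": {}}')
```

Both calls yield the same dictionary, so the rest of the receiver does not need to know which encoding the producer chose.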

Dave

Answer 12 · 2016-07-14T18:09:57.000Z

This issue was closed at the OpenC2 Forum meeting on 7/14/2016. The JSON representation will be documented and posted to GitHub. It is understood that the representation may still be updated as findings come out from the reference implementations. If there are any other "Issues" that need to be discussed, they should be opened up as new "Issues".