aquasecurity/tracee

New Event Structure

Opened this issue · 37 comments

#2355 changed the primary user experience of Tracee to be event-oriented (previously, events were considered internal and hidden from the user). Therefore:

  1. The event schema needs to be formalized and stabilized, since it is no longer internal.
  2. The event structure needs to be generalized, since events will now be used for detections, captures, and more.

Following is the updated event schema based on the comments below:

  1. timestamp
  2. name
  3. id - machine readable id (integer). Note: current event id isn't good since it is architecture specific
  4. // version - use semver, where major is a breaking change in the event (e.g. one of the event's fields under data has been changed or removed), minor is a non-breaking change (e.g. a new field was added to the event under data), and patch is a bug fix. Since this data is static, we may remove this or make it optional
  5. // tags - since this data is static, we may remove this or make optional
  6. labels - doesn't exist. For future use.
  7. policies
    1. matched
    2. actions - doesn't exist, for future use - list of actions taken (currently the only action we have is print).
  8. workload
    1. process
      1. executable
        1. path
        2. name - the binary name (basename of the path) - doesn't exist, consider adding (in another issue)
      2. uniqueId - unique id of the process
      3. pid
      4. hostPid
      5. executionTime - time of last exec. Doesn't exist, consider adding (in another issue)
      6. realUser
        1. id
        2. name - doesn't exist, consider adding (in another issue)
      7. user - effective user. Doesn't exist, consider adding (in another issue)
        1. id
        2. name
      8. ancestors - process ancestors array. Only direct parent will be populated by default with the following fields:
        1. uniqueId
        2. pid
        3. hostPid
        4. Other ancestor fields may be populated by threat detection events
      9. thread
        1. startTime
        2. name (aka "comm")
        3. tid
        4. hostTid
        5. capabilities - doesn't exist, consider adding (in another issue)
        6. syscall - the syscall that triggered this event
        7. compat - boolean. moved from flags.compat
        8. userStackTrace - if enabled, will be here
    2. container
      1. id
      2. name
      3. image
        1. id
        2. repoDigest
        3. name
      4. isRunning - boolean. moved from flags
      5. startTime - Timestamp of container start time. Doesn't exist. Will replace started
      6. pid - entrypoint's pid. Doesn't exist, consider adding
    3. k8s
      1. pod
        1. name
        2. uid
        3. labels
      2. namespace
        1. name
  9. data
    1. Any relevant field (per-event schema)
    2. returnValue (if relevant will appear here)
  10. threat (if relevant will appear here) - static data about threats (can be omitted)
    1. description
    2. mitre
      1. tactic
        1. name
      2. technique
        1. name
        2. id
    3. severity
  11. triggeredBy (will appear on threat detection events)
    1. name
    2. id
    3. data
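
As a reading aid, here is a minimal Go sketch of the schema above. All names, types, and optionality choices are illustrative paraphrases of the list, not a final Tracee API; the "consider adding" fields are elided.

```go
package event

import "time"

// Event paraphrases the schema above; optional groups are pointers/omitempty.
type Event struct {
	Timestamp   time.Time              `json:"timestamp"`
	Name        string                 `json:"name"`
	ID          int                    `json:"id"`               // machine-readable, architecture-independent
	Labels      map[string]string      `json:"labels,omitempty"` // future use
	Policies    Policies               `json:"policies"`
	Workload    Workload               `json:"workload"`
	Data        map[string]interface{} `json:"data"` // per-event schema, incl. returnValue
	Threat      *Threat                `json:"threat,omitempty"`
	TriggeredBy *TriggeredBy           `json:"triggeredBy,omitempty"`
}

type Policies struct {
	Matched []string `json:"matched"`
	Actions []string `json:"actions,omitempty"` // future use, e.g. "print"
}

type Workload struct {
	Process   Process    `json:"process"`
	Container *Container `json:"container,omitempty"`
	K8s       *K8s       `json:"k8s,omitempty"`
}

type Process struct {
	Executable Executable `json:"executable"`
	UniqueID   string     `json:"uniqueId"`
	Pid        int        `json:"pid"`
	HostPid    int        `json:"hostPid"`
	RealUser   struct {
		ID int `json:"id"`
	} `json:"realUser"`
	Ancestors []Process `json:"ancestors,omitempty"` // element 0 is the direct parent
	Thread    Thread    `json:"thread"`
}

type Executable struct {
	Path string `json:"path"`
	Name string `json:"name,omitempty"`
}

type Thread struct {
	StartTime      time.Time `json:"startTime"`
	Name           string    `json:"name"` // aka "comm"
	Tid            int       `json:"tid"`
	HostTid        int       `json:"hostTid"`
	Syscall        string    `json:"syscall,omitempty"` // the syscall that triggered the event
	Compat         bool      `json:"compat"`
	UserStackTrace []string  `json:"userStackTrace,omitempty"` // only if enabled
}

type Container struct {
	ID        string `json:"id"`
	Name      string `json:"name"`
	Image     Image  `json:"image"`
	IsRunning bool   `json:"isRunning"`
}

type Image struct {
	ID         string `json:"id"`
	RepoDigest string `json:"repoDigest"`
	Name       string `json:"name"`
}

type K8s struct {
	Pod struct {
		Name   string            `json:"name"`
		UID    string            `json:"uid"`
		Labels map[string]string `json:"labels,omitempty"`
	} `json:"pod"`
	Namespace struct {
		Name string `json:"name"`
	} `json:"namespace"`
}

type Threat struct {
	Description string `json:"description"`
	Mitre       struct {
		Tactic struct {
			Name string `json:"name"`
		} `json:"tactic"`
		Technique struct {
			Name string `json:"name"`
			ID   string `json:"id"`
		} `json:"technique"`
	} `json:"mitre"`
	Severity int `json:"severity"`
}

type TriggeredBy struct {
	Name string                 `json:"name"`
	ID   int                    `json:"id"`
	Data map[string]interface{} `json:"data"`
}
```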

We also discussed versioning the event schema, but not including the version with each event, for efficiency.

While this represents the complete event schema, I also think we should try to emit minimal events. For example, I suppose most people won't need the full description with every event, or the full stack trace. We already allow conditionally including some elements (stack trace, syscall, exec-env), and I think we need to expand this control to all event fields. I'll create another issue to discuss this.

@itaysk Do you think we should maybe use the protocol.Event feature for this (see types/protocol)? We could add a ContentType header to differentiate events. Or do we want to stick with trace.Event with split up context fields?

I've had this thought as well in relation to the recent Metadata field we've added for the finding events. It could have fit into either the protocol.EventHeader, or Findings could have had a different payload body while still being considered an event through the protocol.

Actually I'm not very familiar with how protocol is used now. I thought it was supposed to become obsolete in the unified binary approach. Can you elaborate on your proposal, or perhaps add an example?

The protocol is more a way to differentiate different payloads going into the rules engine.
For example, in CNDR it is used to define the ability to receive events sent directly from CNDR into the signature (for example, to set data storage in the signature).
The same way the protocol can define a different payload for signature inputs, it can be used to define different event payloads with different shapes (for example, before we changed detections to be trace.Event, we could have defined them as a different payload in the protocol.Event instead).
I am not sure if we want to currently support different payloads for events going into the engine (since the engine is now supposed to be internal).

May I suggest we move the event definition to a proto file while we're at it?
Should make it easier to integrate tracee into gRPC systems later.

The protocol is more a way to differentiate different payloads going into the rules engine.
For example, in CNDR it is used to define the ability to receive events sent directly from CNDR into the signature (for example, to set data storage in the signature).
The same way the protocol can define a different payload for signature inputs, it can be used to define different event payloads with different shapes (for example, before we changed detections to be trace.Event, we could have defined them as a different payload in the protocol.Event instead).
I am not sure if we want to currently support different payloads for events going into the engine (since the engine is now supposed to be internal).

I'm still not sure there's a use case for this, or maybe I didn't fully understand it, but regardless it seems we reached the same conclusion: the "rule engine" is gone now.

+1 for proto

Has the event type serialization, using protobufs, ever been tested? I'm particularly concerned about:

Protocol buffers tend to assume that entire messages can be loaded into memory at once and are not larger than an object graph. For data that exceeds a few megabytes, consider a different solution; when working with larger data, you may effectively end up with several copies of the data due to serialized copies, which can cause surprising spikes in memory usage.

I mean, we will not serialize eBPF events, for sure, but if we're considering converting the Tracee event type to protobuf at a high rate, coming from an external source, then we should do some measurement before making the decision, IMO.

Has the event type serialization, using protobufs, ever been tested? I'm particularly concerned about:

Protocol buffers tend to assume that entire messages can be loaded into memory at once and are not larger than an object graph. For data that exceeds a few megabytes, consider a different solution; when working with larger data, you may effectively end up with several copies of the data due to serialized copies, which can cause surprising spikes in memory usage.

I've actually written a PR (#2070) once where I added an ebpf -> rules serialization with protobuf. From my measurements in that PR, protobuf serialization was quicker than both JSON and gob ONLY if the struct was already in protobuf form - the conversion from trace.Event to proto.Event was the overhead in that printer format.
When taking conversion time into consideration it was slower than both.
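
For reference, the shape of such a measurement in Go could look like the sketch below. The pb package, the fixtures, and the toPbEvent conversion are hypothetical stand-ins, not the code from #2070:

```go
package perf_test

import (
	"encoding/json"
	"testing"

	"google.golang.org/protobuf/proto"

	"github.com/aquasecurity/tracee/types/trace"
	pb "example.com/tracee/api/v1" // hypothetical generated protobuf package
)

// Hypothetical fixtures and conversion, standing in for the PR's real code.
func sampleTraceEvent() trace.Event { return trace.Event{EventName: "openat"} }
func samplePbEvent() *pb.Event      { return &pb.Event{} }

func toPbEvent(e trace.Event) *pb.Event { return &pb.Event{} }

// JSON marshal of the native struct (the baseline printer path).
func BenchmarkJSONMarshal(b *testing.B) {
	ev := sampleTraceEvent()
	for i := 0; i < b.N; i++ {
		if _, err := json.Marshal(ev); err != nil {
			b.Fatal(err)
		}
	}
}

// Protobuf marshal of an event already in protobuf form: the fast case.
func BenchmarkProtoMarshal(b *testing.B) {
	ev := samplePbEvent()
	for i := 0; i < b.N; i++ {
		if _, err := proto.Marshal(ev); err != nil {
			b.Fatal(err)
		}
	}
}

// Conversion plus protobuf marshal: the overhead described above.
func BenchmarkConvertAndProtoMarshal(b *testing.B) {
	ev := sampleTraceEvent()
	for i := 0; i < b.N; i++ {
		if _, err := proto.Marshal(toPbEvent(ev)); err != nil {
			b.Fatal(err)
		}
	}
}
```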

Good to know that.

I updated the struct to have process-related context grouped together (pids, tids, comm, start time, namespaces, cgroup, uid).
Also moved processorId to be part of the context.

Adding this here for future reference and consideration: https://www.elastic.co/blog/ecs-elastic-common-schema-otel-opentelemetry-faq

After talking with @yanivagman we thought about some slight changes:

  1. rename args to fields
  2. rename kubernetes to pod (and remove the pod prefix from internal fields)
  3. remove parent information from process (moved into the new ancestors field below)
  4. new ancestors field, which is an array of process entries (not all fields will be populated, so they need to be omitempty)
  5. move flags.containerStarted to container.started
  6. move flags.isCompat to process.compat

TBD:

  1. severity - location and name
  2. version - maybe move to root

I've updated the description to reflect this

Let's also take the opportunity to only expose the fields mentioned here to the user. Any other (internal) fields should not be part of trace.Event.
We can do this by embedding the trace.Event struct into an internal struct used by the pipeline only, where we can also add some extra fields (e.g. matchedActions, matchedPolicies bitmap, cgroup id, etc.), as in the sketch below.
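
A minimal sketch of that embedding idea; the internal field names are hypothetical:

```go
package pipeline

import "github.com/aquasecurity/tracee/types/trace"

// pipelineEvent wraps the public trace.Event with internal-only data.
// Only the embedded trace.Event is ever serialized for the user; the
// extra fields stay inside the pipeline.
type pipelineEvent struct {
	trace.Event

	matchedPoliciesBitmap uint64 // policy matches as a bitmap, for fast filtering
	matchedActions        []string
	cgroupID              uint64
}
```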

More detailed struct with some comments added:

  1. timestamp
  2. id - better if this was an integer, but we also need to keep backwards compatibility with e.g. TRC-XXX ids. Maybe call this differently so we can add integer ID in the future?
  3. name
  4. metadata
    1. description
    2. severity (or priority for non-signature events?)
    3. tags
    4. version (this version describes the event fields version, and not the schema of the event structure)
    5. misc - map[string]interface{}
  5. context
    1. process - all process related context
      1. executionTime - doesn't exist, consider adding
      2. name
      3. id
      4. namespaceId
      5. userId
      6. thread
        1. startTime
        2. id
        3. namespaceId
        4. mountNamespaceId - consider removing
        5. pidNamespaceId - consider removing
        6. utsName - consider removing
        7. syscall - moved here
        8. stackTrace - if enabled, will be here
        9. compat - moved from flags.compat
    2. ancestors - array of all process ancestors, in a structure similar to process. Element 0 is the parent.
    3. container - all container related context
      1. id
      2. name
      3. image
      4. imageDigest
      5. started - moved from flags - consider setting this as a timestamp of container start time and not boolean
    4. pod - all pod related context (from kubernetes)
      1. name
      2. namespace
      3. uid
      4. sandbox
    5. processorId - moved here
  6. fields - renamed from args (or should we call it data)?
    1. every event attribute
    2. returnValue (if relevant will appear here)
  7. matchedPolicies

id - better if this was an integer

why?

executionTime - doesn't exist, consider adding

agree, but let's discuss in a separate issue as it's about adding an entirely new feature? (and won't break the event structure)

thread

what's the motivation for adding this layer?

mountNamespaceId - consider removing
pidNamespaceId - consider removing
utsName - consider removing

+1

ancestors

since we're discussing the "event structure", having a field in the root implies it's an "event" field. In this case, I'm internally reading this as "event ancestors" but the meaning is "process ancestors". should we relocate it under process or prefix it with process to clarify?

started - moved from flags - consider setting this as a timestamp of container start time and not boolean

agree, but let's discuss in a separate issue as it's about adding an entirely new feature?

pod - all pod related context (from kubernetes)

if this is a kubernetes thing, should we add kubernetes to the name? if it is not, are we sure the structure will work for other pod implementations?

sandbox

What does this mean? From reading the code I think I can gather that it looks at the container label io.kubernetes.docker.type==sandbox, but I don't understand what/why/how it's related to a kubernetes pod. @NDStrahilevitz can you please explain?

matchedPolicies

This is the only field in the event that is Tracee-specific and not meaningful by itself. For example, if I'm streaming all events to splunk and someone else later sees this event there, all the other fields would make sense since they describe what happened, but to understand this field they would need to understand Tracee and how it was configured and started.
I won't turn this into a debate about the usefulness of this field, but at least I'd suggest prefixing it with "tracee". Or even better, if we have intentions to add more tracee-specific info in the future (#3153), then better to put it under a "tracee" level.

id - better if this was an integer

why?

Makes it easier and more efficient to process Tracee's events in later stages.
Currently, both id and name are strings, which is redundant (yet required for backwards compatibility with already written signatures, TRC-XXX)

executionTime - doesn't exist, consider adding

agree, but let's discuss in a separate issue as it's about adding an entirely new feature? (and won't break the event structure)

There was no intention to do it now, just adding some fields which could be useful to have in the struct. We can open a separate discussion.

thread

what's the motivation for adding this layer?

Grouping fields which are related to the context of the thread and are not specific to the process (they can differ between threads of the same process)

ancestors

since we're discussing the "event structure", having a field in the root implies it's an "event" field. In this case, I'm internally reading this as "event ancestors" but the meaning is "process ancestors". should we relocate it under process or prefix it with process to clarify?

Agree. Let's move under process then.

started - moved from flags - consider setting this as a timestamp of container start time and not boolean

agree, but let's discuss in a separate issue as it's about adding an entirely new feature?

Ditto

pod - all pod related context (from kubernetes)

if this is a kubernetes thing, should we add kubernetes to the name? if it is not, are we sure the structure will work for other pod implementations?

Better to make it generic and not just for k8s. For fields that are k8s specific we can add a prefix

matchedPolicies

This is the only field in the event that is Tracee-specific and not meaningful by itself. For example, if I'm streaming all events to splunk and someone else later sees this event there, all the other fields would make sense since they describe what happened, but to understand this field they would need to understand Tracee and how it was configured and started. I won't turn this into a debate about the usefulness of this field, but at least I'd suggest prefixing it with "tracee". Or even better, if we have intentions to add more tracee-specific info in the future (#3153), then better to put it under a "tracee" level.

Agree. Let's add "tracee" level then

So the modified struct will look like this (added a few future suggestions as well):

  1. timestamp
  2. name
  3. id - machine readable id (integer). Note: current event id isn't good since it is architecture specific
  4. metadata
    1. tags
    2. dataVersion (this version describes the event fields version, and not the schema of the event structure)
    3. detection
      1. name - keep backwards compatibility with e.g. TRC-XXX ids. Consider removing and using id instead.
      2. severity
      3. mitreCategory
  5. context
    1. process
      1. executionTime - doesn't exist, consider adding (in another issue)
      2. binary - binary path used to execute the process. Doesn't exist, consider adding (in another issue)
      3. pid
      4. namespacePid
      5. userId
      6. userName - doesn't exist, consider adding (in another issue)
      7. ancestors - array of all process ancestors, where each entry has the above process fields. Currently only element 0, the parent process, is populated.
      8. thread
        1. startTime
        2. name (aka "comm")
        3. tid
        4. namespaceTid
        5. capabilities - doesn't exist, consider adding (in another issue)
        6. mountNamespaceId - consider removing
        7. pidNamespaceId - consider removing
        8. utsName - consider removing
        9. syscall - moved here
        10. compat - moved from flags.compat
        11. userStackTrace - if enabled, will be here
    2. container
      1. id
      2. name
      3. image
        1. id
        2. name
        3. digest
      4. started - moved from flags - consider setting this as a timestamp of container start time and not boolean
    3. pod
      1. name
      2. namespace
      3. uid
      4. sandbox - consider renaming
  6. data
    1. per-event schema of relevant fields (e.g. argument for system calls)
    2. returnValue (if relevant will appear here)
  7. tracee
    1. matchedPolicies

To avoid breaking clients we want to keep supporting the old event structure as a printer, to give users more time to migrate to the new structure. So we will have a json-v1 printer with the old format, and json-v2 (aka the "json" printer) with the new format, selected roughly as in the sketch below.
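
Format selection during the migration window could then look like this sketch (the interface and constructor names are hypothetical, not Tracee's actual printer API):

```go
package output

import "fmt"

// eventPrinter is a stand-in for Tracee's printer interface.
type eventPrinter interface {
	Print(event interface{}) error
}

type jsonV1Printer struct{} // legacy trace.Event layout

func (p *jsonV1Printer) Print(event interface{}) error { return nil }

type jsonV2Printer struct{} // new event structure

func (p *jsonV2Printer) Print(event interface{}) error { return nil }

// newEventPrinter maps the user-facing format name to a printer version.
func newEventPrinter(format string) (eventPrinter, error) {
	switch format {
	case "json-v1":
		return &jsonV1Printer{}, nil
	case "json", "json-v2": // "json" becomes an alias for the new format
		return &jsonV2Printer{}, nil
	default:
		return nil, fmt.Errorf("unsupported output format: %q", format)
	}
}
```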

Plan for migrating ParsedArgs list to schema

As part of this issue, an internal PipelineObject will be added to tracee, whose fields will be (roughly) event and pipelineData (see the sketch after the flow below).

The following flow should run if parse-args is enabled:

  1. The ParsedArgs list will be moved into the pipelineData
  2. When piping the event into the signature engine (if in the new binary) or in the sink stage (legacy mode), the data field will be swapped with the ParsedArgs.
    CAVEAT: With this logic, signatures supporting parsed arguments cannot run alongside signatures not supporting parsed arguments. In practical terms that means the open source rego and golang signatures cannot run at the same time.
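
A rough sketch of the wrapper; the names follow the plan above ("roughly event and pipelineData") and are not final:

```go
package pipeline

import "github.com/aquasecurity/tracee/types/trace"

// pipelineObject is the internal object carried through the pipeline.
type pipelineObject struct {
	event        trace.Event
	pipelineData pipelineData
}

type pipelineData struct {
	// Parsed argument forms are carried here instead of inside the event,
	// and swapped into the event's data field only at the engine/sink
	// stage when parse-args is enabled.
	parsedArgs []trace.Argument
}
```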

@yanivagman

@NDStrahilevitz parsing the args (meaning, translating from machine-readable into user-readable form) is a static operation, no? It could even be done by the printer or the consumer. Why do we need to include it in the pipeline?

@NDStrahilevitz parsing the args (meaning, translating from machine-readable into user-readable form) is a static operation, no? It could even be done by the printer or the consumer. Why do we need to include it in the pipeline?

Ideally that would be the case, and it was when tracee-ebpf was standalone. This allowed argument parsing to be used as an implicit dependency by signature authors. If you take a look at all our open source signatures, you can see that they rely on the arguments' parsed forms and not their originals. Furthermore, rego signatures make fixing this even harder: while golang signatures can at least use the libbpfgo constants for readable logic, rego signatures don't have that option, making argument parsing practically required for them.
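
To illustrate the dependency: the same argument changes its Go type depending on whether parse-args ran, so a signature written against the parsed form breaks on raw events. A hedged sketch (not Tracee's exact helper API):

```go
package sig

import (
	"strings"

	"golang.org/x/sys/unix"

	"github.com/aquasecurity/tracee/types/trace"
)

// isReadOnlyOpen shows how a signature's check differs between the raw
// and parsed forms of an open(2) "flags" argument.
func isReadOnlyOpen(arg trace.Argument) bool {
	switch v := arg.Value.(type) {
	case int32:
		// raw form: the numeric flags as passed to the syscall
		return v&unix.O_ACCMODE == unix.O_RDONLY
	case string:
		// parsed form: human-readable flags, e.g. "O_RDONLY|O_CLOEXEC"
		return strings.Contains(v, "O_RDONLY")
	}
	return false
}
```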

Following is a revised structure for Tracee event.

Changes:

  • Removed metadata section
  • Updated some fields names to match ECS/Opentelemetry schema
  • threat field group was added under the data section in case of a threat detection event
  1. timestamp
  2. name
  3. id - machine readable id (integer). Note: current event id isn't good since it is architecture specific
  4. // version - use semver, where major is a breaking change in the event (e.g. one of the event's fields under data has been changed or removed), minor is a non-breaking change (e.g. a new field was added to the event under data), and patch is a bug fix. Since this data is static, we may remove this or make it optional
  5. // tags - since this data is static, we may remove this or make optional
  6. labels - doesn't exist. For future use.
  7. policies
    1. matched
    2. actions - for future use (currently the only action we have is print)
  8. context (like resource in open telemetry, but structured - https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/logs/data-model.md#log-and-event-record-definition).
    1. process (see also https://opentelemetry.io/docs/specs/semconv/resource/process/ and https://www.elastic.co/guide/en/ecs/8.9/ecs-process.html)
      1. executionTime - time of last exec. Doesn't exist, consider adding (in another issue)
      2. executable
        1. path - binary path used to execute the process. Doesn't exist, consider adding (in another issue)
        2. name
      3. pid
      4. namespacePid - should we name it vpid as in the kernel?
      5. entityId - unique id of the process, as defined at [process.entity_id](https://www.elastic.co/guide/en/ecs/8.9/ecs-process.html#field-process-entity-id) (see the sketch after this list)
      6. realUser
        1. id
        2. name - doesn't exist, consider adding (in another issue)
      7. user - effective user. Doesn't exist, consider adding (in another issue)
        1. id
        2. name
      8. parent - nested process ancestors. Only the following fields will be populated by default:
        1. pid
        2. entityId
        3. executable
        4. start - we may want to include this field only for the direct parent
        5. parent - nested process ancestors (won't be populated, unless required, e.g. on threat detection)
      9. thread
        1. start
        2. name (aka "comm")
        3. tid
        4. namespaceTid - should we name it vtid (in the kernel this is called vpid)?
        5. capabilities - doesn't exist, consider adding (in another issue)
        6. syscall - moved here
        7. compat - boolean. moved from flags.compat
        8. userStackTrace - if enabled, will be here
    2. container (see also https://opentelemetry.io/docs/specs/semconv/resource/container/)
      1. id
      2. name
      3. image
        1. id
        2. repoDigest
        3. name
      4. started - boolean. moved from flags
      5. start - Timestamp of container start time. Doesn't exist. Will replace started
      6. pid - entrypoint's pid. Doesn't exist, consider adding
    3. k8s (see also https://opentelemetry.io/docs/specs/semconv/resource/k8s/)
      1. pod
        1. name
        2. uid
        3. labels
      2. namespace
        1. name
  9. data (like attributes in open telemetry, but structured - https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/logs/data-model.md#log-and-event-record-definition).
    1. Any relevant field (per-event schema)
    2. returnValue (if relevant will appear here)
    3. triggerEvent (will appear on threat detection events)
      1. name
      2. id
      3. data
  10. threat (if relevant will appear here) - static data about threats (can be omitted)
    1. description
    2. mitreTactic
      1. name
    3. mitreTechnique
      1. id - the ID of the technique, as could be found in the [cti json]
      2. externalId - the ID given in the mitre website
      3. name
    4. severity
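
On entityId (item 5 under process above): ECS process.entity_id is meant to uniquely identify a process across pid reuse, so it would likely be derived from values tied to the process lifetime. A minimal sketch, assuming (hypothetically) boot id, host pid, and process start time as the inputs:

```go
package event

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// processEntityID derives a stable, unique process id from values that
// do not recur on pid reuse. The chosen inputs are an assumption, not a
// settled design.
func processEntityID(bootID string, hostPid uint32, startTime uint64) string {
	h := sha256.Sum256([]byte(fmt.Sprintf("%s|%d|%d", bootID, hostPid, startTime)))
	return hex.EncodeToString(h[:])
}
```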

This is what I think will match our needs for the threats field:

  1. threat (if relevant will appear here) - (see also https://www.elastic.co/guide/en/ecs/8.9/ecs-threat.html)
    1. description - detailed description of the threat
    2. tactic
      1. id
      2. name
    3. technique
      1. id - the ID of the technique, as could be found in the cti json
      2. external_id - the ID given in the mitre website
      3. name
    4. kubernetes_technique - see Matrix ATT&CK for Kubernetes from Microsoft
      1. name
    5. indicator (or another name) - a dictionary for threat related data
    6. severity (note: in ecs, this field is not defined here but under event - but do we want it common to all events?)
    7. triggeringEvent (can this be under enrichments? better name?)
      1. name
      2. id
      3. data

I also think that using the "related" field of ECS could do good for keeping the IOCs in one place

Moved it to the next release; the event structure is finished and merged, but it is still missing integration into tracee internals.

Few comments:

  1. Threat is static information, and hence should be available outside of the event (including the grpc interface)
  2. The event header should also include the "text ID" of the event (ART-X, TRC-X etc.)
  3. To be able to use the numeric ID, consumers need to be able to use the id definitions without a dependency on the tracee ecosystem
    3.1 The id definitions must be managed in a backward-compatible manner (i.e. only add, no edits, no deletes)
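
A minimal sketch of what append-only id management implies in code (the event names and values are hypothetical):

```go
package events

// ID is Tracee's machine-readable event id. The management policy is
// what matters: ids are append-only, never edited and never deleted,
// so external consumers can hardcode them safely.
type ID int32

const (
	Openat ID = 1
	Execve ID = 2
	// New events must only be appended with fresh values. Removing an
	// event retires its id permanently instead of freeing it for reuse.
)
```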

The data we need that is not currently in context:

  • under the process section: missing the full process command line
  • under the k8s pod section: missing deployment and type

  • under the process section: missing the full process command line

This is available in the sched_process_exec event and (I think) in the process tree and its data source.
This isn't to claim we shouldn't put it in the event context, rather that it is already available through alternative means.

  • under the k8s pod section: missing deployment and type

@itaysk we should be able to add these through container labels like other kubernetes data we already get in container enrichment. I can open an issue for this if agreed.

This is available in the sched_process_exec event and (I think) in the process tree and its data source.

When the process tree was being created, there was a specific discussion about COMM versus BINARY PATH and where the info should go (https://github.com/aquasecurity/tracee/pull/3364/files#diff-773e2917cb050cc42ce31d36b08db6c9e3da89ab6dff75f8a9b3eba5171316d3R12).

Alon and I agreed that COMM would be part of the TaskInfo structure (for the Process or Thread) and the Binary Path would be part of the File Info structure (for the Binary and the Interpreter of the Binary).

The COMM (the Process Name for the process tree) does not include the args (it's basically the procfs comm field); those would come together with the argv array (which is an argument of the exec event).

We can introduce COMM + ARGS in the process tree if needed (or something else), no problem.
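
For readers unfamiliar with the distinction: comm and the full command line come from different procfs files, which is why they live in different structures. A standalone sketch (not Tracee code):

```go
package proc

import (
	"fmt"
	"os"
	"strings"
)

// readComm returns the kernel task name (truncated to 15 chars), i.e.
// what the process tree stores as the process/thread name.
func readComm(pid int) (string, error) {
	b, err := os.ReadFile(fmt.Sprintf("/proc/%d/comm", pid))
	if err != nil {
		return "", err
	}
	return strings.TrimSpace(string(b)), nil
}

// readCmdline returns the full argv, which is what the "full process
// command line" request above is about.
func readCmdline(pid int) ([]string, error) {
	b, err := os.ReadFile(fmt.Sprintf("/proc/%d/cmdline", pid))
	if err != nil {
		return nil, err
	}
	// arguments are NUL-separated
	return strings.Split(strings.TrimRight(string(b), "\x00"), "\x00"), nil
}
```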

Hey @mcherny, thank you for the questions:

Few comments:

  1. Threat is static information, and hence should be available outside of the event (including the grpc interface)

Do you have an example of how this will be used? Because the Threat information is part of specific events (behaviour events), I'm considering making it part of the event definition as optional, rather than giving it its own API. Thoughts?

  1. The event header should also include the "text ID" of the event (ART-X, TRC-X etc.)

We didn't add those because they are internal to Aqua; our plan was to actually remove them altogether for open source signatures, and let the internal project handle the translation between those ids and tracee ids.

  1. To be able to use the numeric ID, consumers need to be able to use the id definitions without a dependency on the tracee ecosystem
    3.1 The id definitions must be managed in a backward-compatible manner (i.e. only add, no edits, no deletes)

Agree! Right now the ids are stable for base events but dynamic for signatures; I need to look into how we can make them always stable.

  • under the process section: missing the full process command line

This is available in the sched_process_exec event and (I think) in the process tree and its data source. This isn't to claim we shouldn't put it in the event context, rather that it is already available through alternative means.

Assuming we need this on each event for which it may be relevant (e.g. file open), how would I consume it by alternative means if I need it outside of the tracee process, that is, at a gRPC client?

Assuming we need this on each event for which it may be relevant (e.g. file open), how would I consume it by alternative means if I need it outside of the tracee process, that is, at a gRPC client?

In general, if someone needs info on something for each event of its kind, for a particular use case, a signature/derived event is probably called for.

Few comments:

  1. Threat is static information, and hence should be available outside of the event (including the grpc interface)

That's why we wrote:

threat (if relevant will appear here) - static data about threats (can be omitted)

There is an advantage to including this information in the OSS project when a threat is detected (not so frequent), so users can get information about the threat without consulting the documentation

  1. The event header should also include the "text ID" of the event (ART-X, TRC-X etc.)

In continuation of the first point, this is also static data, and it can be added to the threat section by an internal mapping if required by some project

  1. To be able to use the numeric ID, consumers need to be able to use the id definitions without a dependency on the tracee ecosystem
    3.1 The id definitions must be managed in a backward-compatible manner (i.e. only add, no edits, no deletes)

Agree. We have an old issue open for that: #1098

  • under the process section: missing the full process command line

This is available in the sched_process_exec event and (I think) in the process tree and its data source. This isn't to claim we shouldn't put it in the event context, rather that it is already available through alternative means.

Assuming we need this on each event for which it may be relevant (e.g. file open), how would I consume it by alternative means if I need it outside of the tracee process, that is, at a gRPC client?

Sounds like a reasonable addition to tracee that should be optional. We should discuss such additions in a separate issue, and keep this issue for an event structure that supports the already existing features of tracee

I've been doing some work in the capture area. Considering we want to merge it into the "everything is an event" scheme at some point, shouldn't we also include an artifact field in the structure (I recall @yanivagman making this sort of suggestion some time ago)? I assume that otherwise, each capture will be its own event.

I've been doing some work in the capture area. Considering we want to merge it into the "everything is an event" scheme at some point, shouldn't we also include an artifact field in the structure (I recall @yanivagman making this sort of suggestion some time ago)? I assume that otherwise, each capture will be its own event.

Eventually, capture will be an action taken by some event. In that case, the data about it will be part of policies->actions.

Following is the updated event schema based on the comments above:

  1. timestamp
  2. name
  3. id - machine readable id (integer). Note: current event id isn't good since it is architecture specific
  4. // version - use semver, where major is a breaking change in the event (e.g. one of the event's fields under data has been changed or removed), minor is a non-breaking change (e.g. a new field was added to the event under data), and patch is a bug fix. Since this data is static, we may remove this or make it optional
  5. // tags - since this data is static, we may remove this or make optional
  6. labels - doesn't exist. For future use.
  7. policies
    1. matched
    2. actions - doesn't exist, for future use - list of actions taken (currently the only action we have is print).
  8. workload
    1. process
      1. executable
        1. path
        2. name - the binary name (basename of the path) - doesn't exist, consider adding (in another issue)
      2. uniqueId - unique id of the process
      3. pid
      4. hostPid
      5. executionTime - time of last exec. Doesn't exist, consider adding (in another issue)
      6. realUser
        1. id
        2. name - doesn't exist, consider adding (in another issue)
      7. user - effective user. Doesn't exist, consider adding (in another issue)
        1. id
        2. name
      8. ancestors - process ancestors array. Only direct parent will be populated by default with the following fields:
        1. uniqueId
        2. pid
        3. hostPid
        4. Other ancestor fields may be populated by threat detection events
      9. thread
        1. startTime
        2. name (aka "comm")
        3. tid
        4. hostTid
        5. capabilities - doesn't exist, consider adding (in another issue)
        6. syscall - the syscall that triggered this event
        7. compat - boolean. moved from flags.compat
        8. userStackTrace - if enabled, will be here
    2. container
      1. id
      2. name
      3. image
        1. id
        2. repoDigest
        3. name
      4. isRunning - boolean. moved from flags
      5. startTime - Timestamp of container start time. Doesn't exist. Will replace started
      6. pid - entrypoint's pid. Doesn't exist, consider adding
    3. k8s
      1. pod
        1. name
        2. uid
        3. labels
      2. namespace
        1. name
  9. data
    1. Any relevant field (per-event schema)
    2. returnValue (if relevant will appear here)
  10. threat (if relevant will appear here) - static data about threats (can be omitted)
    1. description
    2. mitre
      1. tactic
        1. name
      2. technique
        1. name
        2. id
    3. severity
  11. triggeredBy (will appear on threat detection events)
    1. name
    2. id
    3. data

So we'll keep triggeredBy outside of the event data?