servicebinding/spec

JSONPath expressions in Application Resource Mapping and setting values

baijum opened this issue · 15 comments

Application Resource Mapping specifies JSONPath expressions to indentify containers, envs, volumeMounts, and volumes.

The unstructured API of apimachinery package expects fields to set values for an object. For example, to set volumes:

unstructured.SetNestedSlice(application.Object, volumes, "spec", "template", "spec", "volumes")

I couldn't figure out a way to deterministically resolve the fields from a JSONPath expression. Is that possible? If not, we should reconsider JSONPath expression to specify fields.

@baijum have you looked at https://pkg.go.dev/k8s.io/client-go/util/jsonpath#JSONPath.FindResults

That function returns a slice of results. Should the implementation set value for each of the results?

Should the implementation set value for each of the results?

JSON Path can return multiple matching nodes for a query, the containers, envs and volumeMounts keys all accept multiple json paths, so I'd lean towards treating each matching node as a target to be injected. This behavior is desirable because it allows us to bind into arrays that contain these structures (imagine something like containers that are not quite corev1.Container).

This raises an interesting question as what to do when there is an asymmetry between env and volumeMounts. The path of the volume mount needs to be based on the value of the SERVICE_BINDING_ROOT env var. We have this issue regardless of jsonpath returning one or multiple values.

IMHO, JSONPath is not the right tool for specifying locations in a data structure - it does great job of filtering the values but it does not provide a link between the matched value and its location in the data structure. Hence, updating the value and reflecting that update on the original structure is not something what JSONPath speaks about. It is left to an implementation to try to support it or not.

When there is no match, the result is an empty list - Again, this does not give us any clue where relevant values should be injected if the data structure does not hold them.

In the current spec, Application Resource Mapping defines .envs and .volumes - usually they are optional in application resources. Hence, the provided JSONPath is going to return an empty list, and we are not able to figure where volume/env should be injected.

I would propose that we replace JSONPaths with JSON Pointers - it a well known standard for locating data in JSON structures (e.g. among others kustomize uses it )

to bind into arrays that contain these structures (imagine something like containers that are not quite corev1.Container).

As I understand, there are two primary use cases for Application Resource Mapping.

  1. Project bindings into an intermediate resource. (For eg., runtime-component-operator)
  2. Project bindings into partially PodSpec-able resource (For example cronjob's the template attribute name is jobTemplate)

In both these cases, I don't see a requirement to support an array. BTW, the second example given for cronjob in the spec is not very realistic, because the previous example has a better solution.

In the intermediate resource scenario, repeating the same data inside a nested data structure is not required.

I think dot-separated values of strings would be sufficient to handle the current use cases.

In order to support something like RuntimeComponentSpec, we should introduce a notion of duck-typed PodSpecable resource, something what @baijum pointed out in https://github.com/kubepreset/kubepreset/wiki/Application-Resource-Mapping:

type PodSpecable struct {
	InitContainers []corev1.Container `json:"initContainers,omitempty"`
	Containers     []corev1.Container `json:"containers,omitempty"`
	Volumes        []corev1.Volume    `json:"volumes,omitempty"`
}

The structure can have any other fields, but if it has these, we know what to inject and where. The location to that structure can be specified via a new field:

apiVersion: service.binding/v1alpha2
kind: ClusterApplicationResourceMapping
metadata:
  name:                 # string
  generation:           # int64, defined by the Kubernetes control plane
  ...
spec:
  versions:             # []Version
  - version:            # string
    podSpecable:     # jsonpointer or dot separated path

The expression pointing to the location of PodSpecable resource do not need to contain any wildcards, and with that we cover already a huge number of cases. IMHO, it is very unlikely to have containers and volumeMounts living in two very different part of resources, i.e. they have the common parent.

Hence, we can move from the need to support JSONPath, and use simpler standards, e.g. JSONPointer

@pedjak Can you please add a comment with the CR that you'd write for RuntimeComponentSpec? A quick review of the CRD seems to indicate that it's not PodSpecable by this definition.

@nebhale I see that the term podSpecable here is misleading - we are not looking for resources that are exactly of type `PodSpec, we are looking for duck-typed structures, that have some of interesting fields:

type PodSpecable struct {
	InitContainers []corev1.Container `json:"initContainers,omitempty"`
	Containers     []corev1.Container `json:"containers,omitempty"`
	Volumes        []corev1.Volume    `json:"volumes,omitempty"`
}

Everything is optional. The rest of the fields in the found structure are not of interest for the operator.

For a RuntimeComponent CR like:

apiVersion: app.stacks/v1beta1
kind: RuntimeComponent
metadata:
  name: my-app
spec:
  applicationImage: quay.io/my-repo/my-app:1.0
  service:
    type: ClusterIP
    port: 9080
  expose: true
  storage:
    size: 2Gi
    mountPath: "/logs"

CustomApplicationResourceMapping could look like:

apiVersion: service.binding/v1alpha2
kind: ClusterApplicationResourceMapping
metadata:
  name:  runtimecomponents.app.stacks
  ...
spec:
  versions:             # []Version
  - version:  '*'
    podSpecable:   .spec

And then it is easy to create volumes collection and an entry in it. If RuntimeComponent CR declares initContainers, then it is easy to inject a proper volumeMount in them.

This works well for CronJob mentioned in the spec, we could have following unique CustomApplicationResourceMapping:

apiVersion: service.binding/v1alpha2
kind: ClusterApplicationResourceMapping
metadata:
 name:  cronjobs.batch
spec:
  versions:
  - version: "*"
    podSpecable: .spec.jobTemplate.spec.template.spec

For a RuntimeComponent CR like:

apiVersion: app.stacks/v1beta1
kind: RuntimeComponent
metadata:
  name: my-app
spec:
  applicationImage: quay.io/my-repo/my-app:1.0
  service:
    type: ClusterIP
    port: 9080
  expose: true
  storage:
    size: 2Gi
    mountPath: "/logs"

CustomApplicationResourceMapping could look like:

apiVersion: service.binding/v1alpha2
kind: ClusterApplicationResourceMapping
metadata:
  name:  runtimecomponents.app.stacks
  ...
spec:
  versions:             # []Version
  - version:  '*'
    podSpecable:   .spec

How do you bind a service to the container with the image quay.io/my-repo/my-app:1.0 ? Since it's not within []corev1.Container it would seem to be inaccessible.

How do you bind a service to the container with the image

So, it seems that .spec of RuntimeComponent looks like corev1.Container with a twist that it contains initContainer and sideContainer as well. We can add an arbitrary number of fields that could be interesting for the operator:

type PodSpecable struct {
	InitContainers []corev1.Container `json:"initContainers,omitempty"`
	Containers     []corev1.Container `json:"containers,omitempty"`
	Volumes        []corev1.Volume    `json:"volumes,omitempty"`
        VolumeMounts  [][]corev1.VolumeMount     
        Env []corev1.EnvVar
        .
        .
       
}

so in order to bind, the operator locates the structure that should be modified, start looking what fields are available and based on that adds, in this case an entry under volumes, and volumeMounts.

I agree that the current spec'd behavior is under defined and has issues, but I'm not convinced we need to take steps as dramatic as discussed here. Since the ClusterApplicationResourceMapping is a power user feature that his focused on adding support for resource shapes we don't know about, I think we can more emphasis on flexibility over easy of use. Restricting support to resource that have a PodSpec is easy to implement, but ultimately, not as useful.

The essential capabilities we need to bind a service into an application workload are:

  • a single []corev1.Volume
  • for each container:
    • a single []corev1.EnvVar
    • a single []corev1.VolumeMapping

If we have access to a corev1.PodTemplateSpec we can infer everything else and have a richer experaince. But if we require a PodTemplateSpec, then flexibility is lost.

We need the flexibility to find container-esque shapes wherever they may be in the spec; JSONPath is a good fit. We also need the reliability to know within a found container-esque object a firm reference to the env and volumeMounts (JSON Pointer). While I'm not crazy about requiring two distinct grammars, we can explore a limited subset of JSONPath that gives us a JSON Pointer like experiance.

apiVersion: service.binding/v1alpha2
kind: ClusterApplicationResourceMapping
metadata:   # metav1.ObjectMeta
  ...
spec:
  versions:
  - version:          # string
    containers:         
    - path:           # string(JSONPath)
      name:           # string(JSON Pointer), optional
      env:            # string(JSON Pointer), defaults to "/env"
      volumeMounts:   # string(JSON Pointer), defaults to "/volumeMounts"
    volumes:          # string(JSON Pointer)

For the RuntimeComponent resource, it would look like:

apiVersion: service.binding/v1alpha2
kind: ClusterApplicationResourceMapping
metadata:
  name:  runtimecomponents.app.stacks
spec:
  versions:
  - version:  *
    containers:
    - path: .spec
    - path: .spec.initContainers[*]
      name: /name
    - path: .spec.sidecarContainers[*]
      name: /name
    volumes: /spec/volumes

What's interesting about RuntimeComponent is there are two obvious arrays of containers, but there's also a single implicit container in the root. This approach is flexible enough to capture all three.

A traditional PodSpecable resource like Deployment would look like:

apiVersion: service.binding/v1alpha2
kind: ClusterApplicationResourceMapping
metadata:
  name:  deployments.apps
spec:
  versions:
  - version:  *
    containers:
    - path: .spec.template.spec.containers[*]
      name: /name
    - path: .spec.template.spec.initContainers[*]
      name: /name
    volumes: /spec/template/spec/volumes

A non-PodSpecable resource like CronJob would look like:

apiVersion: service.binding/v1alpha2
kind: ClusterApplicationResourceMapping
metadata:
  name:  cronjobs.batch
spec:
  versions:
  - version:  *
    containers:
    - path: .spec.jobTemplate.spec.template.spec.containers[*]
      name: /name
    - path: .spec.jobTemplate.spec.template.spec.initContainers[*]
      name: /name
    volumes: /spec/template/spec/volumes

Restricting support to resource that have a PodSpec is easy to implement, but ultimately, not as useful.

This is not a real PodSpec, it is just a duck-typed structure that can have some fields of PodSpec - the fields that we know how to handle. IMHO, containers, volumes, and the other have the same parent, i.e. belong to the same structure - they are not scattered across application CR.

The essential capabilities we need to bind a service into an application workload are:

a single []corev1.Volume
for each container:
    a single []corev1.EnvVar
    a single []corev1.VolumeMapping

With that we assume that all application CR semantic is the same as PodSpec, just allowing that location and name of these fields might be different. What if application CR does not follow that semantic? What if the application CR hold just a reference to binding secret and we want to inject only that. The current abstraction does not allow us that.

We need the flexibility to find container-esque shapes wherever they may be in the spec; JSONPath is a good fit.

JSONPath is for querying data, but not for specifying location. IMHO, containers/pods are all to be found in the same part of the resource, so no need to use the wildcard operator, and thus we really do not need to use JSONPath. For example .spec.containers returns the list of all containers. Similar can be achieved by JSONPointer /spec/container

@pedjak What concerns me about

We can add an arbitrary number of fields that could be interesting for the operator:

is that it requires the specification to change to support these use-cases. Especially if we expect the spec and implementations to be stable, we shouldn't require a change to them whenever a new unexpected CR comes into existence months or years from now.

is that it requires the specification to change to support these use-cases. Especially if we expect the spec and implementations to > be stable, we shouldn't require a change to them whenever a new unexpected CR comes into existence months or years from now.

Sure, I agree - what I meant is that we can come up with some sets of fields we know how to handle and put in the spec.

I pulled together a proof of concept for my proposal above to use JSONPath to discover containers and JSON Pointer to address pod and container level fields.

The implementation works by converting any runtime object into an unstructured form. A mapping definition specifies the location of key elements of the resource necessary for the binding. Using the mapping definition, the unstructured resource is converted into a normalized structured form. That normalized form can then be manipulated to apply the service bindings. Finally, the inverse conversion is applied using the same mapping definition to map the mutated state on to the original object.

For the moment, the only aspect of the service binding behavior I have implemented is the detection and defaulting of the SERVICE_BINDING_ROOT env var. Full spec compliant behavior can be added here.

There are tests that apply the binding, mapping an object to the meta-model and mapping from the meta-model back to an object are tested on both a PodSpecable resource (Deployment) and non-PodSpecable resource (CronJob). Knowledge of these types is limited to the test suite.

There are zero special dependencies. This behavior is implemented using k8s.io/api, k8s.io/apimachinery and k8s.io/client-go. github.com/google/go-cmp is only used by the test suite to diff the expected and actual objects.

Binding a Deployment with a default mapping:

https://github.com/scothis/unstructured-meta-binding/blob/d80f2f9541dab3b3641828c12862b38c1695ec0a/binding_test.go#L24-L114

Binding a CronJob with a custom mapping:

https://github.com/scothis/unstructured-meta-binding/blob/d80f2f9541dab3b3641828c12862b38c1695ec0a/binding_test.go#L115-L226

tl;dr it works