This repository contains sets of example resources to be used with a declarative management strategy. Please familiarize yourself with the terminology in that document before reading on.
The purpose of these examples is twofold:
- To act as supporting content for a GitOps series being written for uncontained.io
- To serve as a starting point for establishing a GitOps practice for cluster management
The simple cluster bootstrapping example shows how cluster administrators might begin managing OpenShift clusters using just `oc apply`. Each resource in this example carries a common label (`config.example.com/name: simple-bootstrap`) that associates it with this project. In doing this, we can manage the full lifecycle of our resources with a single command.
until oc apply -Rf simple-bootstrap/ --prune -l config.example.com/name=simple-bootstrap; do sleep 2; done
An explanation of the command is below.
The `apply` command idempotently ensures that the live configuration is in sync with our configuration files. By adding `-Rf simple-bootstrap/`, we are able to recursively manage an entire directory structure of manifest files.
$ oc apply -Rf simple-bootstrap/
namespace/deleteable created
namespace/namespace-operator created
operatorgroup.operators.coreos.com/namespace-operator created
subscription.operators.coreos.com/namespace-configuration-operator created
clusterrolebinding.rbac.authorization.k8s.io/cluster-administrators created
userconfig.redhatcop.redhat.io/sandboxes created
If we run this a second time, we'll see that it still completes successfully, but notice that the action reported for each resource has changed from `created` to `unchanged` or, in some cases, `configured`.
$ oc apply -Rf simple-bootstrap/
namespace/deleteable configured
namespace/namespace-operator configured
operatorgroup.operators.coreos.com/namespace-operator unchanged
subscription.operators.coreos.com/namespace-configuration-operator unchanged
clusterrolebinding.rbac.authorization.k8s.io/cluster-administrators unchanged
userconfig.redhatcop.redhat.io/sandboxes created
The `--prune` flag allows us to also manage the deletion of live objects simply by deleting the associated file in this repository.
Now, let's remove a namespace and re-run the same command:
$ rm simple-bootstrap/0-namespaces/deleteable.yaml
$ oc apply -Rf simple-bootstrap/ --prune -l config.example.com/name=simple-bootstrap
namespace/namespace-operator configured
operatorgroup.operators.coreos.com/namespace-operator unchanged
subscription.operators.coreos.com/namespace-configuration-operator unchanged
clusterrolebinding.rbac.authorization.k8s.io/cluster-administrators unchanged
userconfig.redhatcop.redhat.io/sandboxes unchanged
namespace/deleteable pruned
We can see that by deleting the file, the corresponding live resource gets pruned.
In order to handle pruning of custom resources, we have to customize the set of resource types that are scanned for our label. To do this, we pass a `--prune-whitelist` flag for each resource type. To simplify the command, we've written the full set of flags to a file that we append to the command.
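Each line of the file passes one `group/version/kind` to the `--prune-whitelist` flag. As a rough sketch of what `prune-whitelist.txt` might contain (the exact entries depend on which resource types you manage; the list below is illustrative):

--prune-whitelist=core/v1/Namespace
--prune-whitelist=rbac.authorization.k8s.io/v1/ClusterRoleBinding
--prune-whitelist=operators.coreos.com/v1/OperatorGroup
--prune-whitelist=operators.coreos.com/v1alpha1/Subscription
--prune-whitelist=redhatcop.redhat.io/v1alpha1/UserConfig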
$ oc apply -Rf simple-bootstrap/ --prune -l config.example.com/name=simple-bootstrap $(cat prune-whitelist.txt)
namespace/deleteable configured
namespace/namespace-operator configured
operatorgroup.operators.coreos.com/namespace-operator unchanged
subscription.operators.coreos.com/namespace-configuration-operator unchanged
clusterrolebinding.rbac.authorization.k8s.io/cluster-administrators unchanged
userconfig.redhatcop.redhat.io/sandboxes created
However, there's one likely hiccup that our workflow needs to be able to handle. The management of operators via the Operator Lifecycle Manager creates a race condition. When the `Subscription` and `OperatorGroup` resources get created, they trigger OLM to fetch details about the operator and install the relevant `CustomResourceDefinitions` (CRDs). Until the CRDs have been written to the cluster, an attempt to create a matching custom resource will fail, as that resource type doesn't yet exist in the API.
In our case, we are deploying the Namespace Configuration Operator, which provides the `UserConfig` resource type. If we try to create both the `OperatorGroup`/`Subscription` to deploy the operator and the `UserConfig` to invoke it in the same command, we'll get an error:
Error from server (NotFound): error when creating "simple-bootstrap/3-operator-configs/sandbox-userconfig.yaml": the server could not find the requested resource (post userconfigs.redhatcop.redhat.io)
The simplest way to handle this is with a retry loop.
$ until oc apply -Rf simple-bootstrap/ --prune -l config.example.com/name=simple-bootstrap $(cat prune-whitelist.txt); do sleep 2; done
namespace/deleteable configured
namespace/namespace-operator configured
operatorgroup.operators.coreos.com/namespace-operator unchanged
subscription.operators.coreos.com/namespace-configuration-operator unchanged
clusterrolebinding.rbac.authorization.k8s.io/cluster-administrators unchanged
userconfig.redhatcop.redhat.io/sandboxes created
This command will re-run (not a problem, since `apply` is idempotent) until all resources have been synced to the cluster. Usually this only takes two tries.
Now that we have a repeatable process for managing cluster resources, we can set it up to run automatically as a `CronJob` inside the cluster.
By running the workflow locally, we've already created a `CronJob` in the `cluster-ops` namespace. In order for it to run, it requires that a secret be created pointing it to the repository where the cluster configs live.
oc create secret generic gitops-repo --from-literal=url=https://github.com/redhat-cop/declarative-openshift.git --from-literal=ref=master --from-literal=contextDir=simple-bootstrap --from-literal=pruneLabel=config.example.com/name=simple-bootstrap -n cluster-ops
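For reference, a minimal sketch of what such a `CronJob` might look like, assuming the secret's keys are exposed as environment variables (the schedule, image, and ServiceAccount name here are assumptions; the manifest shipped in this repository may differ):

apiVersion: batch/v1  # batch/v1beta1 on older clusters
kind: CronJob
metadata:
  name: gitops
  namespace: cluster-ops
spec:
  schedule: '*/5 * * * *'
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: gitops  # assumed ServiceAccount with rights to apply cluster config
          restartPolicy: Never
          containers:
          - name: apply
            image: quay.io/openshift/origin-cli:latest  # any image providing git and oc
            envFrom:
            - secretRef:
                name: gitops-repo  # the secret created above; its keys become env vars
            command:
            - /bin/bash
            - -c
            - |
              echo "Syncing cluster config from ${url}/${contextDir}"
              git clone --branch "${ref}" "${url}" /tmp/repodir
              cd /tmp/repodir
              until oc apply -Rf "${contextDir}" --prune -l "${pruneLabel}" $(cat prune-whitelist.txt); do sleep 2; done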
Now, if you wait a few minutes and check the logs in the job pod...
$ oc logs cronjob-gitops-1591666560-4q7f2 -n cluster-ops
Syncing cluster config from https://github.com/redhat-cop/declarative-openshift.git/simple-bootstrap
Cloning into '/tmp/repodir'...
namespace/deleteable configured
namespace/namespace-operator configured
operatorgroup.operators.coreos.com/namespace-operator unchanged
subscription.operators.coreos.com/namespace-configuration-operator unchanged
clusterrolebinding.rbac.authorization.k8s.io/cluster-administrators unchanged
userconfig.redhatcop.redhat.io/sandboxes configured
Voila! Enjoy your automatically drift-controlled cluster!
OpenShift provides for a secure environment by making use of Security Context Constraints (SCCs) to govern the level of access that is granted to a running container. By default, all containers execute using the `restricted` SCC. There are circumstances where it may be desired or necessary for a container to make use of an alternate SCC. OpenShift contains several SCCs for a variety of use cases, including granting access to resources on the Container Host or access to the Container Host Network.
As a user with elevated access, execute the following command to view all of the SCCs that are currently defined in the environment:
$ oc get scc
NAME AGE
anyuid 6h45m
hostaccess 6h45m
hostmount-anyuid 6h45m
hostnetwork 6h45m
node-exporter 6h34m
nonroot 6h45m
privileged 6h45m
restricted 6h45m
The most common use case for containers running in OpenShift to make use of an alternate SCC is for the container to use the ID of the user specified in the image instead of a randomly generated ID. The `anyuid` SCC provides this functionality, and the assets in this exercise will demonstrate how to grant and verify access.
In earlier versions of OpenShift, the preferred method for granting access to an SCC was to use a dedicated Service Account to execute the pod and to add the Service Account directly to the SCC. This caused challenges as the platform evolved over time. The preferred method now is to use Role Based Access Control (RBAC) to declaratively state that a Service Account is able to use a particular SCC.
By applying the resources in prior sections, the following were applied to the cluster:
- A Namespace called `manage-scc`
- A ClusterRole that provides access to the anyuid SCC
- A ServiceAccount that can be used by Pods requiring access to the anyuid SCC
- A RoleBinding in the `manage-scc` namespace that links the ServiceAccount to the ClusterRole
- A Job that uses the ServiceAccount to validate it has access to the desired SCC
The key to enabling access to the anyuid SCC is the `allow-anyuid-scc` ClusterRole, which grants the `use` verb on the resource named `anyuid` of the `securitycontextconstraints` resource type in the `security.openshift.io` API group, as shown below:
rules:
- apiGroups:
  - security.openshift.io
  resources:
  - securitycontextconstraints
  verbs:
  - use
  resourceNames:
  - anyuid
The association between the ClusterRole and the ServiceAccount is made in the `anyuid-scc` RoleBinding.
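A minimal sketch of such a RoleBinding, assuming a ServiceAccount named `manage-scc` (the actual names in this repository's manifests may differ):

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: anyuid-scc
  namespace: manage-scc
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: allow-anyuid-scc
subjects:
- kind: ServiceAccount
  name: manage-scc  # assumed ServiceAccount name
  namespace: manage-scc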
A verification job has been launched to confirm that it is running using the `anyuid` SCC. It accomplishes this task by mounting the Pod annotations to a directory using the Downward API.
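To illustrate the mechanism, here is a sketch of the relevant portion of such a Job's pod template (the image, script, and ServiceAccount name are assumptions, not the repository's exact manifest):

spec:
  serviceAccountName: manage-scc  # assumed; must be bound to the allow-anyuid-scc ClusterRole
  restartPolicy: Never
  containers:
  - name: verifier
    image: registry.access.redhat.com/ubi8/ubi-minimal  # any image with a shell
    command:
    - /bin/sh
    - -c
    - |
      # The Downward API writes annotations to the file as key="value" lines
      actual=$(grep 'openshift.io/scc' /etc/podinfo/annotations | cut -d'"' -f2)
      echo "Desired SCC: anyuid"
      echo "Actual SCC: ${actual}"
      [ "${actual}" = "anyuid" ] && echo "Result Success!" || exit 1
    volumeMounts:
    - name: podinfo
      mountPath: /etc/podinfo
  volumes:
  - name: podinfo
    downwardAPI:
      items:
      - path: annotations
        fieldRef:
          fieldPath: metadata.annotations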
List all pods in the `manage-scc` Namespace:
$ oc get pods -n manage-scc
NAME READY STATUS RESTARTS AGE
manage-scc-verifier-job-q46rz 0/1 Completed 0 1m
A status of Completed indicates that the job was able to successfully verify that the pod is using the `anyuid` SCC. We can confirm this ourselves by viewing the `openshift.io/scc` annotation:
$ oc get pods -n manage-scc -o jsonpath='{.items[*].metadata.annotations.openshift\.io\/scc}'
In addition, the logs from the completed pod can be viewed to confirm that it successfully verified the proper annotation.
$ oc logs -n manage-scc $(oc get pods -n manage-scc -o jsonpath='{.items[*].metadata.name}')
Desired SCC: anyuid
Actual SCC: anyuid
Result Success!
In some cases, a cluster administrator might have a need to apply a patch to a resource that already exists or is owned by some other process. Some use cases of this are:
- Labelling the `default`, `kube-system`, or other "out of the box" namespaces
- Labelling or tainting nodes not managed by an operator (UPI)
For these cases, we use the Resource Locker Operator to provide a "declarative patch" that will be kept in place by the operator. Building this solution in a declarative way involves creating the following components:
- A manifest for managing a `Namespace` for the Resource Locker Operator
- A manifest for installing the Resource Locker Operator
Then, for each patch we want to manage:
- A manifest defining the `ServiceAccount`, `ClusterRole`, and `RoleBinding` (or `ClusterRoleBinding`) that will perform the patch
- A manifest defining the `ResourceLocker` resource that defines the contents of the patch and the target resource to perform the patch on.
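As an illustration, a `ResourceLocker` patch that applies the two labels shown in the output below might look roughly like this (the resource name, namespace, and ServiceAccount are assumptions; consult the Resource Locker Operator documentation for the exact API):

apiVersion: redhatcop.redhat.io/v1alpha1
kind: ResourceLocker
metadata:
  name: default-namespace-labels
  namespace: resource-locker-operator
spec:
  serviceAccountRef:
    name: default-namespace-patcher  # assumed ServiceAccount with rights to patch namespaces
  patches:
  - targetObjectRef:
      apiVersion: v1
      kind: Namespace
      name: default
    patchType: application/strategic-merge-patch+json
    patchTemplate: |
      metadata:
        labels:
          name: default
          network.openshift.io/policy-group: ingress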
After running this, we can see that our `default` namespace now has two labels on it.
$ oc get ns/default -o yaml
apiVersion: v1
kind: Namespace
metadata:
  ...
  labels:
    name: default
    network.openshift.io/policy-group: ingress
  name: default
  ...
spec:
  finalizers:
  - kubernetes
status:
  phase: Active
Operators are a foundational component of the architecture of OpenShift, and the lifecycle of operators is managed by the Operator Lifecycle Manager (OLM). As illustrated in a portion of the prior examples, an operator managed by OLM is enabled in one or more namespaces by an OperatorGroup, and the intention to install an operator is expressed using a Subscription. A Subscription defines the source of the operator, including the namespace and catalog, and can contain the specific ClusterServiceVersion (CSV) that is intended to be installed. OLM will then create an associated InstallPlan, which includes the set of resources that will be installed in association with the operator.
To manage how upgrades are handled when a new version becomes available, operators use an approval strategy, which can be either Manual or Automatic (specified by the `installPlanApproval` field of a Subscription). If Automatic is chosen, an operator will automatically be upgraded to the latest version when a new version is available. When using the Manual approval strategy, an administrator must manually approve the operator before it is installed.
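For example, a Subscription pinned to a specific version with manual approval might look like this (the channel, catalog source, and CSV version are illustrative assumptions):

apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: resource-locker-operator
  namespace: resource-locker-operator
spec:
  channel: alpha  # assumed channel name
  name: resource-locker-operator
  source: community-operators
  sourceNamespace: openshift-marketplace
  installPlanApproval: Manual
  startingCSV: resource-locker-operator.v0.1.0  # pin the exact version to install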
While the automatic approval strategy offers the simplicity of always being able to take advantage of the latest features an operator provides, in many cases there is a desire to explicitly specify the version to use without automatically upgrading, and thus to use the manual approval strategy. Actions that require the intervention of an administrator to approve an operator before it can be deployed contradict the declarative nature of GitOps. When an operator using the manual approval strategy is approved, the `approved` field on the InstallPlan is set to `true`.
To replicate the actions that would typically be required by an administrator to approve an operator, a Job can be used. The `resource-locker-operator` deployed previously uses the manual approval strategy and is approved by a Job called `installplan-approver`, which will automatically approve an InstallPlan if its CSV matches the desired CSV defined in the Subscription.
Managing the manual approval strategy uses the following resources:
- A set of policies, including a ServiceAccount for the Job to run as, a Role that grants access to InstallPlans and Subscriptions, and a RoleBinding that associates the Role with the ServiceAccount.
- The installplan-approver Job that approves the operator
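The core of the approval logic amounts to comparing the CSV named in the InstallPlan against the one defined in the Subscription and then patching `spec.approved`. A rough shell sketch of that logic (field paths based on the OLM APIs; the Job's actual script may differ):

NAMESPACE=resource-locker-operator
DESIRED_CSV=$(oc get subscription resource-locker-operator -n "${NAMESPACE}" -o jsonpath='{.spec.startingCSV}')
INSTALL_PLAN=$(oc get subscription resource-locker-operator -n "${NAMESPACE}" -o jsonpath='{.status.installplan.name}')
CSV_IN_PLAN=$(oc get installplan "${INSTALL_PLAN}" -n "${NAMESPACE}" -o jsonpath='{.spec.clusterServiceVersionNames[0]}')
if [ "${CSV_IN_PLAN}" = "${DESIRED_CSV}" ]; then
  # Approve only the InstallPlan that matches the pinned version
  oc patch installplan "${INSTALL_PLAN}" -n "${NAMESPACE}" --type merge --patch '{"spec":{"approved":true}}'
fi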
Verify the job completed successfully by executing the following command:
$ oc get pods -n resource-locker-operator -l=job-name=installplan-approver
NAME READY STATUS RESTARTS AGE
installplan-approver-vh9dm 0/1 Completed 0 58m
When using a GitOps tool, such as ArgoCD, the following annotations can be applied to automatically delete an existing job (if found) to avoid a possible conflict when applying resources.
apiVersion: batch/v1
kind: Job
metadata:
  annotations:
    argocd.argoproj.io/hook: Sync
    argocd.argoproj.io/hook-delete-policy: BeforeHookCreation