This monitor is designed to expose an HTTP service endpoint that Prometheus will use to gather information about the state of the cluster's attached EBS volumes.
A volume may be in one of these states: attaching, attached, detaching, or detached. Each state is reported as its own series, so one time slice may look like this:
ebs_volume_state{vol_id="vol-d34db33f",clusterid="testcluster",vol_state="attaching"} 1.0
ebs_volume_state{vol_id="vol-d34db33f",clusterid="testcluster",vol_state="attached"} 0.0
ebs_volume_state{vol_id="vol-d34db33f",clusterid="testcluster",vol_state="detaching"} 0.0
ebs_volume_state{vol_id="vol-d34db33f",clusterid="testcluster",vol_state="detached"} 0.0
The next time slice may be
ebs_volume_state{vol_id="vol-d34db33f",clusterid="testcluster",vol_state="attaching"} 0.0
ebs_volume_state{vol_id="vol-d34db33f",clusterid="testcluster",vol_state="attached"} 1.0
ebs_volume_state{vol_id="vol-d34db33f",clusterid="testcluster",vol_state="detaching"} 0.0
ebs_volume_state{vol_id="vol-d34db33f",clusterid="testcluster",vol_state="detached"} 0.0
And the next might be
ebs_volume_state{vol_id="vol-d34db33f",clusterid="testcluster",vol_state="attaching"} 0.0
ebs_volume_state{vol_id="vol-d34db33f",clusterid="testcluster",vol_state="attached"} 0.0
ebs_volume_state{vol_id="vol-d34db33f",clusterid="testcluster",vol_state="detaching"} 0.0
ebs_volume_state{vol_id="vol-d34db33f",clusterid="testcluster",vol_state="detached"} 1.0
Here, we can see the transition from attaching to attached to detached. Note that the state record may have missed the detaching state on the way to detached because of the polling interval.
(Note: These are sample outputs)
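The one-series-per-state layout shown above can be sketched in a few lines of Python. This is only an illustration of the output shape, not the exporter's actual code: the function name `render_state_samples` and the `STATES` tuple are hypothetical, and a real exporter would normally use a Prometheus client library rather than formatting strings by hand.

```python
# Hypothetical helper that illustrates the metric layout only.
STATES = ("attaching", "attached", "detaching", "detached")


def render_state_samples(vol_id, clusterid, current_state):
    """Return one sample line per state; only the current state is 1.0."""
    if current_state not in STATES:
        raise ValueError(f"unknown volume state: {current_state!r}")
    lines = []
    for state in STATES:
        value = 1.0 if state == current_state else 0.0
        lines.append(
            f'ebs_volume_state{{vol_id="{vol_id}",clusterid="{clusterid}",'
            f'vol_state="{state}"}} {value}'
        )
    return lines


# Print the four series for a volume that is currently attaching.
for line in render_state_samples("vol-d34db33f", "testcluster", "attaching"):
    print(line)
```

Because exactly one of the four series is 1.0 at any scrape, a query can select a single state with the vol_state label and reason about how long it stayed high.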
When the attaching or detaching state is high (1.0) for whichever interval we care about, we can consider the volume vol_id to be "stuck."
The query min_over_time(ebs_volume_state{vol_state="attaching"}[5m]) == 1 can be used to identify volumes which have been in the attaching state for the whole of the past five minutes. The [5m] time interval can be changed for other timeframes, for example to raise a warning at 5 minutes and a critical alert at 10 minutes.
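To see why a minimum of 1 over the window means "stuck", the check can be mimicked in plain Python. This is a sketch under assumptions: the `stuck_severity` function and the one-scrape-per-minute spacing are hypothetical, and the real evaluation is done by Prometheus, not by code like this.

```python
def min_over_time(samples):
    """Smallest value in the window, mirroring PromQL's min_over_time()."""
    return min(samples)


def stuck_severity(samples, per_5m):
    """Classify a series of attaching-state samples, oldest first.

    per_5m is the number of scrapes in five minutes (an assumption about
    the scrape interval, e.g. 5 for one scrape per minute).
    """
    if per_5m <= 0:
        raise ValueError("per_5m must be positive")
    if not samples:
        return None
    # The state was high at every scrape in the last ten minutes.
    if min_over_time(samples[-2 * per_5m:]) == 1.0:
        return "critical"
    # The state was high at every scrape in the last five minutes.
    if min_over_time(samples[-per_5m:]) == 1.0:
        return "warning"
    return None


print(stuck_severity([0.0] * 5 + [1.0] * 5, per_5m=5))
print(stuck_severity([1.0] * 10, per_5m=5))
print(stuck_severity([1.0, 1.0, 0.0, 1.0, 1.0], per_5m=5))
```

A single 0.0 anywhere in the window pulls the minimum below 1, so a volume that left the attaching state even briefly is not flagged.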
This service requires this read-only IAM permission:
ec2:DescribeInstances
Installation of the exporter is a multi-step process. Step one is to use the provided Makefile to render various templates into OpenShift YAML manifests.
A total of four variables must be provided with make:
AWS_REGION - The region to make AWS API calls against
AWS_ACCESS_KEY_ID - The AWS access key ID
AWS_SECRET_ACCESS_KEY - The AWS secret access key
CLUSTERID - The identifier of the cluster. Only EBS volumes with the tag kubernetes.io/cluster/$CLUSTERID will be checked
Optionally, a different image version can be provided with the IMAGE_VERSION variable. The default is stable.
Currently these are provided as environment variables to make.
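Since the variables are read from the environment, it can help to confirm they are set before invoking make. The following is a minimal pre-flight sketch, not part of the repository: the `missing_make_vars` helper is hypothetical, and it only checks the four required names listed above.

```python
import os

# The four variables the Makefile requires, as listed above.
REQUIRED_VARS = (
    "AWS_REGION",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "CLUSTERID",
)


def missing_make_vars(env):
    """Return the required variable names that are unset or empty in env."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_make_vars(os.environ)
if missing:
    print("Set these before running make:", ", ".join(missing))
else:
    print("All required variables are set.")
```

IMAGE_VERSION is deliberately left out of the check because it is optional.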
make all will render these manifests:
deploy/025_sourcecode.yaml
deploy/030_secrets.yaml
deploy/040_deployment.yaml
Once these have been created, the collection of manifests can be applied in the usual fashion (such as oc apply -f).
The Makefile includes three helpful targets:
clean - Deletes the manifest files which the Makefile renders
filelist - Echoes to the terminal a list of all the YAML files in the deploy directory
resourcelist - Echoes to the terminal a list of OpenShift/Kubernetes objects created by the manifests in the deploy directory, which may be useful for those wishing to delete the installation of this monitor.
Rules are provided by the openshift/managed-cluster-config repository.