Quarkus Observability App

1. Introduction

This application was created to showcase how to configure Logging, Metrics, and Tracing in a Quarkus application and how to collect and manage them using the supported OpenShift infrastructure.

The application was built using Quarkus, a Container-First framework for writing Java applications.

Table 1. Used Quarkus extensions

Extension Name                   Purpose
Micrometer Registry Prometheus   Expose metrics in Prometheus format
Logging JSON                     Format logs as JSON
OpenTelemetry                    Distributed tracing
SmallRye Health                  Liveness and readiness endpoints

1.1. Openshift Components

In order to collect the logs, metrics, and traces from our application, we are going to deploy and configure several Openshift components. The installation and configuration of the components are not the focus of this repository, so I will provide links to my other repositories where I have my quickstarts for those components.

Table 2. OpenShift Supported Components

OpenShift Component             Purpose
OpenShift distributed tracing   Collect and display distributed traces. Since version 3.0 it is
                                based on the Grafana Tempo project; the 2.x releases were based
                                on Jaeger. Both projects use the OpenTelemetry standard.
User Workload Monitoring        Collect metrics in OpenMetrics format from user workloads and
                                present them in the built-in dashboards. It also allows the
                                creation of alerts based on those metrics.
Cluster Logging Operator        Collect, store, and visualize logs from workloads.

1.2. How to start?

Access the Code Quarkus site, which helps you generate the application quickstart with the required Quarkus extensions:

Quarkus Application Generator
Figure 1. Quarkus Application Generator

Generate the application and download it as .zip.

2. How does it work?

The application is similar to the autogenerated version, but with the following customizations:

  • I’ve added a new endpoint to count something, exposed and documented using the Swagger OpenAPI library.

  • I’ve used the Micrometer metrics library to generate custom metrics that I will expose in the Prometheus endpoint. I’ve created three new metrics:

    • Gauges measure a value that can increase or decrease over time, like the speedometer on a car.

    • Counters are used to measure values that only increase.

    • Distribution summaries record an observed value, which will be aggregated with other recorded values and stored as a sum.

3. How to run it?

3.1. Locally

You can run your application in dev mode that enables live coding using:

mvn compile quarkus:dev

NOTE: Quarkus now ships with a Dev UI, which is available in dev mode only at http://localhost:8080/q/dev/.

3.2. Packaging and running the application

The application can be packaged using:

mvn package

It produces the quarkus-run.jar file in the target/quarkus-app/ directory. Be aware that it’s not an uber-jar as the dependencies are copied into the target/quarkus-app/lib/ directory.

The application is now runnable using java -jar target/quarkus-app/quarkus-run.jar.

If you want to build an uber-jar, execute the following command:

mvn package -Dquarkus.package.type=uber-jar

The application, packaged as an uber-jar, is now runnable using java -jar target/*-runner.jar.

3.3. Shipping it into a Container

Manual steps to generate the container image locally:

# Generate the Native executable
mvn package -Pnative -Dquarkus.native.container-runtime=podman -Dquarkus.native.remote-container-build=true -Dquarkus.container-image.build=true

# Add the executable to a container image
podman build -f src/main/docker/Dockerfile.native -t quarkus/quarkus-observability-app .

# Launch the application
podman run -i --rm -p 8080:8080 quarkus/quarkus-observability-app

4. Deploy on Openshift

4.1. Quarkus App

Deploy the app in a new namespace using the following commands:

# Create the project
oc process -f openshift/quarkus-app/10-project.yaml | oc apply -f -


# Create a ConfigMap to mount in the application to configure without rebuilding
oc create configmap app-config --from-file=application.yml=src/main/resources/application-ocp.yml -n quarkus-observability

# Install the application
oc process -f openshift/quarkus-app/20-app.yaml | oc apply -f -

# After that, you can access the Swagger UI using the following link
oc get route app -n quarkus-observability  --template='https://{{ .spec.host }}/q/swagger-ui'
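The application.yml mounted via the ConfigMap above is where the observability-related settings live. As a rough orientation, a hypothetical excerpt could look like the sketch below; the real src/main/resources/application-ocp.yml may differ, and the exact property names depend on the Quarkus version:

# Hypothetical excerpt of application-ocp.yml (not the repo's actual content)
quarkus:
  log:
    console:
      json: true                  # emit JSON logs for the cluster log collector
  otel:
    exporter:
      otlp:
        traces:
          # assumed OTLP gRPC endpoint of the OpenTelemetry Collector service
          endpoint: http://otel-collector.quarkus-observability.svc:4317
  micrometer:
    export:
      prometheus:
        path: /q/metrics          # Prometheus endpoint scraped by the ServiceMonitor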

4.2. Red Hat build of OpenTelemetry

The Red Hat build of OpenTelemetry provides support for deploying and managing the OpenTelemetry Collector and simplifies workload instrumentation. The Collector can receive, process, and forward telemetry data in multiple formats, making it the ideal component for telemetry processing and interoperability between telemetry systems.

OpenTelemetry is made up of several components that interconnect to process metrics and traces. The following diagram from this blog will help you understand the architecture:

Red Hat Build of OpenTelemetry - Architecture
Figure 2. Red Hat Build of OpenTelemetry - Architecture

For more context about OpenTelemetry, I strongly recommend reading the following blogs:

ℹ️
If you struggle with OTEL configuration, please check this redhat-rhosdt-samples repository.

# Install the operator
oc apply -f openshift/ocp-opentelemetry/10-subscription.yaml
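The operator manages OpenTelemetryCollector instances. The following is only a sketch of what a collector receiving OTLP data from the app and forwarding traces to Tempo could look like; the names and endpoints are assumptions, not the repo's actual manifest:

# Hypothetical OpenTelemetryCollector sketch
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: otel
  namespace: quarkus-observability
spec:
  mode: deployment
  config: |
    receivers:
      otlp:
        protocols:
          grpc: {}
          http: {}
    processors:
      batch: {}
    exporters:
      otlp:
        endpoint: tempo-sample-distributor:4317   # assumed TempoStack distributor service
        tls:
          insecure: true
    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [batch]
          exporters: [otlp]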

4.3. Distributed Tracing

Red Hat OpenShift Distributed Tracing lets you perform distributed tracing, which records the path of a request through various microservices that make up an application.

Tempo is split into several components deployed as different microservices, which can make it harder to understand for beginners. The following diagram from this blog will help you to better understand the architecture:

Red Hat Distributed Tracing - Architecture
Figure 3. Red Hat Distributed Tracing - Architecture

For more context about Distributed Tracing, I strongly recommend reading the following blogs:

For more information, check the official documentation.

4.3.1. Tracing backend 1: Grafana Tempo

# Create an s3 bucket
./openshift/ocp-distributed-tracing/tempo/aws-create-bucket.sh ./aws-env-vars

# Install the operator
oc apply -f openshift/ocp-distributed-tracing/tempo/10-subscription.yaml

# Deploy Tempo
oc process -f openshift/ocp-distributed-tracing/tempo/20-tempostack.yaml | oc apply -f -
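For reference, a minimal TempoStack usually only needs the object storage secret and, optionally, the Jaeger query frontend. The repo's 20-tempostack.yaml may differ; the sketch below is an assumption:

apiVersion: tempo.grafana.com/v1alpha1
kind: TempoStack
metadata:
  name: sample
  namespace: quarkus-observability
spec:
  storageSize: 10Gi
  storage:
    secret:
      name: tempo-s3            # assumed secret created from the S3 bucket credentials
      type: s3
  template:
    queryFrontend:
      jaegerQuery:
        enabled: true           # exposes the Jaeger UI to query the stored traces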

Once everything is configured, you can access the Metrics tab and see statistics retrieved directly from the traces collected by the OpenTelemetry Collector. This is an example of the output:

Red Hat Distributed Tracing - Metrics tab
Figure 4. Red Hat Distributed Tracing - Metrics tab
Dashboards

By default, the Grafana Tempo operator does not configure or provide any Grafana Dashboards for monitoring. Therefore, I have collected the ones provided upstream in this folder: https://github.com/grafana/tempo/tree/main/operations/tempo-mixin-compiled.

4.3.2. Tracing backend 2: Jaeger [DEPRECATED]

# Install the operator
oc apply -f openshift/ocp-distributed-tracing/jaeger/10-subscription.yaml

# Deploy Jaeger
oc process -f openshift/ocp-distributed-tracing/jaeger/20-jaeger.yaml | oc apply -f -

4.4. Monitoring

In OpenShift Container Platform 4.14, you can enable monitoring for user-defined projects in addition to the default platform monitoring. You can monitor your own projects in OpenShift Container Platform without the need for an additional monitoring solution.
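Enabling user workload monitoring essentially boils down to one flag in the cluster-monitoring-config ConfigMap. A minimal sketch of what the 10-cm-cluster-monitoring-config.yaml manifest applied below presumably sets (the repo's file may add retention or storage options):

apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    enableUserWorkload: true    # deploys the user workload monitoring stack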

# Enable user workload monitoring
oc apply -f openshift/ocp-monitoring/10-cm-cluster-monitoring-config.yaml

# Configure the user workload monitoring instance
oc apply -f openshift/ocp-monitoring/11-cm-user-workload-monitoring-config.yaml

# Add Service Monitor to collect metrics from the App
oc process -f openshift/ocp-monitoring/20-service-monitor.yaml | oc apply -f -
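The ServiceMonitor tells Prometheus where to scrape the application metrics. A hedged sketch, assuming the app Service is labeled app: quarkus-observability-app and exposes a port named http (the repo's template may use different names):

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: quarkus-observability-app
  namespace: quarkus-observability
spec:
  endpoints:
    - port: http                # assumed name of the Service port
      path: /q/metrics          # Quarkus Micrometer Prometheus endpoint
      scheme: http
  selector:
    matchLabels:
      app: quarkus-observability-app    # assumed label on the app Service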

For more information, check the official documentation.

ℹ️
If you face issues creating and configuring the ServiceMonitor, you can use this Troubleshooting guide.

4.4.1. Alerting

Using OpenShift metrics, it is really simple to add alerts based on those Prometheus metrics:

# Add an alert to monitor requests to the API
oc process -f openshift/ocp-alerting/10-prometheus-rule.yaml | oc apply -f -
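The alert is defined in a PrometheusRule object. A sketch of what 10-prometheus-rule.yaml could roughly contain, with a hypothetical alert name and threshold on the standard Quarkus HTTP request metric:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: quarkus-observability-app
  namespace: quarkus-observability
spec:
  groups:
    - name: quarkus-observability-app.rules
      rules:
        - alert: HighRequestRate        # hypothetical alert name and threshold
          expr: sum(rate(http_server_requests_seconds_count{namespace="quarkus-observability"}[5m])) > 10
          for: 1m
          labels:
            severity: warning
          annotations:
            summary: The API is receiving an unusually high number of requests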

4.4.2. Grafana Dashboards

# Install the Grafana Operator
oc process -f openshift/grafana/10-operator.yaml | oc apply -f -

# Deploy a Grafana Instance
oc process -f openshift/grafana/20-instance.yaml | oc apply -f -

# Create Datasource
oc process -f openshift/grafana/30-datasource.yaml \
    -p BEARER_TOKEN=$(oc get secret $(oc describe sa grafana-sa -n grafana | awk '/Tokens/{ print $2 }') -n grafana --template='{{ .data.token | base64decode }}') \
    | oc apply -f -

# Configure Grafana Dashboard for the quarkus-observability-app
oc process -f openshift/grafana/40-dashboard.yaml \
    -p DASHBOARD_GZIP="$(cat openshift/grafana/quarkus-observability-dashboard.json | gzip | base64 -w0)" \
    -p DASHBOARD_NAME=quarkus-observability-dashboard \
    -p CUSTOM_FOLDER_NAME="Quarkus Observability"  | oc apply -f -
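The datasource template connects Grafana to the cluster Thanos Querier using the service account token passed as BEARER_TOKEN. A hedged sketch of such a datasource; the repo's 30-datasource.yaml and the Grafana operator version may differ:

apiVersion: integreatly.org/v1alpha1
kind: GrafanaDataSource
metadata:
  name: prometheus-user-workload
  namespace: grafana
spec:
  name: prometheus-user-workload.yaml
  datasources:
    - name: Prometheus
      type: prometheus
      access: proxy
      url: https://thanos-querier.openshift-monitoring.svc.cluster.local:9091
      isDefault: true
      jsonData:
        httpHeaderName1: Authorization
        tlsSkipVerify: true
      secureJsonData:
        httpHeaderValue1: Bearer ${BEARER_TOKEN}    # token of the grafana-sa service account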

After installing, you can access the Grafana UI and see the following dashboard:

Grafana dashboard
Figure 5. Grafana dashboard

4.5. Logging

The logging subsystem aggregates infrastructure and application logs from throughout your cluster and stores them in a default log store. The OpenShift Logging installation consists of three parts:

  • Installation of the OpenShift Logging operator. Always needed.

  • Installation of the Loki operator as the logging backend. Mutually exclusive with the ElasticSearch backend.

  • Installation of the ElasticSearch operator as the logging backend. Mutually exclusive with the Loki backend.

Logging Operator
oc apply -f openshift/ocp-logging/00-subscription.yaml

4.5.1. Migration to the new Logging stack

The OpenShift Logging team decided to move from EFK to Vector + Loki. The original OpenShift Logging stack was split into three products: ElasticSearch (log store and search), Fluentd (collection and transport), and Kibana (visualization). Now there will be only two: Vector (collection) and Loki (storage).

To stay up to date and age better, this repo explores both implementations.

4.5.2. Logging backend 1: Loki

Option 1: Loki installation and deployment
# Install the Loki operator
oc apply -f openshift/ocp-logging/loki/10-operator.yaml

# Create an AWS S3 Bucket to store the logs
./openshift/ocp-logging/loki/aws-create-bucket.sh ./aws-env-vars

# Create the Logging instance
oc process -f openshift/ocp-logging/loki/20-instance.yaml \
    --param-file aws-env-vars --ignore-unknown-parameters=true | oc apply -f -

# Enable the console plugin
# -> This plugin adds the logging view into the 'observe' menu in the OpenShift console. It requires OpenShift 4.10.
oc patch console.operator cluster --type json -p '[{"op": "add", "path": "/spec/plugins", "value": ["logging-view-plugin"]}]'

Loki dashboard
Figure 6. Loki dashboard
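Behind the Logging instance there is typically a LokiStack (the log store backed by the S3 bucket) plus a ClusterLogging resource wiring the Vector collector to it. A hedged sketch of the latter; the names are assumptions and the repo's 20-instance.yaml may differ:

apiVersion: logging.openshift.io/v1
kind: ClusterLogging
metadata:
  name: instance
  namespace: openshift-logging
spec:
  managementState: Managed
  logStore:
    type: lokistack
    lokistack:
      name: logging-loki        # assumed name of the LokiStack created by the instance manifest
  collection:
    type: vector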

4.5.3. Logging backend 2: ElasticSearch [DEPRECATED]

⚠️

As of logging version 5.4.3 the OpenShift Elasticsearch Operator is deprecated and is planned to be removed in a future release. As of logging version 5.6 Fluentd is deprecated and is planned to be removed in a future release.

Option 2: ElasticSearch installation and deployment
# Install the Elastic operator
oc apply -f openshift/ocp-logging/elasticsearch/10-operator.yaml

# Create the Logging instance
oc apply -f openshift/ocp-logging/elasticsearch/20-instance.yaml

After installing and configuring the indexing pattern, you will be able to perform queries for the logs:

Kibana dashboard
Figure 7. Kibana dashboard

4.5.4. Logging backend 3: Third-party

By default, the logging subsystem sends container and infrastructure logs to the internal log store that we created in options 1 and 2. Alternatively, you can forward them to a third-party system; in this case, AWS CloudWatch:

# Retrieve the infrastructure name that will prefix the CloudWatch log groups
oc get Infrastructure/cluster -ojson | jq .status.infrastructureName
oc process -f openshift/ocp-logging/log-forwarding/cluster-log-forwarder-aws.yaml \
    --param-file aws-env-vars --ignore-unknown-parameters=true \
    -p CLOUDWATCH_GROUP_PREFIX=$(oc get Infrastructure/cluster -o=jsonpath='{.status.infrastructureName}') \
    | oc apply -f -
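For orientation, a ClusterLogForwarder that ships application logs to CloudWatch looks roughly like the sketch below; the repo's cluster-log-forwarder-aws.yaml, the region, and the secret name may differ:

apiVersion: logging.openshift.io/v1
kind: ClusterLogForwarder
metadata:
  name: instance
  namespace: openshift-logging
spec:
  outputs:
    - name: cloudwatch
      type: cloudwatch
      cloudwatch:
        groupBy: logType
        groupPrefix: <infrastructureName>   # value retrieved with the oc get Infrastructure command above
        region: eu-west-1                   # assumed AWS region
      secret:
        name: cloudwatch-credentials        # assumed secret holding the AWS access keys
  pipelines:
    - name: to-cloudwatch
      inputRefs:
        - application
      outputRefs:
        - cloudwatch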

Now, you can check the logs in CloudWatch using the following commands:

source aws-env-vars
aws --output json logs describe-log-groups --region=$AWS_DEFAULT_REGION

Annex A: Network Policies with Observability

As you may already know, you can define network policies that restrict traffic to pods in your cluster. When the cluster is empty and your applications don’t rely on other Openshift components, this is easy to configure. However, when you add the full observability stack plus extra common services, it can get tricky. That’s why I would like to summarize some of the common NetworkPolicies:

# Here you will deny all traffic except for Routes, Metrics, and webhook requests.
oc process -f openshift/ocp-network-policies/10-basic-network-policies.yaml | oc apply -f -
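As an example of the kind of policy included there, keeping the cluster monitoring stack able to scrape the metrics endpoint usually requires an ingress rule like the following sketch (the repo's 10-basic-network-policies.yaml template may differ):

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-openshift-monitoring
  namespace: quarkus-observability
spec:
  podSelector: {}               # applies to every pod in the namespace
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              network.openshift.io/policy-group: monitoring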

For other NetworkPolicy configurations, check the official documentation.

Annex B: Tekton Pipelines as Code

Pipelines as Code allows you to define CI/CD in a file stored in Git. This file is then used to automatically create a pipeline for a pull request or a push to a branch.

Step 1: Create a GH application

This step automates all the steps in this section of the documentation:

  • Create an application in GitHub with the configuration of the cluster.

  • Create a secret in Openshift with the configuration of the GH App pipelines-as-code-secret.

tkn pac bootstrap
# In the interactive menu, set the application name to `pipelines-as-code-app`

Step 2: Create a Repository CR

This section creates a Repository CR with the configuration of the GitHub application in the destination repository:

tkn pac create repository
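The resulting Repository CR is small; it mainly points Pipelines as Code at the Git repository URL. A sketch with a placeholder URL:

apiVersion: pipelinesascode.tekton.dev/v1alpha1
kind: Repository
metadata:
  name: quarkus-observability-app
  namespace: quarkus-observability
spec:
  url: https://github.com/<your-user>/quarkus-observability-app   # placeholder URL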

Annex C: New image with expiration in Quay

It is possible to use labels (the quay.expires-after label) to set the automatic expiration of individual image tags in Quay. In order to test that, I just added a new Dockerfile that takes an image as a build argument and labels it with a set expiration time.

podman build -f src/main/docker/Dockerfile.add-expiration \
    --build-arg IMAGE_NAME=quay.io/alopezme/quarkus-observability-app \
    --build-arg IMAGE_TAG=latest-micro \
    --build-arg EXPIRATION_TIME=2h \
    -t quay.io/alopezme/quarkus-observability-app:expiration-test .

Check the results
# Nothing related to expiration:
podman inspect image --format='{{json .Config.Labels}}'  quay.io/alopezme/quarkus-observability-app:latest-micro | jq

# Adds expiration label:
podman inspect image --format='{{json .Config.Labels}}'  quay.io/alopezme/quarkus-observability-app:expiration-test | jq