/timescaledb-metrics

Send TimescaleDB policy stats (and other things) as metrics.

Primary LanguageGoMIT LicenseMIT

timescaledb-metrics

Send TimescaleDB policy stats (and other things) as metrics.

Usage

We advise not to pass credentials in configurations, but via Environment or Instance Profiles

env AWS_DEFAULT_REGION="ap-south-1" CLOUDWATCH_NAMESPACE="timescaledb-dev" DATABASE_URL="postgres://dbuser:4c77c8b8-a9e1-c2cd32feaeeb@10.29.152.230:5432/db" ./timescaledb-metrics

Output

Cloudwatch

Motivation

While running timescaleDB, there are a lot of background jobs and policies which need constant observation.

Example of such Jobs are:

  • drop_chunks policy
  • compress_chunks policy
  • reorder policy
  • continuous_aggregate policy

etc.

Example of the stats that can be pulled from TimescaleDB tables.

| entity_name            | job_type             | last_start_on | last_success_on | total_failures |
|------------------------|----------------------|---------------|-----------------|----------------|
| sli_300                | drop_chunks          | 1600262129    | 1600262129      | 0              |
| custom_sli_archive     | reorder              | 1600195293    | 1600195293      | 6              |
| custom_sli_archive     | compress_chunks      | 1600248496    | 1600248496      | 0              |
| custom_sli_archive     | drop_chunks          | 1600259297    | 1600259297      | 0              |
| sli_archive            | reorder              | 1600195323    | 1600195323      | 10             |
| sli_300                | continuous_aggregate | 1600262169    | 1600262170      | 0              |
| sli_archive            | drop_chunks          | 1600259297    | 1600259297      | 0              |
| sli_archive            | compress_chunks      | 1600257782    | 1600257782      | 8              |
| sli_violations_archive | drop_chunks          | 1600248501    | 1600248501      | 0              |

Keeping a manual eye on these is painful and tedious, and by emitting these as metrics we can be alerted when things go wrong.

While some may argue that the interval of these jobs is not as periodic where they qualify to be a metric system, but as the usage increases these policies end up running closer to every 5 minutes. Alerts on such slopes can catch staleness pretty effectively.

Another problem with the default policy_stats is that the job_id is hardly human readable. It's tough to keep a track of an IntegerID since it changes, on any recreation. But that's a minor win.

Components

Only timescaleDB.

Considerations

  • As of now we output to Cloudwatch but it can surely be extended to other metric sources. If you want more metric sources, send a Pull Request or just open a new Issue.

  • The reason to not use go-metrics, is the overall lack of precision of timestamps which Cloudwatch is excellent at.

Prometheus Exporter

You can export timescale db metrics to Prometheus by setting the path of this file as an environment variable which the postgres-exporter recognizes by default.

export PG_EXPORTER_EXTEND_QUERY_PATH=<path_to_queries.yaml>

Or you can just add this to your existing custom queries.yaml

References: https://github.com/wrouesnel/postgres_exporter#adding-new-metrics-via-a-config-file


About Last9

This project is sponsored and maintained by Last9. Last9 builds reliability tools for SRE and DevOps.