fastly/fastly-exporter

Add discovery/scrape-time service selection

SuperQ opened this issue · 20 comments

With Prometheus 2.28, there is now a generic http service discovery.

The exporter could expose an endpoint that lists all of the available services so that they can be scraped independently. This improves ingestion performance by spreading the scrapes out over time and allowing Prometheus to ingest the data in parallel across multiple targets.

On the Prometheus side, you would configure the job like this:

scrape_configs:
- job_name: fastly
  metrics_path: /fastly
  http_sd_configs:
  - url: http://fastly-exporter:8080/sd
  relabel_configs:
  - source_labels: [__address__]
    target_label: __param_target
  - source_labels: [__param_target]
    target_label: instance
  - target_label: __address__
    replacement: fastly-exporter:8080

The /sd endpoint would output JSON like this:

[
  { 
    "targets": [
      "<Service ID 1>",
      "<Service ID 2>",
      "<Service ID 3>",
      "<Service ID ...>"
    ]
  }
]

The relabel_configs would then produce exporter URLs like /fastly?target=<Service ID 1>.
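For illustration, a minimal Go sketch of what such an /sd handler could look like, assuming the exporter already knows the list of service IDs it is polling; targetGroup and serviceDiscoveryHandler are placeholder names, not the exporter's actual code.

package main

import (
	"encoding/json"
	"log"
	"net/http"
)

// targetGroup matches the JSON shape Prometheus http_sd_configs expects.
type targetGroup struct {
	Targets []string          `json:"targets"`
	Labels  map[string]string `json:"labels,omitempty"`
}

// serviceDiscoveryHandler serves the currently known service IDs as one target group.
func serviceDiscoveryHandler(serviceIDs func() []string) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode([]targetGroup{{Targets: serviceIDs()}})
	})
}

func main() {
	// Placeholder list; the real exporter would return whatever it currently polls.
	ids := func() []string { return []string{"<Service ID 1>", "<Service ID 2>"} }
	http.Handle("/sd", serviceDiscoveryHandler(ids))
	log.Fatal(http.ListenAndServe(":8080", nil))
}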

This is meant as an additional improvement on #31. Possibly a replacement, depending on where the bottlenecks are.

CC @neufeldtech

Super cool.

So wait, given /sd returns

[
  { "targets": ["A", "B", "C"] }
]

there's no absolute or explicit URL mapping to each of A, B, and C — I would just assert in the exporter (or a flag or something) that they map to e.g. /service/{A, B, C} which the scraping Prometheus would have to be configured to match a priori. Is that right?

Yes, it needs to be re-mapped by a relabel config in Prometheus no matter what. Discovery injects each target list item into the __address__ meta-label; Prometheus doesn't treat those as URLs, it treats them as host:port pairs.

@SuperQ The bottleneck is parsing the JSON from the Fastly API. By default, the exporter fetches all services available to the provided token on startup, and launches a goroutine per service to poll the API and update the relevant metrics. The -service* flags are used to control the set of services we poll, so I think that means we need to keep them. Though maybe I could play some trick — could I lazy-launch the polling goroutine for a given service only once the exporter received a request for that service? It would mean the first scrape either had a 1s+ delay, or returned an empty response; would either of those be acceptable?

This is why readiness checks are a thing. Discovery can just wait for the readiness endpoint to say OK.

Discovery can just wait for the readiness endpoint to say OK.

Not sure how that would work. Is readiness per-SD-target? I'm suggesting that the exporter not poll a service ID unless/until it receives a scrape request for that service ID from Prometheus.

The exporter shouldn't expose a newly added service to the discovery output until it has a valid API connection.

Right, but the only meaningful optimization is to try to avoid polling the API and parsing the large JSON responses if we can somehow. For example, if the user starts the exporter with a configuration that makes 10 services "visible" but only has Prometheus scrape 2 of them, it would be ideal if we didn't poll/parse/present metrics for the other 8. The only way I can think of to accomplish that is to "lazy load" the per-service polling goroutines. Is that feasible? Sounds like no. That's fine.

I don't think that's feasible. I don't see a lot of users doing selective filtering without configuring the -service flag.

Is there a canonical example of an exporter that supports this new capability?

Not that I'm aware of, this would be the first one.

So this is an interesting design conundrum. I think the right way to think about this is that we're adding /metrics endpoints that are backed by a kind of pseudo-registry: something that takes the main registry and filters based on a label. I guess you could do that by wrapping the Registry with something that takes a service ID and yields a custom Gatherer? Or am I looking at this the wrong way?
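Roughly something like this, as a sketch of the pseudo-registry idea. It assumes the exporter's metrics carry a service_id label and that the per-service endpoint takes a target query parameter; none of these names come from the exporter itself.

package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
	dto "github.com/prometheus/client_model/go"
)

// serviceLabel is an assumed label name; the real exporter may use a different one.
const serviceLabel = "service_id"

// filteringGatherer wraps another Gatherer and keeps only the metrics whose
// serviceLabel matches the requested service ID.
type filteringGatherer struct {
	next      prometheus.Gatherer
	serviceID string
}

func (g filteringGatherer) Gather() ([]*dto.MetricFamily, error) {
	mfs, err := g.next.Gather()
	if err != nil {
		return nil, err
	}
	var out []*dto.MetricFamily
	for _, mf := range mfs {
		var kept []*dto.Metric
		for _, m := range mf.Metric {
			for _, lp := range m.Label {
				if lp.GetName() == serviceLabel && lp.GetValue() == g.serviceID {
					kept = append(kept, m)
					break
				}
			}
		}
		if len(kept) > 0 {
			out = append(out, &dto.MetricFamily{
				Name:   mf.Name,
				Help:   mf.Help,
				Type:   mf.Type,
				Metric: kept,
			})
		}
	}
	return out, nil
}

func main() {
	// Serve /fastly?target=<service ID> by filtering the shared registry.
	http.Handle("/fastly", http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		g := filteringGatherer{next: prometheus.DefaultGatherer, serviceID: r.URL.Query().Get("target")}
		promhttp.HandlerFor(g, promhttp.HandlerOpts{}).ServeHTTP(w, r)
	}))
	log.Fatal(http.ListenAndServe(":8080", nil))
}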

Ah! Huh.

So the other angle would be something like maintaining a separate gen.Metrics/prometheus.Registry for each service. That makes per-service /metrics endpoints easy. Could you then just use a prometheus.Gatherers to abstract over all of them to serve the existing /metrics endpoint? Would that be efficient/effective?
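As a sketch, assuming each service's collectors can be registered into their own registry (the service IDs and endpoint wiring below are placeholders):

package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
	serviceIDs := []string{"<Service ID 1>", "<Service ID 2>"} // placeholders

	// One registry per service; each service's metrics get registered into its own.
	perService := map[string]*prometheus.Registry{}
	for _, id := range serviceIDs {
		reg := prometheus.NewRegistry()
		// ... register this service's collectors with reg here ...
		perService[id] = reg
	}

	http.Handle("/fastly", http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// Per-service view when a target is given.
		if reg, ok := perService[r.URL.Query().Get("target")]; ok {
			promhttp.HandlerFor(reg, promhttp.HandlerOpts{}).ServeHTTP(w, r)
			return
		}
		// Otherwise serve the combined view of every service via Gatherers.
		var all prometheus.Gatherers
		for _, reg := range perService {
			all = append(all, reg)
		}
		promhttp.HandlerFor(all, promhttp.HandlerOpts{}).ServeHTTP(w, r)
	}))
	log.Fatal(http.ListenAndServe(":8080", nil))
}

prometheus.Gatherers merges the metric families from each registry, which works as long as every registry labels its series with a distinct service ID.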

Yes, setting up a separate registry per-service makes the most sense to me.

Although, typically, an exporter like this would manage the data internally. You implement your own prometheus.Collector, whose Collect(ch) method sends the data to Prometheus via the interface. We do something like this in the node_exporter.
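For reference, the Collector pattern looks roughly like this; the metric name, label, and value are illustrative, not the exporter's actual metrics.

package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// serviceCollector reports metrics for one service from the exporter's internal
// state. The metric name and value below are illustrative placeholders.
type serviceCollector struct {
	serviceID string
	requests  *prometheus.Desc
}

func newServiceCollector(id string) *serviceCollector {
	return &serviceCollector{
		serviceID: id,
		requests: prometheus.NewDesc(
			"fastly_rt_requests_total",          // assumed metric name
			"Total requests for the service.",   // help text
			nil,                                 // no variable labels
			prometheus.Labels{"service_id": id}, // constant per-service label
		),
	}
}

func (c *serviceCollector) Describe(ch chan<- *prometheus.Desc) {
	ch <- c.requests
}

func (c *serviceCollector) Collect(ch chan<- prometheus.Metric) {
	// In the real exporter this would read the latest data polled from the
	// Fastly real-time API; here it emits a placeholder value.
	ch <- prometheus.MustNewConstMetric(c.requests, prometheus.CounterValue, 42)
}

func main() {
	prometheus.MustRegister(newServiceCollector("<Service ID 1>"))
	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":8080", nil))
}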

Ah, @SuperQ, what would the Prometheus config look like for this? Especially whatever relabel config would be necessary.

The scrape config and relabel is in the issue summary.

I tested this out locally and it looks really good. It even works well with modulo sharding, so you get the best of both worlds: horizontal sharding and automatic discovery.

the issue summary

Thank you 🤦