jacksontj/promxy

Add caching (like trickster)

jacksontj opened this issue · 12 comments

They are simply doing a query-level cache with some time alignments (https://github.com/Comcast/trickster/blob/master/handlers.go#L293). Ideally that'd be refactored out of main and we could leverage the same caching. If not we can make our own cache interface etc. and do the same thing.

The main difference being that we could apply the caching to all data queries:

  • /query
  • /query_range
  • /labels
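
To make "query-level cache with time alignment" concrete, here is a minimal Go sketch of the idea, not trickster's or promxy's actual code. It assumes Unix-second timestamps and a step in seconds; `alignDown` and `cacheKey` are hypothetical names, and the key layout is made up for illustration. Snapping `start` and `end` down to a multiple of `step` lets requests issued a few seconds apart share one cache entry.

```go
package main

import "fmt"

// alignDown rounds a non-negative Unix timestamp (seconds) down to the
// nearest multiple of step. A non-positive step leaves ts unchanged.
func alignDown(ts, step int64) int64 {
	if step <= 0 {
		return ts
	}
	return ts - ts%step
}

// cacheKey builds a hypothetical cache key for a range query. The time
// window is aligned first, so nearby windows collapse onto the same key.
func cacheKey(path, query string, start, end, step int64) string {
	return fmt.Sprintf("%s?query=%s&start=%d&end=%d&step=%d",
		path, query, alignDown(start, step), alignDown(end, step), step)
}
```

For example, with `step=60` a request for `start=121, end=3725` and one for `start=179, end=3779` both align to `start=120, end=3720` and so produce the same key.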

I've put a proposal up on trickster that would enable trickster to more properly handle the caching of changing downstreams. I'd like to have caching in promxy -- so another approach might be using trickster as a library but we'll see. If all else fails I can continue down the path of implementing caching myself -- I'm just hoping to re-use trickster instead of re-implementing it :)

@jacksontj your proposal now seems to have been accepted. Any thoughts on implementing caching for Promxy?

First off, awesome work on this, saved me a rather big headache of converging my metrics from several clusters spread all over the world.

Secondly, some caching would be nice to save on some egress between my secondary clusters and my main cluster.

We are seeing queries for label names take a significant amount of time, probably due to a large number of time series. Since the set of label names should change fairly infrequently, I think it'd be a good use case for caching.

If I understand correctly, caching is not implemented yet?

@xenofree @jstaffans I'd bet you fairly quickly could set up trickster with promxy as a prometheus backend: https://trickstercache.org/docs/getting-started/configuring/

Either as a docker stack with 2 services, 2 Kubernetes workloads, or even as a Kubernetes sidecar.

It actually seems like Trickster can do the merging of time series and add additional labels as well: https://trickstercache.org/docs/load-balancers/alb/#time-series-merge

@jacksontj Would this render promxy obsolete, or does promxy still provide additional features?

@MikaelElkiaer from my quick look it seems that trickster's functionality has a few differences.

Big differences:

  • the merge functionality simply merges the 2 series assuming the same values in the backing store. If this is some remote_write system (where data is replicated) then the effects would be similar. In a prometheus setup where 2 nodes are scraping the same targets this would create series with 2x the expected points (missing the "anti-affinity" mechanism in promxy).
  • promxy has a lot of logic built in to understand how to break up queries in an aggregateable fashion that is both (1) correct and (2) performant -- reduce volume of data that needs to be shipped. Trickster's approach seems more generic (as it supports a lot of downstreams) and as such will likely have significantly worse performance as an aggregating proxy for prometheus (a case of "the specific tool is better than the generic tool").
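
The first difference can be illustrated with a small Go sketch. This is not promxy's or trickster's implementation; the `Sample` type, both function names, and the simple "drop points within a window of the last kept point" rule are assumptions chosen to show why a plain merge doubles the points when two Prometheus nodes scrape the same targets, and how an anti-affinity style window avoids that.

```go
package main

// Sample is a single point; T is a timestamp in milliseconds.
type Sample struct {
	T int64
	V float64
}

// naiveMerge interleaves two timestamp-sorted series and keeps every point,
// so two replicas of the same series yield twice the expected points.
func naiveMerge(a, b []Sample) []Sample {
	out := make([]Sample, 0, len(a)+len(b))
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		if a[i].T <= b[j].T {
			out = append(out, a[i])
			i++
		} else {
			out = append(out, b[j])
			j++
		}
	}
	out = append(out, a[i:]...)
	return append(out, b[j:]...)
}

// dedupMerge merges the two series, then drops any point that falls within
// window milliseconds of the previously kept point. Points from one replica
// still fill gaps left by the other.
func dedupMerge(a, b []Sample, window int64) []Sample {
	merged := naiveMerge(a, b)
	out := make([]Sample, 0, len(merged))
	for _, s := range merged {
		if len(out) > 0 && s.T-out[len(out)-1].T < window {
			continue // treated as a duplicate scrape of the same interval
		}
		out = append(out, s)
	}
	return out
}
```

With a 15s scrape interval and replicas offset by 2s, `naiveMerge` returns six points for three scrapes, while `dedupMerge` with a 10s window returns three.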

In addition to those 2 major differences that jump out, there are a variety of minor differences that exist (trickster has caching, promxy has a variety of options for merging, offsetting data, VM support, etc.).

All that being said, in my installs if I want/need caching today I use trickster + Promxy to achieve this. I had opened this issue because promxy has more context on the downstream requests (e.g. if some prom downstream was dead) and could adjust cache control headers (or an internal cache) with that information -- but short of those situations the trickster + promxy approach has been working great for me.

@jacksontj Thanks for having a look and explaining, it makes very good sense.

So you are using trickster as another proxy in front of promxy? Would you mind going into some details about the advantages of using trickster as a library instead?

The main advantage would be better handling of failure modes. Today with trickster+promxy, trickster has no idea how "complete" the data that promxy returned is. For example, if half the downstreams were down and promxy returned a partial result, that result would stay in cache until its TTL expires (or it gets pushed out of cache). Ideally the cache would get additional context on the data so that it becomes consistent more quickly.

@jacksontj Thanks for your answers. I will definitely weigh the 2 options once I have to deal with this problem again. :)

Edit: 3 options:

  1. Extend Promxy with trickster library support
  2. Use Trickster as an additional proxy in front of Promxy
  3. Use Trickster alone

2Brzi commented

Hi,
After reading this I'm still confused - does promxy support caching or not?