cloudfoundry/log-cache-cli

`cf tail -f` will never display metrics "from the past" (old timestamp)

Closed this issue · 2 comments

Context

We are using log-cache to send metrics from our backing services to the tenants. Right now we retrieve metrics from SQL queries, which are retrieved and emitted intermediately, but also metrics from AWS cloudwatch which have their own timestamp in the past because AWS cloudwatch sends some delay.

Basically, in our case some metrics might be sent with a timestamp of some minutes ago, other metrics from are more recent

Problem

When running cf tail -f <id> it will print the latest 10 metrics, and then only print metrics with a timestamp older than the timestamp of the latest metric displayed:

For instance, we emit the connections with a timestamp of now(), but cpu has always a 2 minutes delay. Running a cf tall -f will never print a new cpu metric, because there is going to always be a connections metric that has a newer timestamp:

$ cf tail 86943ea5-b215-4f03-aa5f-741dc7fb7fe9 -f
Retrieving logs for source 86943ea5-b215-4f03-aa5f-741dc7fb7fe9 as admin...

   2018-06-20T14:19:52.29+0100 [86943ea5-b215-4f03-aa5f-741dc7fb7fe9/0] GAUGE connections:5.000000 conn
   2018-06-20T14:20:00.00+0100 [86943ea5-b215-4f03-aa5f-741dc7fb7fe9/0] GAUGE cpu:1.355932 percent
   2018-06-20T14:20:18.36+0100 [86943ea5-b215-4f03-aa5f-741dc7fb7fe9/0] GAUGE connections:5.000000 conn
   2018-06-20T14:21:18.36+0100 [86943ea5-b215-4f03-aa5f-741dc7fb7fe9/0] GAUGE connections:5.000000 conn
   2018-06-20T14:22:18.36+0100 [86943ea5-b215-4f03-aa5f-741dc7fb7fe9/0] GAUGE connections:5.000000 conn
   2018-06-20T14:23:18.36+0100 [86943ea5-b215-4f03-aa5f-741dc7fb7fe9/0] GAUGE connections:5.000000 conn
   2018-06-20T14:24:18.36+0100 [86943ea5-b215-4f03-aa5f-741dc7fb7fe9/0] GAUGE connections:5.000000 conn
   2018-06-20T14:25:18.36+0100 [86943ea5-b215-4f03-aa5f-741dc7fb7fe9/0] GAUGE connections:5.000000 conn
   2018-06-20T14:26:18.36+0100 [86943ea5-b215-4f03-aa5f-741dc7fb7fe9/0] GAUGE connections:5.000000 conn
   2018-06-20T14:27:18.36+0100 [86943ea5-b215-4f03-aa5f-741dc7fb7fe9/0] GAUGE connections:5.000000 conn
   2018-06-20T14:28:18.36+0100 [86943ea5-b215-4f03-aa5f-741dc7fb7fe9/0] GAUGE connections:5.000000 conn
   2018-06-20T14:29:18.36+0100 [86943ea5-b215-4f03-aa5f-741dc7fb7fe9/0] GAUGE connections:5.000000 conn

Expected behaviour

cf tail -f should emit metrics as they arrive into log-cache, regardless of the timestamp, or at least if the timestamp is within some accepted time window.

Related

This is a similar problem to the one described here: #32 (comment)

Potential solutions might involve refactor the log-cache api to return a reception timestamp, a message id or similar?

We have created an issue in Pivotal Tracker to manage this:

https://www.pivotaltracker.com/story/show/158492915

The labels on this github issue will be updated when the story is started.

@keymon Thanks for providing all of the details for this issue.

Based on initial conversations, it seems like it would take a significant amount of re-architecting Log Cache to make it stream data that arrives late. Given that this functionality isn't a product goal for Log Cache at the moment, I think it makes sense to close out this issue.

We'll keep #32 open for now to track the relevant portion of that discussion, as that pertains more to messages that are considered current.