twitter/hraven

Ensure the scan for flowseries endpoint with version is time bound

Closed this issue · 2 comments

The flow series endpoint currently looks up flows for a given cluster/user/app/version, and the only limit on the lookup is the number of flows to retrieve. When a version occurs only once or twice, that limit may never be reached, so the scan walks through every run of the app looking for further occurrences of the version.

We would like to add a time bound to this scan. By default it would look back 30 days, and the range would be configurable through start and end times.

With a time boundary on app runs, the scan can terminate much sooner instead of going all the way back in time. The default lookback is about 30 days, so the REST endpoint now looks for N flows that occurred within this time range and does not look beyond it for more runs of the app. The range is configurable through the startTime and endTime parameters of the flow REST endpoint that accepts a version.
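A minimal sketch of how the time range might be resolved from the optional startTime and endTime parameters. The class and method names are hypothetical and not taken from the hRaven code. It also assumes that a missing endTime means "now" and that a missing startTime falls back to 30 days before the end of the range.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper; the names here are illustrative, not hRaven's. */
final class FlowTimeWindow {
    /** Default lookback of 30 days, in milliseconds. */
    static final long DEFAULT_LOOKBACK_MS = TimeUnit.DAYS.toMillis(30);

    final long startMs;
    final long endMs;

    private FlowTimeWindow(long startMs, long endMs) {
        this.startMs = startMs;
        this.endMs = endMs;
    }

    /**
     * Resolves the scan window from optional request parameters.
     * Assumption: null endTime means "now"; null startTime means
     * "end of range minus the default lookback".
     */
    static FlowTimeWindow resolve(Long startTime, Long endTime, long nowMs) {
        long end = (endTime != null) ? endTime : nowMs;
        long start = (startTime != null) ? startTime : end - DEFAULT_LOOKBACK_MS;
        if (start > end) {
            throw new IllegalArgumentException("startTime must not be after endTime");
        }
        return new FlowTimeWindow(start, end);
    }
}
```

Calling `FlowTimeWindow.resolve(null, null, System.currentTimeMillis())` would give the default window covering the last 30 days.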

The HBase row key contains the run ID, which is a timestamp in milliseconds. The pull request adds a start row prefix and a stop row prefix, each including a timestamp: one based on now and the other on now minus the default lookback. The scan therefore terminates at the stop row instead of continuing all the way back in time for that app. As mentioned earlier, the range is configurable, with a proposed default lookback of 30 days.
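The start and stop rows described above could be built roughly as follows. This is a sketch under stated assumptions, not the code from the pull request: the key layout `cluster!user!app!<run ID>`, the `!` separator, and the inverted run ID (`Long.MAX_VALUE - runId`, so that newer runs sort first) are all assumptions, and the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper; key layout and names are assumptions. */
final class FlowScanBounds {
    /** Assumed separator between row key components. */
    static final String SEP = "!";

    /** Builds cluster!user!app!<inverted run ID> as raw bytes. */
    static byte[] rowKey(String cluster, String user, String app, long runIdMs) {
        byte[] prefix = (cluster + SEP + user + SEP + app + SEP)
                .getBytes(StandardCharsets.UTF_8);
        // Assumption: the run ID is stored inverted so newer runs sort first.
        long inverted = Long.MAX_VALUE - runIdMs;
        return ByteBuffer.allocate(prefix.length + Long.BYTES)
                .put(prefix)
                .putLong(inverted) // big-endian, so byte order matches numeric order
                .array();
    }

    /** The scan starts at the newest run of interest (end of the time range). */
    static byte[] startRow(String cluster, String user, String app, long endMs) {
        return rowKey(cluster, user, app, endMs);
    }

    /** The scan stops at the oldest run of interest (start of the time range). */
    static byte[] stopRow(String cluster, String user, String app, long startMs) {
        return rowKey(cluster, user, app, startMs);
    }
}
```

The two byte arrays would then be set as the scan's start row and stop row. Because an HBase stop row is exclusive by default, a run whose ID equals the start of the range exactly would be left out under this layout.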

The other flow endpoint (without version) is not being changed.

merged onto master