jeffpierce/cassabon

Do not allow two rollups to share a table

Closed this issue · 6 comments

Rollup data for a path is written to a table whose name is determined form the retention period. This means that for any rollup definition such as the following, no two entries can have exactly the same retention period:

rollups:
    default:
        retention:
            - 6s:6h
            - 1m:7d
            - 1h:30d  <<< This goes into the 30-day table
            - 6h:30d  <<< So does this, rendering the data useless
            - 6h:365d
        aggregation: average

Proposed solution: Flag a warning when loading the configuration, and drop the definition (as we do for other such errors).

Which definition do we drop?

I'd rather call it a config loading error and go fatal with that reason.

We don't do any fatal errors during rollup config parsing. So far, we check (and skip) the following:

G.Log.System.LogWarn("Invalid aggregation method for \"%s\": %s", expression, v.Aggregation)
G.Log.System.LogWarn("Malformed regular expression for \"%s\": %s", expression, err.Error())
G.Log.System.LogWarn("Malformed definition for \"%s\": %s", expression, s)
G.Log.System.LogWarn("Malformed window for \"%s\": %s %s", expression, s, couplet[0])
G.Log.System.LogWarn("Duration less than minimum 1 second for \"%s\": %v", expression, window)
G.Log.System.LogWarn("Malformed retention for \"%s\": %s %s", expression, s, couplet[1])
G.Log.System.LogWarn(
                            "Next duration is not a multiple for \"%s\": %v %% %v remainder is %v",
                            expression, v.Window, shortestDuration, remainder)
G.Log.System.LogWarn("Rollup expression rejected due to previous errors: \"%s\"", expression)

These errors go to the log, which unfortunately we probably aren't monitoring.

I suppose we could set a flag if anything gets rejected during the parse, and then at the end issue a fatal error; that way it's impossible to ignore a bad config and send data to "default" when it shouldn't go there.

Well, we should definitely die if default is bad or undefined.

I suppose we could have a "strict" option to die if non-default paths are malformed, and go on letting stats fall through to default.

That might be a user feedback thing once people besides us are using it.

"Strict" option is a good idea, probably defaulting to "on".

In non-strict, we could synthesize a default "default" so that Cassabon works no matter how badly the config is hosed.

Works for me.

10s:1h
1m:30d

...seems reasonable for a default.

Resolved by #84