cespare/cron

Support Jenkins-style H (hash) schedules

Closed this issue · 1 comments

We should support Jenkins-style cron expressions where H may be given instead of a minute, hour, DoM, month, or DoW. H ("hash") means "pick a random value", but the value is derived from a fixed seed provided by the user, so that a particular kind of job consistently uses the same value. (For instance, the seed might be a job name or a database ID.)
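If the seed ends up being a 64-bit integer (as in the API proposed further down), a caller that only has a job name needs to turn it into a number first. A minimal sketch using FNV-1a from the standard library; `seedFromName` is a hypothetical helper for illustration, not part of this package:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// seedFromName is a hypothetical helper: it hashes a job name to a 64-bit
// seed with FNV-1a, so the same name always yields the same seed.
func seedFromName(name string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(name)) // Write on a hash.Hash never returns an error
	return h.Sum64()
}

func main() {
	fmt.Println(seedFromName("nightly-backup"))
	fmt.Println(seedFromName("nightly-report"))
}
```

A database ID could be used as the seed directly, with no hashing step.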

Here is the full Jenkins documentation for its own cron syntax:

This field follows the syntax of cron (with minor differences). Specifically,
each line consists of 5 fields separated by TAB or whitespace:

MINUTE HOUR DOM MONTH DOW
MINUTE	Minutes within the hour (0–59)
HOUR	The hour of the day (0–23)
DOM	The day of the month (1–31)
MONTH	The month (1–12)
DOW	The day of the week (0–7) where 0 and 7 are Sunday.

To specify multiple values for one field, the following operators are available. In the order of precedence,

* specifies all valid values
M-N specifies a range of values
M-N/X or */X steps by intervals of X through the specified range or whole valid range
A,B,...,Z enumerates multiple values

To allow periodically scheduled tasks to produce even load on the system, the
symbol H (for “hash”) should be used wherever possible. For example, using 0 0 * * *
for a dozen daily jobs will cause a large spike at midnight. In contrast,
using H H * * * would still execute each job once a day, but not all at the same
time, better using limited resources.

The H symbol can be used with a range. For example, H H(0-7) * * * means some
time between 12:00 AM (midnight) to 7:59 AM. You can also use step intervals
with H, with or without ranges.

The H symbol can be thought of as a random value over a range, but it actually
is a hash of the job name, not a random function, so that the value remains
stable for any given project.

Beware that for the day of month field, short cycles such as */3 or H/3 will not
work consistently near the end of most months, due to variable month lengths.
For example, */3 will run on the 1st, 4th, …31st days of a long month, then
again the next day of the next month. Hashes are always chosen in the 1-28
range, so H/3 will produce a gap between runs of between 3 and 6 days at the end
of a month. (Longer cycles will also have inconsistent lengths but the effect
may be relatively less noticeable.)

Empty lines and lines that start with # will be ignored as comments.

In addition, @yearly, @annually, @monthly, @weekly, @daily, @midnight, and
@hourly are supported as convenient aliases. These use the hash system for
automatic balancing. For example, @hourly is the same as H * * * * and could
mean at any time during the hour. @midnight actually means some time between
12:00 AM and 2:59 AM.

Examples:

# every fifteen minutes (perhaps at :07, :22, :37, :52)
H/15 * * * *
# every ten minutes in the first half of every hour (three times, perhaps at :04, :14, :24)
H(0-29)/10 * * * *
# once every two hours at 45 minutes past the hour starting at 9:45 AM and finishing at 3:45 PM every weekday.
45 9-16/2 * * 1-5
# once in every two hours slot between 9 AM and 5 PM every weekday (perhaps at 10:38 AM, 12:38 PM, 2:38 PM, 4:38 PM)
H H(9-16)/2 * * 1-5
# once a day on the 1st and 15th of every month except December
H H 1,15 1-11 *
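The step examples in the Jenkins text (H/15 landing on :07, :22, :37, :52) are consistent with choosing a hashed starting offset smaller than the step and then stepping through the range. The sketch below assumes that interpretation; it is not taken from the Jenkins source, and `expandHashStep` is a hypothetical name.

```go
package main

import "fmt"

// expandHashStep returns start, start+step, ... within [lo, hi], where the
// starting offset is derived from the hash value h. This is an assumed
// interpretation that matches the documented examples.
func expandHashStep(lo, hi, step int, h uint64) []int {
	if step <= 0 || lo > hi {
		return nil
	}
	// Keep the offset inside the range even when step exceeds its width.
	mod := step
	if span := hi - lo + 1; mod > span {
		mod = span
	}
	start := lo + int(h%uint64(mod))
	var vals []int
	for v := start; v <= hi; v += step {
		vals = append(vals, v)
	}
	return vals
}

func main() {
	fmt.Println(expandHashStep(0, 59, 15, 7)) // H/15 in the minute field: [7 22 37 52]
	fmt.Println(expandHashStep(0, 29, 10, 4)) // H(0-29)/10 in the minute field: [4 14 24]
}
```

Passing the whole field range as lo and hi covers the plain H/X form; the narrower range is only needed for the H(M-N)/X form.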

To be concrete, here is the proposed new API.

// ParseWithHash is like Parse but additionally supports the symbol H in place
// of the minute, hour, day of month, month, or day of week field. The H symbol
// requests a random value (within the valid range) for each instance of H in
// the cron expression, with the values determined by the given seed.
//
// For example, the schedule
//
//	H H * * *
//
// is a schedule that fires once per day at a random hour and minute that is
// chosen when the schedule is parsed. Given the same input expression and seed,
// the same schedule is generated.
//
// The range for randomly generated day of month values is [1, 28].
//
// Additionally, ParseWithHash interprets the named schedules differently from
// Parse:
//
//   - "@monthly" means "H H H * *"
//   - "@weekly" means "H H * * H"
//   - "@daily" means "H H * * *"
//   - "@hourly" means "H * * * *"
//
// The idea of the H symbol is borrowed from Jenkins, though the details are a
// bit different.
func ParseWithHash(expr string, seed uint64) (*Schedule, error)
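One way this could work is to rewrite the expression into a plain five-field schedule and hand the result to the existing Parse. The sketch below is only an illustration under several assumptions. `resolveHash`, `nextValue`, and the tables are hypothetical names. The mixing function (a splitmix64 step) is an arbitrary choice. The day-of-week range 0-6 is assumed. Only a bare H is handled, not step forms such as H/5.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// namedSchedules mirrors the aliases listed in the doc comment above.
var namedSchedules = map[string]string{
	"@monthly": "H H H * *",
	"@weekly":  "H H * * H",
	"@daily":   "H H * * *",
	"@hourly":  "H * * * *",
}

// fieldRanges holds the inclusive range used for H in each field: minute,
// hour, day of month (capped at 28), month, day of week (assumed 0-6).
var fieldRanges = [5][2]int{{0, 59}, {0, 23}, {1, 28}, {1, 12}, {0, 6}}

// nextValue advances state and returns a mixed 64-bit value (splitmix64).
// Any deterministic generator would do; this one is an assumption.
func nextValue(state *uint64) uint64 {
	*state += 0x9e3779b97f4a7c15
	z := *state
	z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9
	z = (z ^ (z >> 27)) * 0x94d049bb133111eb
	return z ^ (z >> 31)
}

// resolveHash replaces each bare H field with a concrete value derived from
// seed, so the same expression and seed always give the same result.
func resolveHash(expr string, seed uint64) (string, error) {
	expr = strings.TrimSpace(expr)
	if alias, ok := namedSchedules[expr]; ok {
		expr = alias
	}
	fields := strings.Fields(expr)
	if len(fields) != 5 {
		return "", fmt.Errorf("expected 5 fields, got %d", len(fields))
	}
	state := seed
	for i, f := range fields {
		if f != "H" {
			continue
		}
		lo, hi := fieldRanges[i][0], fieldRanges[i][1]
		// The modulo introduces a negligible bias for ranges this small.
		fields[i] = strconv.Itoa(lo + int(nextValue(&state)%uint64(hi-lo+1)))
	}
	return strings.Join(fields, " "), nil
}

func main() {
	for _, seed := range []uint64{1, 2, 1} {
		s, err := resolveHash("H H * * *", seed)
		fmt.Println(seed, s, err)
	}
}
```

Because the generator state advances once per H, each H in an expression gets its own value, while the whole result stays fixed for a given seed.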

Assorted notes:

  • Step expressions such as H/5 * * * * should work.
  • For simplicity, we'll skip the ranged form of H that Jenkins supports (e.g., H(0-7)) for now. We can always choose to add it later.
  • Parse should refuse to parse schedules with H (but should give a helpful error pointing out that only ParseWithHash handles such schedules).
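For the last note, the check in Parse could look roughly like the sketch below. `rejectHash` and `errHashNotSupported` are hypothetical names, and the error text is only a suggestion. The check matches items that start with H rather than any H anywhere, so that a day name like THU (if names are accepted) is not flagged.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// errHashNotSupported is a hypothetical sentinel error that points the caller
// at ParseWithHash.
var errHashNotSupported = errors.New("cron: H is not supported by Parse; use ParseWithHash")

// rejectHash reports an error if any comma-separated item in any field starts
// with H (for example "H" or "H/5").
func rejectHash(expr string) error {
	for _, field := range strings.Fields(expr) {
		for _, item := range strings.Split(field, ",") {
			if strings.HasPrefix(item, "H") {
				return fmt.Errorf("field %q: %w", field, errHashNotSupported)
			}
		}
	}
	return nil
}

func main() {
	fmt.Println(rejectHash("H/5 * * * *"))
	fmt.Println(rejectHash("0 0 * * THU"))
}
```

Wrapping the sentinel with %w lets callers test for it with errors.Is while still seeing which field triggered the error.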

This SGTM. The changes to my proposal - make Parse reject H schedules with a useful message and make ParseWithHash take a uint64 instead of a []byte - are both improvements. (My choice of []byte was the part I was most uncertain about in the original proposal.)