flux-framework/flux-coral2

Limits on rabbit resources


There is currently no way to enforce limits on how many rabbit resources users can request, which is a problem.

PR #202 adds a rudimentary way to enforce a limit on the amount of storage per node for any given job, by raising a job exception while the job is in the DEPEND state. However, usage is not tracked across jobs. Ideally, jobs would also be rejected at submission time, before a jobid is issued, rather than in DEPEND.

@cmoussa1 just a heads up that this might take some flux-accounting work as well? Unclear.

OK sounds good, let me know what you might require!

Just brainstorming: if your jobtap plugin had access to, at the bare minimum, flux-accounting association data (like the list of all associations present on a cluster), you could probably keep track of storage usage per association within the plugin itself, maybe in a map or something. This is what the priority plugin does to enforce running-jobs limits. That way you wouldn't need the priority plugin to be running or anything.
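To make the map idea concrete, here is a minimal sketch of per-association bookkeeping. It is written in Python only for brevity (a real jobtap plugin would be C or C++), and every name in it (`StorageTracker`, `try_admit`, `release`, the GiB unit, the single shared limit) is hypothetical rather than anything from flux-coral2 or flux-accounting.

```python
class StorageTracker:
    """Hypothetical per-association storage bookkeeping.

    An association is modeled here as a (user, bank) tuple, which is an
    assumption made for illustration.
    """

    def __init__(self, limit_gib):
        self.limit_gib = limit_gib  # one limit shared by every association
        self.in_use = {}            # (user, bank) -> GiB held by active jobs
        self.jobs = {}              # jobid -> ((user, bank), GiB)

    def try_admit(self, jobid, user, bank, request_gib):
        """Record the request and return True if it fits under the limit."""
        assoc = (user, bank)
        current = self.in_use.get(assoc, 0)
        if current + request_gib > self.limit_gib:
            return False  # the plugin would reject the job at this point
        self.in_use[assoc] = current + request_gib
        self.jobs[jobid] = (assoc, request_gib)
        return True

    def release(self, jobid):
        """Give storage back when a job finishes; unknown jobids are ignored."""
        entry = self.jobs.pop(jobid, None)
        if entry is not None:
            assoc, gib = entry
            self.in_use[assoc] -= gib


tracker = StorageTracker(limit_gib=100)
first_ok = tracker.try_admit(1, "alice", "bankA", 60)   # fits under the limit
second_ok = tracker.try_admit(2, "alice", "bankA", 50)  # would exceed 100 GiB
```

The important part is that usage is keyed by association rather than by job, so the limit holds across all of an association's active jobs, which is the piece missing from the per-job check.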

flux-accounting has an export-db command that fetches basic association data from the DB and writes it to a .csv file, which might be usable for populating this internal map within the plugin.
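As a rough illustration of seeding the map from such a file, the sketch below reads a CSV and creates one zeroed entry per association. The column names `username` and `bank` are assumptions, not the confirmed layout of the export-db output, and `load_associations` is a hypothetical helper; check the real header row before relying on any of this.

```python
import csv
import io


def load_associations(csv_file, user_col="username", bank_col="bank"):
    """Build a usage map with one zeroed entry per association.

    The column names are assumed for illustration and may not match the
    actual export-db output.
    """
    reader = csv.DictReader(csv_file)
    return {(row[user_col], row[bank_col]): 0 for row in reader}


# Stand-in for an exported file, using the assumed header.
sample = io.StringIO("username,bank\nalice,bankA\nbob,bankB\n")
usage = load_associations(sample)
```

In practice `csv_file` would be an open file handle for the exported .csv rather than an in-memory string.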

If different associations need to have different limits (e.g., some users need a much larger storage allocation than others), then perhaps flux-accounting needs to play more of a role here. But if pretty much every association can have the same limit, then that might avoid some additional integration work.
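The difference between the two cases can be sketched as a lookup with a default. Everything here is hypothetical: `DEFAULT_LIMIT_GIB`, `storage_limit`, and the `overrides` map, which stands in for whatever per-association data flux-accounting might eventually supply.

```python
DEFAULT_LIMIT_GIB = 1024  # hypothetical cluster-wide default


def storage_limit(assoc, overrides, default=DEFAULT_LIMIT_GIB):
    """Return the limit for an association, falling back to the default."""
    return overrides.get(assoc, default)


# With no overrides, every association shares one limit and no extra
# data source is needed.
uniform = storage_limit(("alice", "bankA"), overrides={})

# Per-association limits need a second data source to fill this map.
overrides = {("bob", "bankB"): 8192}
custom = storage_limit(("bob", "bankB"), overrides)
```

If the overrides map stays empty, the plugin only needs a single configured value, which is the simpler integration described above.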