riverqueue/river

InsertOpts() Metadata

atoi opened this issue · 5 comments

atoi commented

It seems that Metadata is ignored when provided as a part JobArgsWithInsertOpts.InsertOpts().
When specified as a part of third param to Client.Insert they are applied just fine.

The problem I see is that insertParamsFromArgsAndOptions neglects JobArgsWithInsertOpts meta and always takes what was passed as a function arg. Actually I do agree that the function arg should have higher priority and override what was specified in JobArgsWithInsertOpts.InsertOpts(), but when the arg is empty I'd expect job's meta to take effect.

Important pieces below:

if insertOpts == nil {
	insertOpts = &InsertOpts{}
}

var jobInsertOpts InsertOpts
if argsWithOpts, ok := args.(JobArgsWithInsertOpts); ok {
	jobInsertOpts = argsWithOpts.InsertOpts()
}
metadata := insertOpts.Metadata
if len(metadata) == 0 {
	metadata = []byte("{}")
}

Hi @atoi, this is actually intentional, although it should have been documented. I pushed up a commit that demonstrates this behavior and documents it here.

I'm wondering what use case you have in mind though. My initial thought was metadata would always be set at insertion time, but it could be possible to set some of that at the worker level. At most we could maybe change the behavior so that the default metadata specified in the JobArgsWithInsertOpts gets merged with / overridden by the metadata from the insert opts, rather than fully overwritten by it. So if your JobArgsWithInsertOpts had metadata {"a": "foo", "b": "bar"} and the Insert() level opts specified {"b": "override, "c": "something_else"}, the resulting metadata would be {"a": "foo", "b": "override", "c": "something_else"}. We might even be able to do that without pulling in an extra dependency like gjson because Postgres itself could do the merge.

I would like to hear more about the intended use case here though to know whether or not this makes sense. Please tell me more about how you'd like to use it 😄

atoi commented

Hi @bgentry!

My usecase is pretty simple. I need a place to store some job-specific data.
I cannot put that data into args, as that will interfere with job uniqueness check.
As the data is job-specific, I first tried to keep it close to my job definion and used JobArgsWithInsertOpts. I just naively assumed I can use all the options InsertOpts struct offers. After realizing that some options are more equal than others I was able to do what I wanted by utilizing an arg to Insert() function.

I suppose such a behavior should be mentioned in the documentation.

Ah, I’m glad I asked because this jogged my memory on something we are missing here. It would be a bit of a misuse of metadata it was used for job-specific data merely to work around the uniqueness check.

What if instead of that, we made the uniqueness check configurable so that you could choose which specific top level keys in the args field you wanted to be considered for the uniqueness check? Would that satisfy your use case by allowing you to choose which part of your args to factor in and excluding the rest? This is something we had always planned to do with unique jobs but haven’t gotten around to yet.

You are right to point out the unequal behavior of metadata in the insertopts here. That field was only exposed recently and I overlooked the fact that the behavior is different here compared to any of the other options. That being said I doing think that making it’s behavior the same (where an insertion time value completely overrides a job level default) would actually make much sense here either as opposed to something like merging the attributes—you simply wouldn’t be able to do much with that while also using any of the features which will use metadata.

In the end I think we will either want to go with the merging behavior and document it appropriately, or fork the type for default insert opts so that it doesn’t have this option which doesn’t make sense to use in this way.

Tagging in @brandur to get his thoughts as well.

atoi commented

TLDR: Implementing metadata merge will make me happy.

Hmm. Yes, technically the configurable uniqueness check would solve my case. Conceptually though, I'm not so sure.

See, I look at args as "task function parameters" or "things a job needs to get started".
That extra info I'm storing is related to how the job is processed, not started. I store job progress plus some data samples that job collects. I do not need to store that data permanently in "real" tables, as it is needed purely for better user experience and does not hold business value. So I decided to put that data closer to the job that collects it and metadata seemed to be a nice place for it.

I've been told already that metadata is reserved for future usage and it is better not to mess with it. At the same time it should be fine if I "reserve" a conflict-resistant namespace there for my app purposes.

Spoke to @bgentry offline about this one, but broad thoughts:

  • Uniqueness based on configurable specific args keys is probably something we should do, although even if it solves the proximate problem here, I'd expect this metadata issue to come up again. Metadata issues always come up again, heh.
  • I think merging metadata from InsertOpts rather than a full blown overwrite makes sense, although I suspect that we may want to eventually make this configurable in some way. i.e. InsertOpts can pass a flag or boolean indicating whether its metadata should merge with metadata below, or full on replace it. Based on my experience with metadata in past jobs, the appetite for doing creative things with metadata hashes knows no bounds and will eventually grow to desire all possible edge cases to be covered.