edgurgel/verk

Defining contextual data for worker processes

mskv opened this issue · 4 comments

mskv commented

I have a feature suggestion (or maybe it's already supported somehow?).

Maybe it would be easier to start with use cases:

  • The process that enqueues a Verk job has a correlation-id in its process dictionary. I use it to correlate logs. I would like to include this correlation-id in the serialized job and have the worker set up its process dictionary before performing the work. That way any logs produced by the worker would include the correlation-id.
  • The same applies to other contextual data often kept in the process dictionary for convenience, for instance the locale.
  • This could also be used to include caller-related data, like a "caller stacktrace" showing which piece of code enqueued the job.

For instance, Sidekiq achieves this through "middlewares" - pluggable pieces of code that can run before enqueueing and around job processing:
https://github.com/mperham/sidekiq/wiki/Middleware

From what I understand, Verk only offers read-only access to the job lifecycle through the Event Manager. What I would need is a way to plug some code into the worker process itself to have access to its process dictionary.

All of the above could be achieved today by using the job's args. When enqueuing a job, I could include any metadata I need in the args, and then every single perform callback in my job definitions would need to expect and handle that metadata. This is not very convenient, though, for global concerns like correlation-id inclusion.

What do you think about this? Or maybe am I missing something in the current implementation?

Hey @mskv,

I'm not 100% sure I understood what you are looking for, but we currently have the whole job information available inside the process dictionary:

def current_job, do: :erlang.get(@process_dict_key)

Would this help with your problem?

mskv commented

Thanks for the link. I don't think it helps in this case. Maybe I'll expand on the correlation-id example.

I have a web server. At the beginning of each request it generates a correlation-id. It adds it to Logger metadata. So every single log message generated when handling the request contains correlation-id. When the process handling the web request enqueues a Verk job, I would like to attach this correlation-id to the job. Then, when this job is picked up by the worker process, the correlation-id could get attached to Logger metadata. This way not only the whole request handling is correlated, but also everything happening in the background.
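For context, the request side of this could look roughly like the sketch below. The module and function names are illustrative (in a Phoenix/Plug app this would typically live in a plug); only `Logger.metadata/1` and the `:crypto`/`Base` calls are standard.

```elixir
defmodule CorrelationId do
  # Illustrative helper: generate a correlation-id at the start of
  # request handling and attach it to Logger metadata, so every log
  # line emitted by this process carries it.
  def attach! do
    id = Base.encode16(:crypto.strong_rand_bytes(8), case: :lower)
    Logger.metadata(correlation_id: id)
    id
  end
end
```

With `:correlation_id` added to the console backend's metadata list (e.g. `config :logger, :console, metadata: [:correlation_id]`), every message logged by the request process then shows the id.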

This is already possible - just attach it to the args of a job and handle it manually in the perform callback:

Verk.enqueue(%Verk.Job{
  queue: :default,
  class: "ExampleWorker",
  args: [1, 2, Logger.metadata()]
})

defmodule ExampleWorker do
  def perform(arg1, arg2, logger_metadata) do
    Logger.metadata(logger_metadata)

    arg1 + arg2
  end
end

The only way to avoid doing this in every single worker module would be metaprogramming.
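That metaprogramming could be a small `use`-able module along these lines. This is only a sketch of the convention from the snippet above (metadata travels as the last job argument); `MetadataWorker` and `do_perform` are made-up names, not part of Verk:

```elixir
defmodule MetadataWorker do
  # Hypothetical helper, not part of Verk: injects a perform/3 that
  # restores Logger metadata from the last argument and then delegates
  # to the worker's own do_perform/2. Note that Verk args round-trip
  # through JSON, so real code would also need to normalize the
  # deserialized metadata back into a keyword list.
  defmacro __using__(_opts) do
    quote do
      def perform(arg1, arg2, logger_metadata) do
        Logger.metadata(logger_metadata)
        do_perform(arg1, arg2)
      end
    end
  end
end

defmodule ExampleWorker do
  use MetadataWorker

  # The actual work, free of the metadata boilerplate.
  def do_perform(arg1, arg2), do: arg1 + arg2
end
```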

I guess my question is whether I was missing something in this regard. I wanted to point to Sidekiq's middleware as a streamlined example of how they handle this. But I understand it may be outside the scope of the project, since - as shown above - it's already possible to achieve this with the existing tools.

@mskv ,

Yeah, I think the best approach is to include this metadata as part of the arguments. Maybe always use the first argument as a metadata map so you can add whatever is useful to track this information?
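That convention might be sketched as follows. The worker and key names are illustrative; the one real constraint worth noting is that Verk (being Sidekiq-compatible) serializes args as JSON, so map keys arrive in perform as strings:

```elixir
defmodule MetadataFirstWorker do
  # Convention sketch: the first argument is always a metadata map.
  # Enqueued (not run here, since it needs a Redis connection) roughly as:
  #
  #   Verk.enqueue(%Verk.Job{
  #     queue: :default,
  #     class: "MetadataFirstWorker",
  #     args: [%{"correlation_id" => "abc123"}, 1, 2]
  #   })
  #
  def perform(%{"correlation_id" => id}, arg1, arg2) do
    Logger.metadata(correlation_id: id)
    arg1 + arg2
  end
end
```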

mskv commented

Thanks, will go that way probably, closing the issue.