thbar/kiba-common

Consider adding the row_pre_processor pattern to destinations

Closed this issue · 2 comments

I've attempted to build a wrapper class that uses this but I wonder if there is a cleaner way to do it, or whether it would be better to build that support directly into the Destinations::CSV class?

Very useful pattern though with the new StreamingRunner!

thbar commented

Glad you like the new StreamingRunner 😄

In my experience at this point, row preprocessors are mostly useful if you have more than one destination inside a given pipeline. Those situations are rare but can happen (e.g. https://github.com/thbar/kiba/wiki/Can-Kiba-handle-multiple-sources-and-destinations%3F#my-destinations-have-different-formats-how-do-i-handle-this).

A wrapper class is a good way to do this at the moment though. Here is a simple example (not tested, but so you get the idea):

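class DestinationRowPreprocessor
  # pre_process: a callable (e.g. a lambda) taking a row and returning a row or nil
  # destination: an array holding the destination class followed by its arguments
  def initialize(pre_process:, destination:)
    @pre_process = pre_process
    @destination_args = destination
  end

  # Lazily instantiate the wrapped destination on first use
  def destination
    @destination ||= begin
      # Destructure instead of shift, so the caller's array is left untouched
      klass, *args = @destination_args
      klass.new(*args)
    end
  end

  def write(row)
    # NOTE: you could handle N rows here too with a more complex pattern (Enumerator)
    row = @pre_process.call(row)
    # A nil result means the row is dropped
    destination.write(row) if row
  end

  def close
    # Only close the destination if it was actually instantiated
    @destination&.close
  end
end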

Which you can use like this:

destination DestinationRowPreprocessor,
  pre_process: -> (row) { ... },
  destination: [MyDestination, config: value]
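
To check the wrapper's behaviour without running a full Kiba job, here is a rough sketch that drives it by hand. `ArrayDestination` is a hypothetical in-memory destination made up for this example, and the wrapper is repeated in compact form so the snippet runs on its own. One assumption worth flagging: on Ruby 3+, a trailing hash in the `destination:` array is no longer converted to keyword arguments automatically, so this variant forwards it explicitly with `**`.

```ruby
# Hypothetical in-memory destination, used only for this illustration.
class ArrayDestination
  attr_reader :rows, :closed

  def initialize(rows:)
    @rows = rows
    @closed = false
  end

  def write(row)
    @rows << row
  end

  def close
    @closed = true
  end
end

# Compact copy of the wrapper so the snippet is self-contained.
class DestinationRowPreprocessor
  def initialize(pre_process:, destination:)
    @pre_process = pre_process
    @destination_args = destination
  end

  def destination
    @destination ||= begin
      klass, *args = @destination_args
      # Forward a trailing hash as keywords (assumption: needed on Ruby 3+).
      options = args.last.is_a?(Hash) ? args.pop : {}
      klass.new(*args, **options)
    end
  end

  def write(row)
    row = @pre_process.call(row)
    destination.write(row) if row
  end

  def close
    @destination&.close
  end
end

written = []
wrapper = DestinationRowPreprocessor.new(
  # Drop rows without a name, upcase the name of the others.
  pre_process: ->(row) { row[:name] && row.merge(name: row[:name].upcase) },
  destination: [ArrayDestination, { rows: written }]
)

wrapper.write({ name: "alice" })
wrapper.write({ name: nil })
wrapper.write({ name: "bob" })
wrapper.close
```

After this runs, `written` holds the two upcased rows, and the row with a `nil` name has been skipped because the lambda returned `nil` for it.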

Or even, with a small DSL extension (see https://github.com/thbar/kiba-common/blob/master/lib/kiba-common/dsl_extensions/show_me.rb for an example of one):

pre_process_destination -> (row) { ... },
  MyDestination, config: value
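
A minimal sketch of what such an extension could look like, assuming it follows the same shape as `show_me.rb`: a plain module whose method calls back into the Kiba DSL. The module name `PreProcessDestination` is hypothetical, and the code relies on the `DestinationRowPreprocessor` wrapper shown earlier being loaded.

```ruby
# Hypothetical DSL extension; not part of kiba or kiba-common.
module PreProcessDestination
  # pre_process: a callable taking a row and returning a row (or nil to drop it)
  # klass, *args: the wrapped destination class and its arguments
  def pre_process_destination(pre_process, klass, *args)
    # `destination` is the regular Kiba DSL method, available once this
    # module has been extended into the job definition.
    destination DestinationRowPreprocessor,
      pre_process: pre_process,
      destination: [klass, *args]
  end
end
```

The assumption here is that the module gets mixed into the job definition with `extend PreProcessDestination` before `pre_process_destination` is called, in the same way the kiba-common DSL extensions are enabled.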

I'd be curious to hear more about your precise use-case, just to make sure I keep that in a corner of my mind as I work on future versions of kiba & kiba-common! (since this could somehow be baked in later in Kiba itself).

thbar commented

@pmackay I'll close for now, since I don't think this is generic enough to justify adding the ability to every source and destination, but please keep me posted about your precise use-cases here, I'd love to hear more!

I will also think about creating an official wrapper, which may become part of kiba-common.