exAspArk/batch-loader

Special array response syntax?

bessey opened this issue ยท 12 comments

Hi, firstly great library, love every bit of the implementation! Was fascinating learning how batch-loader worked.

I have what I can only imagine to be a common use-case for BatchLoader; batching N+1 SQL 1:Many calls. All the examples in the docs appear to be Many:1, or at least X:1, and so they are resolving to single items. Let me explain in code, as all that is probably just confusing things.

# Lets say I have a house, and I want all the photos for that house.

# This doesn't work, because it resolves a single photo to a single house
BatchLoader.for(house.id).batch do |house_ids, loader|
  Photo.where(owner_id: house_ids).each do |photo|
    loader.call(photo.owner_id, photo)
  end
end

# This doesn't work, because if a house has no photos, nil is returned instead of [].
# Also for such a common pattern, this seems like a lot of boilerplate
BatchLoader.for(house.id).batch do |house_ids, loader|
  Photo.where(owner_id: house_ids)
    .group_by(&:owner_id).each do |house_id, photos|
      loader.call(house_id, photos)
    end
end


# This would be my dream API... Note batch takes a default resolve value
BatchLoader.for(house.id).batch([]) do |house_ids, loader|
  Photo.where(owner_id: house_ids).each do |photo|
    # Here .add is used instead of .call to let BatchLoader know not to replace item
    # but to call
    #   @item = (@item || @default) << item
    loader.add(photo.owner_id, photo)
  end
end
# Alternative API... explicit batch_array means loader API can stay the same, no default needed
BatchLoader.for(house.id).batch_array do |house_ids, loader|
  Photo.where(owner_id: house_ids).each do |photo|
    loader.call(photo.owner_id, photo)
  end
end

@bessey your ideas are brilliant! ๐Ÿ’ฏ ๐Ÿ‘

Currently I'm doing something like:

BatchLoader.for(house.id).batch do |house_ids, loader|
  photos = Photo.where(owner_id: house_ids)
  # a hash with an empty array as a default value
  photos_by_house_id = photos.each_with_object(Hash.new { |h, k| h[k] = [] }) do |photo, memo|
    memo[photo.owner_id] << photo
  end

  house_ids.each { |houst_id| loader.call(house_id, photos_by_house_id[house_id]) }
end

But that's a lot of code and it can be definitely simplified with BatchLoader!

There are, however, may be the cases when users would like to load not an array but a hash. For example:

loader.call(
  house_id1,
  {
    photo1.id => photo1,
    photo2.id => photo2,
    photo3.id => photo3
  }
)

Do you have any ideas of an API which will allow building custom objects (arrays, hashes, etc.) for M:N associations?

For example:

Photo.where(owner_id: house_ids).each do |photo|
  loader.call_with_object(photo.owner_id, []) { |memo| memo << photo }
end
# [
#   photo1,
#   photo2,
#   photo3
# ]

Photo.where(owner_id: house_ids).each do |photo|
  loader.call_with_object(photo.owner_id, {}) { |memo| memo[photo.id] = photo }
end
# {
#   photo1.id => photo1,
#   photo2.id => photo2,
#   photo3.id => photo3
# }

However, a lot of people probably need just an array or they can transform it before / later manually if they want.

This would be my dream API... Note batch takes a default resolve value

I like the idea!

# "default_value" can be a keyword argument. BatchLoader#batch already accepts "cache" option
BatchLoader.for(house.id).batch(default_value: []) do |house_ids, loader|
  Photo.where(owner_id: house_ids).each do |photo|
    # the method can be called "add"
    # or "push" โ€“ similarly to Array#push
    # or "append" โ€“ explicitly says that the value will be added at the end of the array
    # which one would you prefer?
    loader.append(photo.owner_id, photo)
  end
end

Please let me know what do you think about it ^

Definitely more flexible, but still a little too much boilerplate for me to call it perfect!

I wonder if one could get the best of both worlds (flexibility, LOC at point of use) by moving this config into the BatchLoader class, essentially making this loader definition configurable

# existing implementation + new unused memo arg which gives item's CURRENT value
class BatchLoader
  def initial_value
    nil
  end
  def loader
    -> (item, value, memo) { __executor_proxy.load(item: item, value: value) }
  end
end

class ArrayBatchLoader < BatchLoader
  # I'm using a block to work ensure we don't share mutable container across batches
  def initial_value
    []
  end
  def loader
    loader -> (item, value, memo) {
      new_value = memo.concat([value])
      # now that this is public API, would probably want to give it a nicer name than __executor_proxy
      __executor_proxy.load(item: item, value: memo_value)
    }
  end
end

I imagine you could and would predefine ArrayBatchLoader and HashBatchLoader, as this API isn't that great for an end user to wrap their head around.

I accidentally posted my last comment while typing ๐Ÿ˜‚
Updated it by adding a code example with a few suggestions

To clarify, in that example memo would basically be:

def memo
  if value_loaded?(item: item)
    loaded_value(item: item)
  else 
    initial_value
  end
end

I wonder if one could get the best of both worlds (flexibility, LOC at point of use) by moving this config into the BatchLoader class

๐Ÿ‘ got the idea. I personally don't want to force users to define custom BatchLoader classes based on their needs. I think that BatchLoader can provide a clear API which will be enough to use it as it is, at least for now.

I like your initial idea. Just a few suggestions to discuss:

  • Add default_value as a keyword argument: batch(default_value: [])
  • Think about the method name on the loader: add, push or append. Internally, it'll require to start using a special class like BatchLoader::Loader for a loader variable instead of lambdas.

Seems like you already took a look at the source code. If we agreed on the API and you'd like to contribute, feel free to open a PR. Even failing tests will be enough to start :)

Happy to contribute once we've agreed on at least some portion of the API. I can't personally say i have a usecase for the Hash syntax so I'm struggling to design an API for that, but maybe we can start with the Array option. What do you think of this?

BatchLoader.for(house.id).batch(default_value: []) do |house_ids, loader|
  Photo.where(owner_id: house_ids).each do |photo|
    loader.call(photo.owner_id, photo)
  end
end

Essentially I'm thinking you just add default_value and change the implementation of call to something like:

item ||= default_value # default nil
if item.respond_to?(:<<)
  item << value
else 
  item = value
end

This absolutely doesn't handle the use case of Hash though, sorry!

Hey @bessey!

I like your initial idea. I think that using plain arrays is a much more common use case compared to hashes or other structures. People can always do the data transformations manually and just execute loader.call.

However, it'll be great if we can add a support to make these custom data transformations easier, too. What do you think about this API:

  • Load many-to-many associations:
BatchLoader.for(house.id).batch(default_value: []) do |house_ids, loader|
  Photo.where(owner_id: house_ids).each do |photo|
    loader.call(photo.owner_id) { |value| value << photo }
  end
end
# => [photo1, photo2]
  • Custom hash object:
BatchLoader.for(house.id).batch(default_value: {}) do |house_ids, loader|
  Photo.where(owner_id: house_ids).each do |photo|
    loader.call(photo.owner_id) { |value| value[photo.id] = photo }
  end
end
# => { photo1.id => photo1, photo2.id => photo2 }

I think it's a quite simple and consistent API but probably doesn't cover all use cases. Fo example, immutable data structures. In this case, we can require returning a result object from the block (similarity to Array#inject), not sure:

  • Loading a single object at once
BatchLoader.for(house.id).batch(default_value: []) do |house_ids, loader|
  Photo.where(owner_id: house_ids).each do |photo|
    loader.call(photo.owner_id) { photo } # same as loader.call(photo.owner_id, photo)
  end
end
# => photo2
BatchLoader.for(house.id).batch(default_value: Hamster::Vector[]) do |house_ids, loader|
  Photo.where(owner_id: house_ids).each do |photo|
    loader.call(photo.owner_id) { |vector| vector.add(photo) } # returns a new object w/o mutating it
  end
end
# => Hamster::Vector[photo1, photo2]

Please let me know what do you think about it.

I implemented the array parts, but hit a bug i don't understand around the .call {...} syntax. Need your help there, I'm guessing its something to do with passing the __executor_proxy by value instead of reference as it was in the lambda.

Check out my callable branch, and see there's one failing test for nested batch loaders.

@bessey Looks great! ๐Ÿ˜

I think that the test with the nested batch loaders fails because of the line:

raise ArgumentError, "Please pass a value or a block" if value == NULL_VALUE

The value in this case is a nested BatchLoader instance. Executing == forces loading the value (not lazily). I think you can remove the check for now.

Additionally, if you want, you can move the logic from the ExecutorCallable#call method to the existing loader lambda since we don't introduce new methods and reuse the same call. It's up to you :)

By the way, nice error handling!

Added in #8.