spark-root/laurelin

Make/use thin Branch/Basket proxies in executors

PerilousApricot opened this issue · 1 comments

Each partition currently loads the ROOT file and deserializes all of its metadata from scratch, which is wasteful -- the driver has to load and parse the files anyway, so there's no sense in redoing that work for each cluster in each file.

Instead, pass the data that ArrayBuilder.GetBasket needs (path, offset, length, etc.) from the driver to the executors, so they can just open the file and seek directly to the basket byte ranges.
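A rough sketch of what such a thin proxy could look like, to make the idea concrete. Everything here is an assumption for illustration: `BasketProxy`, its fields, and `readRawBytes` are hypothetical names, not existing laurelin classes, and the sketch reads from a local path with `java.io.RandomAccessFile` rather than whatever file abstraction laurelin actually uses. It also stops at the raw bytes; decompression and basket decoding are left out.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.Serializable;

/**
 * Hypothetical thin proxy: carries only what an executor needs to locate
 * one basket, so it can be serialized cheaply from the driver.
 */
final class BasketProxy implements Serializable {
    private static final long serialVersionUID = 1L;

    final String path;  // file containing the basket (assumed local here)
    final long offset;  // byte offset of the basket within the file
    final int length;   // number of bytes to read

    BasketProxy(String path, long offset, int length) {
        this.path = path;
        this.offset = offset;
        this.length = length;
    }

    /** Opens the file, seeks straight to the basket, and returns its raw bytes. */
    byte[] readRawBytes() throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            byte[] buf = new byte[length];
            file.seek(offset);
            file.readFully(buf); // throws EOFException if the range runs past the end
            return buf;
        }
    }
}
```

The driver would build one of these per basket while it parses the metadata, and the executor would only ever call `readRawBytes()`, never touching the rest of the file's metadata.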