Collections API: RasterLinesJoin improvements
Spoke with @lossyrob a bit about how we might improve the performance of RasterLinesJoin and he pointed out two optimizations we could make:
Reducing the number of stream lines looped over per tile
Currently we loop over the whole set of MultiLines for each tile here: https://github.com/WikiWatershed/mmw-geoprocessing/blob/develop/api/src/main/scala/Geoprocessing.scala#L106
However, we could loop over only the subset of lines that actually intersect the tile. Depending on how many tiles an AOI has, this would reduce the number of times the lines loop executes, since for each tile we'd only process lines that actually fall within it.
We'd have to check whether any improvement here would be offset by the cost of the intersection step itself, which presumably also means looping over the lines beforehand.
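As a rough sketch of the pre-filtering idea, assuming a cheap bounding-box test is good enough as a first pass: `BBox`, `StreamLine` and `linesForTile` below are hypothetical stand-ins written in plain Scala so the example runs on its own. They are not GeoTrellis types and not the code currently in Geoprocessing.scala.

```scala
// Hypothetical stand-in for a tile extent / geometry envelope.
final case class BBox(xmin: Double, ymin: Double, xmax: Double, ymax: Double) {
  // Two boxes intersect when they overlap (or touch) on both axes.
  def intersects(o: BBox): Boolean =
    xmin <= o.xmax && o.xmin <= xmax && ymin <= o.ymax && o.ymin <= ymax
}

// Hypothetical stand-in for a single stream line.
final case class StreamLine(points: Seq[(Double, Double)]) {
  // Computed lazily and cached, so it is built once per line, not once per tile.
  lazy val bbox: BBox = {
    val xs = points.map(_._1)
    val ys = points.map(_._2)
    BBox(xs.min, ys.min, xs.max, ys.max)
  }
}

object TileLineFilter {
  // Keep only lines whose bounding box overlaps the tile's extent.
  // This is a coarse test: it may keep a line that misses the tile,
  // but it never drops one that crosses it.
  def linesForTile(tileExtent: BBox, lines: Seq[StreamLine]): Seq[StreamLine] =
    lines.filter(_.bbox.intersects(tileExtent))
}
```

The filter is still a pass over every line for every tile, which is exactly the offsetting cost mentioned above. The hope would be that a box comparison is much cheaper than the work done per line inside the loop, but that would need measuring.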
Using Lines rather than MultiLines
Currently we do some processing on the input to transform the input stream vectors into MultiLines: https://github.com/WikiWatershed/mmw-geoprocessing/blob/develop/api/src/main/scala/Utils.scala#L120
However, apparently the MultiLines are unspooled by GeoTrellis into Lines anyway, so we could flatMap the stream vectors into a Seq[Line] and then try using something like a forEachByLineString method in the loop.
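A minimal sketch of the flattening step, under the assumption that each parsed stream vector exposes its component lines. `Line`, `MultiLine` and `toLines` here are hypothetical plain-Scala stand-ins so the example is self-contained. The real GeoTrellis types and the per-line rasterizer call would need to be checked against the version we depend on.

```scala
// Hypothetical stand-ins for the vector types, for illustration only.
final case class Line(points: Seq[(Double, Double)])
final case class MultiLine(lines: Seq[Line])

object FlattenStreams {
  // Unpack every MultiLine into its component lines once, up front,
  // so the per-tile loop can iterate over a flat Seq[Line].
  def toLines(streams: Seq[MultiLine]): Seq[Line] =
    streams.flatMap(_.lines)
}
```

With a flat Seq[Line], the loop could hand each line straight to a line-based rasterizer method instead of letting the library unpack the MultiLines on every pass.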