DHI/terracotta

Question: speed of compute metadata

Argunl opened this issue · 2 comments

Hi, I use terracotta to constantly add new files to an already existing database. The metadata calculation process takes most of the time. Is there any way I can speed it up?

with driver.connect():
  metadata = driver.compute_metadata(raster_filename)
  driver.insert(keys, raster_filename, override_path=url, metadata=metadata)

Yes, but there's no free lunch.

If you can accept inaccurate metadata, you can compute it based on a smaller version of the image by using the max_shape argument to compute_metadata:

with driver.connect():
  metadata = driver.compute_metadata(
      raster_filename, 
      max_shape=(1024, 1024),  # or any other value
  )
  driver.insert(keys, raster_filename, override_path=url, metadata=metadata)

Smaller values of max_shape are faster at the expense of accuracy, because Terracotta no longer looks at every pixel of the image. This means, for example, that you can no longer rely on the following being true:

driver.get_metadata(keys)["range"][1] == raster_data.max()  # may be False if max_shape is smaller than the raster's shape!
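
To see why the maximum can be missed, here is a minimal sketch that uses only NumPy. It does not call Terracotta, and the thread does not say how Terracotta downsamples internally, so plain strided subsampling is used as a stand-in. The helper strided_max and the synthetic raster are hypothetical and exist only for illustration.

```python
import numpy as np


def strided_max(data, max_shape):
    """Maximum of a strided subsample that fits within max_shape.

    Hypothetical helper: a stand-in for "computing statistics on a
    smaller version of the image", not Terracotta's actual method.
    """
    # Ceiling division gives the smallest step that fits within max_shape.
    step_y = -(-data.shape[0] // max_shape[0])
    step_x = -(-data.shape[1] // max_shape[1])
    return data[::step_y, ::step_x].max()


# Synthetic raster: all zeros except one bright pixel that does not
# fall on the sampling grid used below (rows/columns 0, 4, 8, ...).
raster_data = np.zeros((4096, 4096), dtype=np.float32)
raster_data[1, 1] = 100.0

full_max = raster_data.max()                          # looks at every pixel
approx_max = strided_max(raster_data, (1024, 1024))   # looks at every 4th pixel

print(full_max, approx_max)
```

In this sketch the subsample skips the bright pixel, so the approximate maximum comes out lower than the true one. With real data the error depends on how localized the extreme values are.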

If that trade-off is acceptable to you, go nuts.

I tried use_chunks, but it had no effect. max_shape, however, sped things up significantly. Thank you very much!