DHI/terracotta

TerracottaDriver.get_metadata assumes meta store is writeable when lazy-loading metadata

nickeopti opened this issue · 4 comments

When lazy-loading metadata, TerracottaDriver.get_metadata assumes that the meta_store has an implemented insert method:

metadata = self.meta_store.get_metadata(keys)

if metadata is None:
    # metadata is not computed yet, trigger lazy loading
    dataset = self.get_datasets(keys)
    if not dataset:
        raise exceptions.DatasetNotFoundError('No dataset found')

    path = squeeze(dataset.values())
    metadata = self.compute_metadata(path, max_shape=self.LAZY_LOADING_MAX_SHAPE)
    self.insert(keys, path, metadata=metadata)

    # ensure standardized/consistent output (types and floating point precision)
    metadata = self.meta_store.get_metadata(keys)
    assert metadata is not None

return metadata

However, this behaviour fails when using RemoteSQLiteMetaStore, as that meta store does not implement insertions. As far as I can tell, this problem also existed before the SQLAlchemy refactor (note how insert is overwritten to not be implemented here https://github.com/DHI-GRAS/terracotta/blob/30730e78e204b573d1c3eb755a0107e3f73021f0/terracotta/drivers/sqlite_remote.py#L139, while get_metadata assumes insert works here https://github.com/DHI-GRAS/terracotta/blob/30730e78e204b573d1c3eb755a0107e3f73021f0/terracotta/drivers/sqlite.py#L330).

@kiksekage encountered this problem while developing a STAC driver, which will also not support insertions.
We see two possible solutions:

a) Wrap the insertion into a try-except block:

metadata = self.compute_metadata(path, max_shape=self.LAZY_LOADING_MAX_SHAPE)

try:
    self.insert(keys, path, metadata=metadata)
    # ensure standardized/consistent output (types and floating point precision)
    metadata = self.meta_store.get_metadata(keys)
    assert metadata is not None
except NotImplementedError:
    pass

b) Introduce a PERSIST_LAZY_LOADED_METADATA setting (which defaults to True):

metadata = self.compute_metadata(path, max_shape=self.LAZY_LOADING_MAX_SHAPE)

if self.PERSIST_LAZY_LOADED_METADATA:
    self.insert(keys, path, metadata=metadata)
    # ensure standardized/consistent output (types and floating point precision)
    metadata = self.meta_store.get_metadata(keys)
    assert metadata is not None

What do you think?

I never intended the remote driver to be usable with lazy loading. I guess we would ideally raise a better error though.

Not persisting lazily computed metadata is a recipe for terrible performance and a major footgun, so I'd advise against that. try-except is better but not ideal either since raising NotImplementedError is not part of the "official" API.

How about we introduce a class attribute MetaStore.writable that TerracottaDriver checks before requesting write operations? Technically you could also have non-writable MySQL databases (e.g. if the MySQL user doesn't have write permissions) and SQLite databases (no write permissions for file on disk). Then the appropriate exceptions could be raised in TerracottaDriver and RelationalDriver.

Theres a potential solution up on the issue225-lazy-writeable branch. Notably, the MetaStore class is set to be writable by default:
https://github.com/DHI-GRAS/terracotta/blob/a3aedcbfcd95aed279dd9754434b6409462b4bc1/terracotta/drivers/base_classes.py#L41

The added a very thin decorator to throw the new DatabaseNotWritable exception:
https://github.com/DHI-GRAS/terracotta/blob/a3aedcbfcd95aed279dd9754434b6409462b4bc1/terracotta/drivers/terracotta_driver.py#L23-L34

And then we access the flag directly when getting metadata from the meta store:
https://github.com/DHI-GRAS/terracotta/blob/22353ca9c3f99dbe4cdb70b7497d93162827713a/terracotta/drivers/terracotta_driver.py#L187-L194

Me and @nickeopti were discussing whether or not it was cleanest to handle the database "writability" in TerracottaDriver or all meta stores individually.

Does this look correct to you @dionhaefner and/or @mrpgraae? If yes, then ill write some tests.

Please open a PR so we can review these changes in more detail.

True! Up at #256 👍