TerracottaDriver.get_metadata assumes meta store is writeable when lazy-loading metadata
nickeopti opened this issue · 4 comments
When lazy-loading metadata, TerracottaDriver.get_metadata
assumes that the meta_store
has an implemented insert
method:
metadata = self.meta_store.get_metadata(keys)
if metadata is None:
# metadata is not computed yet, trigger lazy loading
dataset = self.get_datasets(keys)
if not dataset:
raise exceptions.DatasetNotFoundError('No dataset found')
path = squeeze(dataset.values())
metadata = self.compute_metadata(path, max_shape=self.LAZY_LOADING_MAX_SHAPE)
self.insert(keys, path, metadata=metadata)
# ensure standardized/consistent output (types and floating point precision)
metadata = self.meta_store.get_metadata(keys)
assert metadata is not None
return metadata
However, this behaviour fails when using RemoteSQLiteMetaStore
, as that meta store does not implement insertions. As far as I can tell, this problem also existed before the SQLAlchemy refactor (note how insert
is overwritten to not be implemented here https://github.com/DHI-GRAS/terracotta/blob/30730e78e204b573d1c3eb755a0107e3f73021f0/terracotta/drivers/sqlite_remote.py#L139, while get_metadata
assumes insert
works here https://github.com/DHI-GRAS/terracotta/blob/30730e78e204b573d1c3eb755a0107e3f73021f0/terracotta/drivers/sqlite.py#L330).
@kiksekage encountered this problem while developing a STAC driver, which will also not support insertions.
We see two possible solutions:
a) Wrap the insertion into a try-except
block:
metadata = self.compute_metadata(path, max_shape=self.LAZY_LOADING_MAX_SHAPE)
try:
self.insert(keys, path, metadata=metadata)
# ensure standardized/consistent output (types and floating point precision)
metadata = self.meta_store.get_metadata(keys)
assert metadata is not None
except NotImplementedError:
pass
b) Introduce a PERSIST_LAZY_LOADED_METADATA
setting (which defaults to True
):
metadata = self.compute_metadata(path, max_shape=self.LAZY_LOADING_MAX_SHAPE)
if self.PERSIST_LAZY_LOADED_METADATA:
self.insert(keys, path, metadata=metadata)
# ensure standardized/consistent output (types and floating point precision)
metadata = self.meta_store.get_metadata(keys)
assert metadata is not None
What do you think?
I never intended the remote driver to be usable with lazy loading. I guess we would ideally raise a better error though.
Not persisting lazily computed metadata is a recipe for terrible performance and a major footgun, so I'd advise against that. try-except
is better but not ideal either since raising NotImplementedError
is not part of the "official" API.
How about we introduce a class attribute MetaStore.writable
that TerracottaDriver
checks before requesting write operations? Technically you could also have non-writable MySQL databases (e.g. if the MySQL user doesn't have write permissions) and SQLite databases (no write permissions for file on disk). Then the appropriate exceptions could be raised in TerracottaDriver
and RelationalDriver
.
Theres a potential solution up on the issue225-lazy-writeable
branch. Notably, the MetaStore
class is set to be writable by default:
https://github.com/DHI-GRAS/terracotta/blob/a3aedcbfcd95aed279dd9754434b6409462b4bc1/terracotta/drivers/base_classes.py#L41
The added a very thin decorator to throw the new DatabaseNotWritable
exception:
https://github.com/DHI-GRAS/terracotta/blob/a3aedcbfcd95aed279dd9754434b6409462b4bc1/terracotta/drivers/terracotta_driver.py#L23-L34
And then we access the flag directly when getting metadata from the meta store:
https://github.com/DHI-GRAS/terracotta/blob/22353ca9c3f99dbe4cdb70b7497d93162827713a/terracotta/drivers/terracotta_driver.py#L187-L194
Me and @nickeopti were discussing whether or not it was cleanest to handle the database "writability" in TerracottaDriver
or all meta stores individually.
Does this look correct to you @dionhaefner and/or @mrpgraae? If yes, then ill write some tests.
Please open a PR so we can review these changes in more detail.