DHI/terracotta

Ensure that ingest is using sqlite

bartwalczak opened this issue · 6 comments

Parsing:

con_params = urlparse.urlparse("postgresql://terracotta:pwd@db-data/terracotta")
print(con_params)

Yields:

ParseResult(scheme='postgresql', netloc='terracotta:pwd@db-data', path='/terracotta', params='', query='', fragment='')

No username, no database, etc.

Therefore the terracotta driver initialization code fails with postgres scheme.

I'm probably wrong in the above, but I still keep getting the 'invalid database path' exception with a seemingly valid postgres string (and the database is available there, I can use psql to connect to it from that container):

terracotta ingest /data/{emission}_{sector}_{year}.tif -o "postgresql://terracotta:pwd@db-data/terracotta"
 [-] Uncaught exception!
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/terracotta/scripts/cli.py", line 52, in entrypoint
    cli(obj={})
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/terracotta/scripts/ingest.py", line 70, in ingest
    driver = get_driver(output_file)
  File "/usr/local/lib/python3.10/site-packages/terracotta/drivers/__init__.py", line 106, in get_driver
    meta_store=DriverClass(url_or_path),
  File "/usr/local/lib/python3.10/site-packages/terracotta/drivers/postgresql_meta_store.py", line 54, in __init__
    raise ValueError('invalid database path')
ValueError: invalid database path

The relevant code path is this:

if '/' in self.url.database.strip('/'):
    raise ValueError('invalid database path')

I.e., you are expected to only provide a database name with no additional path components (we really need to improve this error message).

So this should work:

terracotta ingest /data/{emission}_{sector}_{year}.tif -o "postgresql://terracotta:pwd@db-data"

But then there is no database name in the url?

And shouldn't the slash be trimmed anyway by this database=con_params.path[1:], # remove leading '/' from urlparse:

        url = URL.create(
            drivername=f'{cls.SQL_DIALECT}+{cls.SQL_DRIVER}',
            username=con_params.username or settings.SQL_USER,
            password=con_params.password or settings.SQL_PASSWORD,
            host=con_params.hostname,
            port=con_params.port,
            database=con_params.path[1:],  # remove leading '/' from urlparse
            query=dict(urlparse.parse_qsl(con_params.query))
        )

Yes, database name must be present:

#0 1.360 Traceback (most recent call last):
#0 1.360   File "/scripts/create.py", line 18, in <module>
#0 1.360     driver = terracotta.get_driver(db_name(), "postgresql")
#0 1.360   File "/usr/local/lib/python3.10/site-packages/terracotta/drivers/__init__.py", line 106, in get_driver
#0 1.360     meta_store=DriverClass(url_or_path),
#0 1.360   File "/usr/local/lib/python3.10/site-packages/terracotta/drivers/postgresql_meta_store.py", line 52, in __init__
#0 1.360     raise ValueError('database must be specified in PostgreSQL path')
#0 1.360 ValueError: database must be specified in PostgreSQL path

Okay, I get it now. Looks like we both got confused.

The thing is that terracotta ingest only supports SQLite databases and therefore expects a path on your local system, not a database URL. So the URL you pass is interpreted as a file path and gets normalized to postgresql:/terracotta:pwd@db-data.com/terracotta, which then triggers this exception.

The solution is to use the Python API for ingestion (see our docs).

This restriction should clearly be documented better, apologies.

Thanks. I am going to close this, but I need to log two other issues (I will create them separately with more details):

  • postgres driver ignores supplied database name
  • postgres driver.create fails when database exists

Thanks a lot for your prompt responses and all your work.