apache/incubator-baremaps

Workflow processing does not permit Postgres DO / Function blocks

polastre opened this issue · 4 comments

Consider a function like this, that converts all the columns to lowercase from a dataset:

DO $$
DECLARE row record;
BEGIN
  FOR row IN SELECT table_schema,table_name,column_name
             FROM information_schema.columns
             WHERE table_schema = 'public'
			 AND lower(column_name) != column_name
             AND table_name LIKE 'ne_%'
  LOOP
    EXECUTE format('ALTER TABLE %I.%I RENAME COLUMN %I TO %I',
      row.table_schema,row.table_name,row.column_name,lower(row.column_name));  
  END LOOP;
END $$;

If you copy and paste this into Postgres, it works just fine.

If you run it in a .sql file in your workflow.js, you get the error:

java.util.concurrent.ExecutionException: org.apache.baremaps.workflow.WorkflowException: org.apache.baremaps.workflow.WorkflowException: org.postgresql.util.PSQLException: Unterminated dollar quote started at position 4 in SQL 
DO $$
DECLARE row record. Expected terminating $$
        at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:396)
        at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2073)
        at org.apache.baremaps.cli.workflow.Execute.call(Execute.java:48)
        at org.apache.baremaps.cli.workflow.Execute.call(Execute.java:29)
        at picocli.CommandLine.executeUserObject(CommandLine.java:1953)
        at picocli.CommandLine.access$1300(CommandLine.java:145)
        at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2358)
        at picocli.CommandLine$RunLast.handle(CommandLine.java:2352)
        at picocli.CommandLine$RunLast.handle(CommandLine.java:2314)
        at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2179)
        at picocli.CommandLine$RunLast.execute(CommandLine.java:2316)
        at picocli.CommandLine.execute(CommandLine.java:2078)
        at org.apache.baremaps.cli.Baremaps.main(Baremaps.java:62)

I believe this is again due to ; parsing by baremaps.

version: 0.7.2-rc1

Perhaps a different but bigger problem here is that v0.7.2 imports shapefiles with case-sensitive column names. So some tables in Natural Earth are imported with uppercase column names, which breaks all kinds of stuff. In v0.7.1, they were imported with all lowercase names. Note that not every Natural Earth table has uppercase names, it seems to be a bit of a mix. I'm not sure if that was intentional by the baremaps authors to preserve the case in column names, or a side effect of the latest changes.

Yes, this is probably due to the ; parsing you spotted earlier. By default, jdbc does not allow to execute sql script for security reasons. We should probably introduce a new task that overrides these defaults and allows to execute this kind a script.

The second problem is definetly a regression, sorry for that. For you, which behavior feels the most natural (lowercase, uppercase, case insensitive)? We can log another issue for this.

It is hard to say what's the right thing to do here. On one hand it is "accurate" to use the case-sensitive field names on importing shapefiles

On the other hand, shapefiles often use all uppercase which leads to really terrible query syntax in PostgreSQL.

My preference would be the old behavior. However, we have already implemented a workaround for this to post-process at a particular stage of the workflow to force the tables to lowercase column names.

Thanks for the feedback. Let's rollback the old behavior (I liked it as well). In the context of postgresql, it makes sense to have simple lower case table and column names.