lelit/pglast

referenced_relations: incorrectly treating LATERAL-ly-joined aliases as real relations

Closed this issue · 4 comments

While referenced_relations I think does handle the available CTEs at any given point in the parsed query tree, I don't think it correctly handles aliases for lateral-ly joined tables: the aliases are treated as real relations. For example

from pglast.visitors import referenced_relations

print(referenced_relations('''
  SELECT
    *
  FROM (
    SELECT * FROM a_table
  ) an_alias
  CROSS JOIN LATERAL (
    SELECT * FROM an_alias
  )
'''))

outputs

{'an_alias', 'a_table'}

when I think it should output

{'a_table'}

Thanks for the report. I agree with you: I will try to fix the problem as time permits.

I think I need some help here, to understand corner cases... and I knew there would be some of them coming!

The following patch apparently fixes your example:

diff --git a/pglast/visitors.py b/pglast/visitors.py
index 1a00686..2e6c6db 100644
--- a/pglast/visitors.py
+++ b/pglast/visitors.py
@@ -402,10 +402,11 @@ class ReferencedRelations(Visitor):
     relations referenced by the given :class:`node <pglast.ast.Node>`.
     """
 
-    def __init__(self, cte_names=None, skip_with_clause=None):
+    def __init__(self, cte_names=None, skip_with_clause=None, aliases=None):
         super().__init__()
         self.cte_names = set() if cte_names is None else cte_names.copy()
         self.skip_with_clause = skip_with_clause
+        self.aliases = set() if aliases is None else aliases.copy()
         self.r_names = set()
 
     def __call__(self, node):
@@ -433,13 +434,16 @@ class ReferencedRelations(Visitor):
             if with_clause.recursive:
                 self.cte_names.update(maybe_double_quote_name(cte.ctename)
                                       for cte in with_clause.ctes)
-                self.r_names.update(ReferencedRelations(self.cte_names)(with_clause))
+                self.r_names.update(
+                    ReferencedRelations(self.cte_names, aliases=self.aliases)(with_clause))
             else:
                 for cte in with_clause.ctes:
                     cte_name = maybe_double_quote_name(cte.ctename)
-                    self.r_names.update(ReferencedRelations(self.cte_names)(cte))
+                    self.r_names.update(
+                        ReferencedRelations(self.cte_names, aliases=self.aliases)(cte))
                     self.cte_names.add(cte_name)
-            self.r_names.update(ReferencedRelations(self.cte_names, with_clause)(node))
+            self.r_names.update(
+                ReferencedRelations(self.cte_names, with_clause, self.aliases)(node))
             return Skip
 
     visit_UpdateStmt = visit_InsertStmt = visit_DeleteStmt = visit_SelectStmt
@@ -448,6 +452,10 @@ class ReferencedRelations(Visitor):
         if node is self.skip_with_clause:
             return Skip
 
+    def visit_RangeSubselect(self, ancestors, node):
+        if node.alias:
+            self.aliases.add(maybe_double_quote_name(node.alias.aliasname))
+
     def visit_RangeVar(self, ancestors, node):
         "Collect relation names, taking into account defined CTE names"
 
@@ -459,7 +467,7 @@ class ReferencedRelations(Visitor):
         if node.catalogname:
             tname = f'{maybe_double_quote_name(node.catalogname)}.{tname}'
 
-        if tname not in self.cte_names:
+        if tname not in self.cte_names and tname not in self.aliases:
             self.r_names.add(tname)
 

but then I wondered about what would/should happen with a statement where I "override" a concrete table name with an alias, such as

SELECT
    *
  FROM (
    SELECT * FROM a_table
  ) a_table
  CROSS JOIN LATERAL (
    SELECT * FROM a_table
  )

so I tried that in a psql session just to see whether that is considered valid, but even your initial query gives me an error:

$ psql foo
psql (16.2)
Type "help" for help.

foo=# create table a_table (a int, b text);
CREATE TABLE
foo=# SELECT
    *
  FROM (
    SELECT * FROM a_table
  ) an_alias
  CROSS JOIN LATERAL (
    SELECT * FROM an_alias
  );
ERROR:  relation "an_alias" does not exist
LINE 7:     SELECT * FROM an_alias
                          ^

Can you help me understand what I am doing wrong?

Oooh... so I misunderstood exactly what you're allowed to do when lateral joining. For each row on the left side of the lateral join, the right hand table gets evaluated. So it doesn't make sense to select "from" the left hand table: in the right you can only refer to the current row via its alias.

So in

SELECT
    *
FROM (
  SELECT * FROM a_table
) an_alias
CROSS JOIN LATERAL (
  SELECT * FROM an_alias
)

the final an_alias is searching the search path for a real relation. For a better "minimal" case that works:

SELECT
    *
FROM (
  SELECT * FROM a_table
) an_alias
CROSS JOIN LATERAL (
  SELECT an_alias.a
);

But then referenced_relations I think correctly returns only an_alias. So this issue might just be invalid?

Thank you