lelit/pglast

How to get the list of columns with their transformation formulas

AzatYakupov opened this issue · 3 comments

Hi there!

I am using pglast version 3.3.
Could you give me a hint on how to get all the columns selected in every SelectStmt object, together with the formula that computes each one, if there is one?

For example, given this statement:

SELECT t1.name AS name ,
t1.name||' '||t2.name AS new_name
FROM t1 inner join t2 on t1.id = t2.par_id;

I need to get the following columns:
name: t1.name
new_name: t1.name||' '||t2.name

thanks!

lelit commented

The following script does that:

from pglast.parser import parse_sql
from pglast.stream import RawStream
from pglast.visitors import Visitor


class TargetColumnNames(Visitor):
    def __call__(self, node):
        self.tcnames = {}
        super().__call__(node)
        return self.tcnames

    def visit_ResTarget(self, ancestors, node):
        self.tcnames[node.name] = RawStream()(node.val)


def target_columns(stmt):
    return TargetColumnNames()(parse_sql(stmt) if isinstance(stmt, str) else stmt)


print(target_columns("SELECT t1.name AS name, t1.name||' '||t2.name AS new_name"
                     " FROM t1 inner join t2 on t1.id = t2.par_id"))

As always, details matter, and your problem is to define clearer rules that describe what you mean by "list of columns": for example, what should happen if the statement has subselects?

Thanks @lelit, that's really cool!
Regarding your comment: yes, you are right. I need to get the ETL transformation for every column, and the SQL can contain several transformations for one column (for example through CTEs and subqueries/subselects).
So for now I am going to build on your code to get the chain of transformations for a single column.

lelit commented

I think there's nothing more to be done here, right? Otherwise, feel free to reopen the issue!