Add Experimental Support for Typed Macros
Opened this issue · 6 comments
Describe the feature
We would like to improve the experience of end users and developers as they create and use macros by supporting basic type annotations. The goal of the present issue is to begin exploring the design space in collaboration with dbt users by beginning with the simplest set of annotations and enforcements that is likely to be useful, and exposing it behind an experimental flag.
Acceptance Criteria
Following Python, the following syntax should be supported:
{% macro cents_to_dollars(column_name: str, scale: int = 2) %}
({{ column_name }} / 100)::numeric(16, {{ scale }})
{% endmacro %}
The set of available types will be defined by the following grammar:
<T> :=
str
bool
int
float
Any
List[<T>]
Dict[<T>, <T>]
Optional[<T>]
For this initial implementation, we will consider all types built-in. There will be no need to import type definitions and no user-defined types. Those are promising next steps.
We do not need to implement any type inference for this issue. Typing violations should be detected when the arguments to a typed macro are literals, but need not be detected for arguments passed via variables. Still, any progress in that direction would be nice, as it will be necessary for the long term utility of the type system.
Following Python, type annotations will not affect runtime behavior, as long as they are syntactically correct.
Typing will not be enforced, but warnings will be issued at parse time.
Will typing for return values be supported as part of this initial leg of work?
@MichelleArk No, and there are two reasons:
- I haven't yet found a great way to implement the -> syntax without getting pretty hacky with jinja.
- I've been involved in some contentious discussions about exactly when/how we should be supporting return types, and I want to settle those issues first.
Everything spec'd above is very simple to implement, though, so I anticipate we'll be moving on to more advanced features quickly.
I think this is a bad idea not the best possible idea, and here's why: while obviously the option to check types is better than no option to check types, this obviously costs developer effort, and must be evaluated w/r/t the opportunity cost of other features which said developers might implement instead, including different features intended to achieve the same "As a developer I would like..." outcomes.
The first issue is that my editor already doesn't understand SQL-but-with-Jinja-macros. Maybe I can fix this, but making my editor understand SQL-but-with-typed-Jinja-macros will be harder, especially if "understand" is interpreted to mean "do something genuinely useful" rather than just "highlight the syntax correctly."
The second issue is that humans don't already understand SQL-but-with-typed-Jinja-macros. So now if I need to ask some people in a Jinja-but-not-dbt forum how to do X, I cannot simply copypasta my example to them, since it will contain type annotations that will confuse them & need to be explained. Perhaps eventually these annotations will become widely understood, but they won't be to begin with. So this will slow me down.
Now consider an alternate approach: when some system like SqlAlchemy wishes to interact with SQL, it doesn't do so directly, but via a query-builder. I might write something resembling:
query = SomeModel.where(some_condition)
query = query.join(SomeOtherModel, join_condition)
query = query.project(col0, col1, col2)
query.execute()
And the SQL will be generated as needed.
Now, even though this is Python code which generates SQL, it's still also Python code. Now consider:
-
Does my editor already understand Python code? Python types? How to use a Python LSP server? ETC?
-
Do other humans already understand Python code & types?
-
Does Python have any existing facilities which solve the same problems which Jinja macros are intended to solve?
The answers to these are: "Yes," "Yes," and "Yes" respectively. Some examples of existing Python facilities include methods, variables, objects, and modules.
One way to implement this approach would be to add support for Python code in addition to SQL in a way vaguely analagous to Webpack JS configuration: the code must define a module which contains an attribute called "final" which is a closure that evaluates to SQL. Other than that, it can do whatever you want, like export other attributes to be imported and used by other code, or provide a cli interface, or anything else. Obviously, other designs are available, especially if you think about this for more than the 1 or 2 minutes I did.
The downside of this is that it takes you further away from plain SQL. But given that you want to add static type checking, maybe that's what you want to do anyway whether you realize it or not.
The upside is that I think it will involve less work to implement. The PSF already produces a Python compiler and type-checking machinery. We just have to invoke it.
EDIT: "bad idea" is too strong. It's a good idea. It's just that maybe a better idea is possible.
@dradetsky Thanks so much for your thoughtful feedback!
The first issue is that my editor already doesn't understand SQL-but-with-Jinja-macros. Maybe I can fix this, but making my editor understand SQL-but-with-typed-Jinja-macros will be harder, especially if "understand" is interpreted to mean "do something genuinely useful" rather than just "highlight the syntax correctly."
This has been an active point of the discussion within the Core team. At a recent hackathon, my colleague @ChenyuLInx and I wrote a prototype LSP to demonstrate better syntax highlighting, code navigation, and macro type checking in VS Code when working with jinja-sql text. The code for that demo is not even close to production grade, and there are major pieces for us to get into place, but we are moving that direction. This issue is just one part of that larger effort.
I cannot simply copypasta my example to them
This is an interesting drawback I had not considered. The type annotations will always be optional, but your point is taken.
Now consider an alternate approach
I won't comment too much on the alternative approach you've mentioned, except to say that I think the ideas are basically sound. As my colleague @jeremyyeo pointed out in Slack, it would add an entirely new paradigm to dbt, and it is something we would need to give a lot of thought.
it would add an entirely new paradigm to dbt
This is true; I sometimes forget about the cost of "making the feature intuitive & accessible to users" & focus on "implementing the feature" which could obviously change the cost estimate considerably. On the other hand, "adding an entirely new paradigm to dbt" is another way of saying "it does more stuff now" so it's possible to approach this with too much trepidation.
except to say that I think the ideas are basically sound.
Please also feel free to call them "crazy", "stupid", or "definitely not on the roadmap." I won't be offended (most ideas aren't good, but I'm confident I'll come up with good ideas eventually). So if any of those are your reaction that's just useful feedback.