API changes: extend visitor with return values
wumpz opened this issue · 25 comments
If you try to build a CopyVisitor of some kind of object hierarchy, one has to build some kind of Stack structure, to give the actual visitor a hint, where it should write its copied data.
I suggest to extend all visitor methods with a return value:
MyType visitor(MyType param);
The standard implementation should return the actual object inserted.
public interface StatementVisitor {
Comment visit(Comment comment);
Commit visit(Commit commit);
Delete visit(Delete delete);
Update visit(Update update);
Using this we would be able to create some kind of functional visitor implementation.
There is a thumbs up above, but I am more interested in textual comments :).
I'm having a bit of a hard time sort of visualizing this, so correct me if I wrong.
Let's assume we just had a SelectVisitor
to copy the Select
.
Old Way:
class OldSelect implements SelectVisitor {
private Select copy = new Select();
// super naive and bad just illustrating examples
Select copy(Select original) {
visit(original);
return copy;
}
@Override
public void visit(Select select) {
copy.setBody(deepClone(select.getSelectBody());
// potentially the with items would be
copy.setWithItems(deepClone(select.getWithItems()));
}
}
With new pattern:
class NewCopySelect implements SelectVisitor {
// super naive and bad just illustrating examples
Select copy(Select original) {
return visit(original);
}
@Override
public Select visit(Select select) {
// and other copying stuff
Select copy = new Select();
copy.setBody(deepClone(select.getSelectBody());
// potentially the with items would be
copy.setWithItems(visit(select.getWithItems());
return copy;
}
}
Or am I Missing something.
Correct. With this new style the visitors itself return the copy and the caller could immediately put it in the right place. So you don't need to organise your copied items in some sort of stack to find the last generated. Imho a global variable here should be avoided but cannot within the actual implementation.
extend all visitor methods with a return value
Would it be valuable to have CopyStatementVisitor
and the current StatementVisitor
? If not, then changing StatementVisitor would be fine, probably just a big warning to all consumers that things no longer return void.
Indeed this would be a massive api change. That is why we would go for version 4.
If there are two visitor interfaces then we would need to visit methods per class. So that is IMHO a no go.
IIUC, the suggested change is that all the visit
methods of the visitors would be extended to return a copy of the object they receive as the argument to help with implementing the copy visitor. Is that correct?
If it is then, is my understanding that this is useful only to those who want a deep copy of an object, correct? I'm just wondering whether this huge API change is worth the migration burden on the users who want to move to version 4 of the library? That's because they would have to change every extension of every visitor they would have implemented as a consequence of this change. That sounds like a lot of work and might not be a good thing to bring upon the users. Just my opinion.
@sivaraam: that is correct. As well this change does not support replacement by different types, e.g. Column to literal or CaseExpression or you name it. That is why I am getting away from this. The problem here to solve IMHO the backreference to the parent object. How to replace an expression, while there is no reference to the parent? Now you have to adapt every possible parent. To make things worse you not only need the reference to the parent but to the attribute as well.
As well this change does not support replacement by different types, e.g. Column to literal or CaseExpression or you name it. That is why I am getting away from this.
Just for the sake of discussion, is it not possible to overcome this by making the visit methods return the appropriate super-classes? For example, in the case you mentioned we could just make the visit
method corresponding to the Column
of the appropriate visitor to return a generic Expression
object and the implementation of the visit
method could think about what type of Expression
should be returned. I think this would work for any given visitor, and the objects it is visiting. Of course, this would possibly make it difficult to comprehend why the visit
methods return a different type. But otherwise, does this sound like plausible solution?
That said, I still believe that modifying the existing visitor methods would place a huge migration burden on the users. To overcome this I think we can re-consider the following which was suggested before:
Would it be valuable to have CopyStatementVisitor and the current StatementVisitor ?
That would of course double the amount of visitor classes but at least it would not introduce any migration burden on the existing users. Also, having separate visitor classes would ensure that we don't add any additional (unrelated) responsibility to the visit
methods. IOW, it would ensure SRP.
So how to overcome this problems of the visitor pattern? Any ideas?
I agree with @sivaraam but would go one step further and make the extra visitor generic:
public <T> T accept(Generic*Visitor<T> visitor);
This would:
- Allow CopyVisitor to be written. Each visitor would be allowed to have its own type so you can have
implements GenericStatementVisitor<Statement>, GenericExpressionVisitor<Expression>
, etc. - Allow custom types to be returned too. e.g. the TableVisitor could have
T
set toList<TableName>
(this has the added benefit of easily making the class thread-safe because you no longer have a field lying around).
So how to overcome this problems of the visitor pattern? Any ideas?
Would it be possible to invert things for consumption sake. Right now we have
interface SimpleVIsitor {
void visitFoo(Foo foo);
}
And the proposed change is:
interface SimpleVisitor {
Foo visitFoo(Foo foo);
}
What if we introduced the CopyVisitor as a lower level stack, so we could have:
interface CopyableVisitor {
Comment copy(Comment comment);
}
interface SelectVisitor extends CopyableVisitor {
default void visit(Comment comment) { copy(comment); }
}
Since this is java 8, we could use the default
feature to make this at least easy for consumers?
@AnEmortalKid That would be a possibility. However, the difficulties are still there. Let us stick with our example of copying the hierarchy, then die proposed changes would do their job via something like
Expression visit(MyExpression expr) {
MyExpression my = new MyExpression();
my.setLeftExpression(visit(expr.getLeftExpression()));
return my;
}
Unfortunately, this misses one level of indirection from the visitor pattern, it should be
my.setLeftExpression( expr.getLeftExpression().accept(this) );
Not only the visit methods but the accept methods have to be changed as well.
If we want to create a different type of visitor, like replace all columns with numbers. A really stupid one, but you get the idea of replacing stuff. Then you have to make the return type "nearly everywhere" an Expression.
Additionally, a big problem with this actual visitor stuff is the missing context. If the visit Column method would know, where it came from (BinaryExpression, left) we could take advantage of this as well and so on and so far.
At the moment I think every visitor - API change would go in the wrong direction, because every task needs other specialities. So my thoughts are going in a different direction right now, by building on an actual visitor some kind of context via reflection or so, to overcome most of possible infrastructure problems to solve this issues.
So my thoughts are going in a different direction right now, by building on an actual visitor some kind of context via reflection or so, to overcome most of possible infrastructure problems to solve this issues.
Ah, so something we could push down information on like:
this.setContext(TABLE.X)
column.accept(this);
What sort of things would we need in the context?
Right. One could build some kind of proxy for the visitor, that could keep track of the previous or parent object. So our replacement would be simple.
To add to what @SerialVelocity suggested:
I really like the Presto parser API. For example, I have some code where I am converting from SQL to a datalog dialect, and this is one example of how I use the Presto API to do so.
/*
* Translates literals into corresponding DDlogRecord instances
*/
private static class ParseLiterals extends AstVisitor<DDlogRecord, Boolean> {
@Override
protected DDlogRecord visitStringLiteral(final StringLiteral node, final Boolean isNullable) {
try {
return maybeOption(isNullable, new DDlogRecord(node.getValue()));
} catch (final DDlogException e) {
throw new RuntimeException(e);
}
}
@Override
protected DDlogRecord visitLongLiteral(final LongLiteral node, final Boolean isNullable) {
return maybeOption(isNullable, new DDlogRecord(node.getValue()));
}
@Override
protected DDlogRecord visitBooleanLiteral(final BooleanLiteral node, final Boolean isNullable) {
return maybeOption(isNullable, new DDlogRecord(node.getValue()));
}
}
And elsewhere, I can do this:
// later, in some method
DDLogRecord record = parseLiterals.process(literal, isNullable);
Note that the visit methods can receive a context and a return value, both of which are generic types parameterized by AstVisitor<X, Y>. And in the above case, the ParseLiterals class is purely stateless (so I can re-use a single instance for the lifetime of my program).
I've been trying out the JSQLParser API simply because Presto is an odd SQL dialect and doesn't support some statements I need. But writing something like the above becomes cumbersome. The return values and the context I pass through the visitors (generic parameters X and Y) now become fields.
Here's the AstVisitor implementation in Presto: https://github.com/prestodb/presto/blob/master/presto-parser/src/main/java/com/facebook/presto/sql/tree/AstVisitor.java
The problem is described in this discussion. What type to return? Or do you want to multiply one visitor by adding different context types?
If I get your question correctly (correct me if I'm wrong please :) ), the signature I'm thinking of is:
public abstract class BaseVisitor<R, C>
{
public R process(Node node)
{
return process(node, null);
}
public R process(Node node, @Nullable C context)
{
return node.accept(this, context);
}
// visit methods for each SQL node...
public R visit(<SomeSqlType> node, C Context) {
....
}
}
And yes, that will result in each use case specifying return and context types as they need. Visitors that don't need to return anything or need any context can use Void
types for R
or C
.
Greetings All, apologies for showing up late.
In my opinion The current implementation has 3 major shortcomings:
-
When traversing the AST from the root down to the leaves the Result object (e. g. the
StringBuilder
) must be global and so we can only traverse serially. No parallelism was possible although it would make sense for largeUNION
statements orSubSelect
columns etc. -
When the
accept
methods are called, there is no way to know what called it and so there is no (elegant) way to get the Child-Parent relationship. -
And obviously, both
accept
andvisit
methods don't return anything but can only write into global objects.
These were major obstacles when writing JSQLFormatter but became showstopper for my latest project where I want to resolve Columns for AllColumns
and AllTablesColumns
Expressions (e. g. SELECT * FROM ...
).
Long story short, I have refactored all the Visitors making them returning a Generic and also accepting a Generic as additional argument:
public interface SelectItemVisitor<T> {
<S> T visit(SelectItem<? extends Expression> selectItem, S parameters);
}
public class SelectItemVisitorAdapter<T> implements SelectItemVisitor<T> {
@Override
public <S> T visit(SelectItem<? extends Expression> item, S parameters) {
return null;
}
}
public class SelectDeParser extends AbstractDeParser<PlainSelect>
implements SelectVisitor<StringBuilder>,
SelectItemVisitor<StringBuilder>, FromItemVisitor<StringBuilder>,
PivotVisitor<StringBuilder> {
@Override
public <S> StringBuilder visit(SelectItem<?> selectExpressionItem, S parameters) {
selectExpressionItem.getExpression().accept(expressionVisitor, parameters);
if (selectExpressionItem.getAlias() != null) {
buffer.append(selectExpressionItem.getAlias().toString());
}
return buffer;
}
}
Standard Types would be VOID
of course but you can write now your own Visitor returning a Generic (Leaf to Root) and also pushing a Generic down the AST (Root to Leaf) providing it as method parameter.
Please have a look at the branched implementation: https://github.com/JSQLParser/JSqlParser/tree/visitors_return_objects
Do let me know any questions, concerns or suggestions for improvements, thank you.
PS: Although this new implementation gave us the option to improve the DeParsers
by handing down the StringBuilder
and returning the StringBuilder
from each node, I did not touch this.
But in theory we could improve it this way and deparse in parallel (which may not be relevant for plain deparsing, but formatting, query transpilation/rewriting and SQL LSP may be interested in this option).
Any opinion please?
I have successfully used those new Visitors for my Column Resolution and have been effectively able to push scopes down from the root to the leafs, but also to pull the final column information from the leaves up to the root.
Although I had to amend all visit()
and accept()
methods in all my JSQLParser depending code.
If there were no objections I would merge those changes soon in order to get an "Api breaking" JSQLParser 5 ready?!
Final call for opinions please.
I have provided default
implementations for mimicking the simplified former calls, e. g.
public interface SelectItemVisitor<T> {
<S> T visit(SelectItem<? extends Expression> selectItem, S parameters);
default void visit(SelectItem<? extends Expression> selectItem) {
this.visit(selectItem, null);
}
}
When there were no objections, I would merge this code on weekend and aim for an "API breaking" release 5.0 next week.
Solved via 0e1ff56.
Would be released as API Breaking 5.0 with dependency on Java 11.
Documentation with examples is here: https://manticore-projects.com/JSQLParser/migration50.html