bblfsh/bblfshd

java client can't parse code without a class?

alonsopg opened this issue · 6 comments

I am testing bblfsh for Java, and it can't parse a Java method that is not inside a class. For example, when I parse:

public static void main(String[] args) {
   // Prints "Hello, World" to the terminal window.
   System.out.println("Hello, World");
}

I get:

ResponseError: Syntax error on token(s), misplaced construct(s)
Syntax error on token "void", @ expected
Syntax error on token "]", :: expected after this token
Syntax error on token "}", delete this token

Is there any way to parse isolated pieces of Java code?

Unfortunately I think the short answer is "no": The parser used by the Java driver expects the input file to conform to the Java grammar, which doesn't support standalone functions like this.

You may be able to work around this by constructing an input that wraps the snippet, for example:

package p;

class C {
   <snippet>
}

This would suffice for the example you described here, but it might not work for other syntactic forms. The resulting tree would also include the package clause and class definition, so you would have to filter those back out.
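The wrapping step could be sketched roughly as follows. This is only an illustration: `SnippetWrapper`, `wrapMember` and `lineOffset` are hypothetical names that are not part of bblfsh or its Java driver, and the package and class names `p` and `C` are arbitrary placeholders.

```java
// Hypothetical helper; not part of bblfsh or the Java driver.
final class SnippetWrapper {
    // Placeholder wrapper text; any names that do not clash with the snippet will do.
    static final String HEADER = "package p;\n\nclass C {\n";
    static final String FOOTER = "}\n";

    /** Wraps a member-level snippet (method, field, initializer block) in a class. */
    static String wrapMember(String snippet) {
        return HEADER + snippet + "\n" + FOOTER;
    }

    /** Number of lines the wrapper puts before the snippet, for remapping positions. */
    static int lineOffset() {
        return (int) HEADER.chars().filter(ch -> ch == '\n').count();
    }
}
```

The string returned by `wrapMember` is what would be sent to the parser in place of the bare snippet. Line numbers reported in the resulting tree would then be shifted by `lineOffset()` relative to the original snippet.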

Thanks for the quick response! I thought of the same solution. However, I am curious about which syntactic forms might not work. Do you have any idea?

Well, the only syntaxes that will work in my example are forms that are allowed at the top level of a class declaration—for example, field or method declarations and static initialization blocks. So arbitrary statements or expressions won't work, and would have to be further wrapped.
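As a rough sketch of what "further wrapped" could look like, one wrapper per kind of snippet might be used. The enum `SnippetContext` and the placeholder names `p`, `C`, `m` and `v` are assumptions made for illustration, not anything the driver provides. The expression wrapper only aims to be syntactically valid, which is all a parser needs; it makes no attempt to type-check.

```java
// Hypothetical sketch; the wrapper names (p, C, m, v) are placeholders.
enum SnippetContext {
    // Snippet is a class member: field, method, or initializer block.
    DECLARATION("package p;\n\nclass C {\n", "\n}\n"),
    // Snippet is one or more statements, placed inside a method body.
    STATEMENT("package p;\n\nclass C {\n  void m() {\n", "\n  }\n}\n"),
    // Snippet is a single expression, placed as a field initializer.
    EXPRESSION("package p;\n\nclass C {\n  Object v = ", ";\n}\n");

    private final String prefix;
    private final String suffix;

    SnippetContext(String prefix, String suffix) {
        this.prefix = prefix;
        this.suffix = suffix;
    }

    /** Surrounds the snippet with the boilerplate for this context. */
    String wrap(String snippet) {
        return prefix + snippet + suffix;
    }
}
```

For instance, `SnippetContext.EXPRESSION.wrap("1 + 2")` yields a class whose only member is the field `Object v = 1 + 2;`.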

I was also thinking about ways to allow running the parser in other modes, particularly in the "declaration", "statement" and "expression" contexts. It might be quite useful, especially for expressions.

Parsers may not support this directly; instead, we can generate boilerplate code around the actual snippet and then strip the related AST constructs from the response.

The problem is that the definition of "declaration" and "statement" is different in every language. So I'm not sure if we can draw a clear boundary there.
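The stripping half of that idea might look something like the sketch below. It uses a deliberately simplified, hypothetical `Node` type rather than the real UAST structure, and the node type names in the usage note are illustrative only. The caller is assumed to know which chain of wrapper nodes the boilerplate produced.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified tree node; real UAST nodes from a driver look different.
final class Node {
    final String type;
    final List<Node> children;

    Node(String type, Node... children) {
        this.type = type;
        this.children = Arrays.asList(children);
    }
}

final class WrapperStripper {
    /**
     * Descends from the root through the given chain of wrapper node types
     * and returns the children of the innermost wrapper, which correspond
     * to the nodes produced by the original snippet.
     */
    static List<Node> strip(Node root, String... wrapperTypes) {
        Node current = root;
        for (String type : wrapperTypes) {
            Node next = null;
            for (Node child : current.children) {
                if (child.type.equals(type)) {
                    next = child;
                    break;
                }
            }
            if (next == null) {
                throw new IllegalArgumentException("wrapper node not found: " + type);
            }
            current = next;
        }
        return current.children;
    }
}
```

With a class-level wrapper, a call such as `WrapperStripper.strip(root, "TypeDeclaration")` would skip the package clause and return only the members of the wrapper class. A statement-level wrapper would need a longer chain, which is where the per-language differences start to matter.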

I'm reluctant to redesign the API to require that a driver support this use case, unless we have evidence that the need to parse arbitrary grammar fragments is common enough and/or important enough to carry its weight. If it is, then this might be one strategy a driver could use to get there. Whether or not this workaround trick is effective depends a lot on the grammar of the language.

It's possible in principle to start parsing at any nonterminal of a context-free grammar. In practice, however, parser implementations almost never support this, and there's no simple transformation on the implementation that reliably reflects the transformation on the grammar. That means a variant-entry API like this would greatly complicate the implementation and maintenance of a driver.

For that reason, although I agree we could do this (and there's a "neat idea" factor), I don't think we should plan to do it without some pretty strong reason.

bzz commented

If there are no objections, I'm going to close this one, as the question seems to be answered.

But please do not hesitate to re-open it if that is not the case.