javalang is a pure Python library for working with Java source code. javalang provides a lexer and parser targeting Java 8. The implementation is based on the Java language spec available at http://docs.oracle.com/javase/specs/jls/se8/html/.
The following gives a very brief introduction to using javalang.
Calling javalang.parse.parse() on a string of Java source returns a CompilationUnit instance. This object is the root of a tree that can be traversed to extract information about the compilation unit.
The string passed to javalang.parse.parse() must be a complete compilation unit, that is, a complete, valid Java source file. Other functions in the javalang.parse module parse smaller code snippets without requiring an entire compilation unit.
CompilationUnit is a subclass of javalang.ast.Node, as are its descendants in the tree. The javalang.tree module defines the Node subclasses, each of which represents a different syntactic element found in Java code. For more detail on the available node types, see the javalang/tree.py source file until the documentation is complete.
Node instances support iteration.
This iteration can also be filtered by type.
Internally, javalang.parse.parse is a simple function that creates a token stream for the input, initializes a new javalang.parser.Parser instance with that token stream, and then invokes the parser's parse() method, returning the resulting CompilationUnit. These components may also be used individually.
The tokenizer/lexer may be invoked directly by calling javalang.tokenizer.tokenize.
This returns a generator which provides a stream of JavaToken objects. Each token carries position (line, column) and value information.
The tokens are not direct instances of JavaToken, but are instead instances of subclasses which identify their general type.
NOTE: The shift operators >> and >>> are represented by multiple > tokens. This is because multiple > characters may appear in a row when closing nested generic parameter or argument lists. The resulting ambiguity is resolved by the parser rather than the tokenizer.
To parse snippets of code, a parser may be used directly.
The parse methods are designed for incremental parsing, so they do not restart at the beginning of the token stream. Attempting to call a parse method more than once on the same parser will result in a JavaSyntaxError exception.
Invoking the incorrect parse method will also result in a JavaSyntaxError exception:
>>> tokens = javalang.tokenizer.tokenize('System.out.println("Hello " + "world");')
>>> parser = javalang.parser.Parser(tokens)
>>> parser.parse_type_declaration()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "javalang/parser.py", line 336, in parse_type_declaration
    return self.parse_class_or_interface_declaration()
  File "javalang/parser.py", line 353, in parse_class_or_interface_declaration
    self.illegal("Expected type declaration")
  File "javalang/parser.py", line 122, in illegal
    raise JavaSyntaxError(description, at)
javalang.parser.JavaSyntaxError
The javalang.parse module also provides convenience functions for parsing the more common kinds of code snippets.