
Pure Python Java parser and tools


javalang


javalang is a pure Python library for working with Java source code. javalang provides a lexer and parser targeting Java 8. The implementation is based on the Java language spec available at http://docs.oracle.com/javase/specs/jls/se8/html/.

The following gives a very brief introduction to using javalang.

Getting Started

Calling javalang.parse.parse() on a string of Java source returns a CompilationUnit instance. This object is the root of a tree which may be traversed to extract different information about the compilation unit.

The string passed to javalang.parse.parse() must be a complete compilation unit, that is, a complete, valid Java source file. Other functions in the javalang.parse module allow smaller code snippets to be parsed without providing an entire compilation unit.

Working with the syntax tree

CompilationUnit is a subclass of javalang.ast.Node, as are its descendants in the tree. The javalang.tree module defines the different Node subclasses, each of which represents one of the syntactic elements you will find in Java code. For more detail on what node types are available, see the javalang/tree.py source file until the documentation is complete.

Node instances support iteration, yielding a (path, node) pair for each node in the tree.

This iteration can also be filtered by type using the filter() method.

Component Usage

Internally, javalang.parse.parse is a simple function which creates a token stream for the input, initializes a new javalang.parser.Parser instance with the given token stream, and then invokes the parser's parse() method, returning the resulting CompilationUnit. These components may also be used individually.

Tokenizer

The tokenizer/lexer may be invoked directly by calling javalang.tokenizer.tokenize on a string of Java source.

This returns a generator which provides a stream of JavaToken objects. Each token carries position (line, column) and value information.

The tokens are not directly instances of JavaToken, but are instead instances of subclasses which identify their general type, such as Identifier, String, or Operator.

NOTE: The shift operators >> and >>> are represented by multiple > tokens. This is because multiple > may appear in a row when closing nested generic parameter/argument lists. This ambiguity is instead resolved by the parser.

Parser

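To parse snippets of code, a parser may be used directly by constructing a javalang.parser.Parser from a token stream and calling one of its parse methods, such as parse_expression().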

The parse methods are designed for incremental parsing so they will not restart at the beginning of the token stream. Attempting to call a parse method more than once will result in a JavaSyntaxError exception.

Invoking the incorrect parse method will also result in a JavaSyntaxError exception.

The javalang.parse module also provides convenience methods for parsing more common types of code snippets.