This project uses Eclipse JDT and ANTLR 4 to annotate Java source files. It stores the abstract syntax tree (AST) structure and the token information of every file in a Java project in a database. Most of the time you only need to query the `entity_all` view.
- To build from source, install Maven 3 and run `mvn clean package` in the source root path. You will then find `jdt.annotator-0.0.1-SNAPSHOT.jar` in `target/`.
- You may also download `jdt.annotator.jar` directly (it may not be the latest version).
usage: jdt.annotator --src <path> [options]
= a source code annotator =
-d,--dbname <arg> database name to connect to (default: "entity")
-H,--host <arg> database server host ip (default: "localhost")
-l,--lib <arg> absolute root path of libraries (.jar)
-P,--port <arg> database server port (default: "5432")
-p,--project <arg> project name (default: folder name containing
source code)
-r,--reset reset all annotated astnode information in database
-s,--src <arg> absolute root path of source code files
-U,--username <arg> (optional) username, must specify password as well
-W,--password <arg> (optional) password, must specify username as well
- You need a running PostgreSQL database. By default the program connects to `jdbc:postgresql://localhost:5432/entity`.
- You can annotate multiple projects and store them in the same database. Each entity is uniquely identified by (`entity_id`, `project_id`).
- If you omit the `-p` option, the program uses the name of the folder containing the source code as the default project name.
- If the project you want to annotate is compiled with Ant or Maven, its dependencies may be specified in `build.xml` or `pom.xml`. To annotate types, methods, or other information from those libraries, download the dependencies first and pass the library folder with the `-l` option.
  - Maven: run `mvn org.apache.maven.plugins:maven-dependency-plugin:2.1:copy-dependencies -DoutputDirectory=/your/library/folder` in the project root folder. Maven then downloads the dependency `.jar` files to that folder.
  - Ant: you need to write a configuration file that downloads the dependencies.
- The view `entity_all` combines the tables `project`, `file`, `entity`, `nodetype` and `cross_ref`. Columns with the same name have the same meaning in every table.
- The table `method` stores information about all methods in the source code.
- The table `cross_ref_key` stores the type descriptor for each `Name` ASTNode.
| Table | Column | Description |
|---|---|---|
| entity_all | entity_id | Primary key is (`entity_id`, `project_id`). `entity_id` is unique for a given `project_id`. It does NOT indicate the order of appearance in the source code. |
| | start_pos | Starting position of this node in the file. |
| | length | Length of this node. |
| | end_pos | Ending position of this node in the file (exclusive). If you store the source code in a string, `code.substring(start_pos, end_pos)` gives the source code snippet of this entity. |
| | start_line_number | This entity starts at (line number, column number) in the file; both start from 1 to conform with vim. |
| | start_column_number | See `start_line_number`. |
| | end_line_number | This entity ends at (line number, column number) in the file. |
| | end_column_number | See `end_line_number`. |
| | nodetype_id | For an ASTNode, `nodetype_id < 100`. For a token, `nodetype_id >= 100`. Identifiers and null/boolean/string/character literals are ASTNodes, not tokens. |
| | nodetype | Name of the type of this node. |
| | file_id | |
| | file_name | Absolute file name. |
| | project_id | |
| | project_name | Given project name. |
| | project_path | Given source code folder. (`project_name`, `project_path`) uniquely specifies a "project". |
| | string | Formatted code snippet of this node. It is generated from the `ASTNode`, so it may differ from your source code. |
| | raw | Code snippet of this node. |
| | parent_id | The parent entity's `entity_id` (with the same `project_id`). The `parent_id` of the root node is `-1`. |
| | declared_id | If this node is declared elsewhere in this project, this id points to the declaring node. If it is declared in an external library, it will not show up. |
| cross_ref_key | cross_ref_key | Only `Name` type ASTNodes have a value. |
| method | entity_id | |
| | project_id | |
| | method_name | |
| | return_type | |
| | argument_type | |
| | full_signature | Full type descriptor for this method. |
| | is_declare | Boolean value indicating whether this is a method declaration. |
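
The position columns above can be tried out on a string held in memory. The following is a minimal sketch, not code from this project: the class `EntitySpan` and its methods are hypothetical names, and it assumes that `start_pos` and `end_pos` are 0-based character offsets and that line and column numbers are simply counted from 1 with `\n` as the line separator.

```java
class EntitySpan {
    /** Returns the snippet covered by the half-open range [startPos, endPos). */
    public static String snippet(String code, int startPos, int endPos) {
        return code.substring(startPos, endPos);
    }

    /** Converts a 0-based character offset into a 1-based {line, column} pair. */
    public static int[] lineAndColumn(String code, int offset) {
        int line = 1;
        int column = 1;
        for (int i = 0; i < offset; i++) {
            if (code.charAt(i) == '\n') {
                line++;       // a newline moves to the next line
                column = 1;   // and resets the column counter
            } else {
                column++;
            }
        }
        return new int[] {line, column};
    }

    public static void main(String[] args) {
        String code = "class A {\n  int x;\n}\n";
        int startPos = 12;                  // offset of "int" (assumed value of start_pos)
        int endPos = startPos + 6;          // exclusive end (assumed value of end_pos)
        System.out.println(snippet(code, startPos, endPos));      // prints "int x;"
        int[] lc = lineAndColumn(code, startPos);
        System.out.println(lc[0] + ":" + lc[1]);                  // prints "2:3"
    }
}
```

How the tool itself counts tabs or the end column is not covered by this sketch.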
I use a string to encode a type. Here is the table:
type descriptor | type | description |
---|---|---|
B | byte | |
C | char | |
D | double | |
F | float | |
I | int | |
J | long | |
L ClassName ; | Class | `java.lang.String` -> `Ljava/lang/String;`, `Map<String, Integer>` -> `Ljava/util/Map<Ljava/lang/String;Ljava/lang/Integer;>;` |
S | short | |
Z | boolean | |
[ | array | `int[][]` -> `[[I` (open brackets only). Varargs `...` is equivalent to `[`. |
V | void | |
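
To illustrate the table, here is a minimal sketch of an encoder that follows the same rules. It is not the project's implementation: `TypeDescriptor.encode` is a hypothetical helper, it expects class names to be fully qualified already, and it leaves generic type arguments out of scope because resolving them requires type binding information.

```java
import java.util.Map;

class TypeDescriptor {
    // Primitive types and void, as listed in the table above.
    private static final Map<String, String> PRIMITIVES = Map.of(
            "byte", "B", "char", "C", "double", "D", "float", "F",
            "int", "I", "long", "J", "short", "S", "boolean", "Z", "void", "V");

    /** Encodes a primitive, array, varargs, or fully qualified class name. */
    public static String encode(String typeName) {
        String name = typeName.replace(" ", "");
        StringBuilder prefix = new StringBuilder();
        // Every trailing "[]" (or a varargs "...") contributes one open bracket.
        while (name.endsWith("[]") || name.endsWith("...")) {
            prefix.append('[');
            int suffixLength = name.endsWith("[]") ? 2 : 3;
            name = name.substring(0, name.length() - suffixLength);
        }
        String primitive = PRIMITIVES.get(name);
        if (primitive != null) {
            return prefix + primitive;
        }
        // Anything else is treated as a class name: dots become slashes.
        return prefix + "L" + name.replace('.', '/') + ";";
    }

    public static void main(String[] args) {
        System.out.println(encode("int [][]"));            // prints "[[I"
        System.out.println(encode("java.lang.String"));    // prints "Ljava/lang/String;"
        System.out.println(encode("java.lang.String...")); // prints "[Ljava/lang/String;"
    }
}
```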
A method descriptor is built by concatenating `Class` `.` `method name` `(` `method arguments (no comma)` `)` `return type` `exceptions (each starting with |)`.
Therefore, the descriptor for a `main` method of class `Main` in package `demo.example` that throws `IOException` and `SQLException` is `Ldemo/example/Main;.main([Ljava/lang/String;)V|Ljava/io/IOException;|Ljava/sql/SQLException;`.
See the JVM type descriptor documentation for the official definition.
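
The concatenation rule can be sketched in a few lines. This is only an illustration under the assumption that all parts are already encoded as type descriptors; `MethodDescriptor.build` is a hypothetical helper, not part of this project's API.

```java
import java.util.List;

class MethodDescriptor {
    /** Joins already-encoded parts into: Class.name(args)return|exception... */
    public static String build(String classDescriptor, String methodName,
                               List<String> argumentDescriptors, String returnDescriptor,
                               List<String> exceptionDescriptors) {
        StringBuilder sb = new StringBuilder();
        sb.append(classDescriptor).append('.').append(methodName).append('(');
        for (String argument : argumentDescriptors) {
            sb.append(argument);              // arguments are not separated by commas
        }
        sb.append(')').append(returnDescriptor);
        for (String exception : exceptionDescriptors) {
            sb.append('|').append(exception); // each exception starts with '|'
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Reproduces the descriptor of the main method in the example above.
        System.out.println(build(
                "Ldemo/example/Main;", "main",
                List.of("[Ljava/lang/String;"), "V",
                List.of("Ljava/io/IOException;", "Ljava/sql/SQLException;")));
    }
}
```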
- Use `JDT` to generate ASTs for all files and cross references for all `Name` nodes.
- For each `.java` file:
  - Use a batch operation to insert the information of all AST nodes at once. If the current file is not compilable, the program skips to the next `.java` file without storing anything in the database.
  - Use `ANTLR` to collect all tokens. For each token, find the tightest AST node containing that token and save that AST node as the parent of the token.
- Take all cross reference keys generated in the first step and figure out where each `Name` node is declared.
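
The "tightest AST node" step can be pictured as a search over position ranges. The sketch below is an assumption about how such a lookup could work, not the project's actual code: `TightestNode` and `Span` are hypothetical names, ranges are half-open like (`start_pos`, `end_pos`), and a plain linear scan stands in for whatever traversal the tool really uses.

```java
import java.util.List;

class TightestNode {
    /** A node or token with an id and a half-open position range [start, end). */
    static final class Span {
        final int id;
        final int start;
        final int end;

        Span(int id, int start, int end) {
            this.id = id;
            this.start = start;
            this.end = end;
        }

        boolean contains(Span other) {
            return start <= other.start && other.end <= end;
        }

        int length() {
            return end - start;
        }
    }

    /** Returns the id of the shortest node that contains the token, or -1 if none does. */
    public static int parentOf(Span token, List<Span> nodes) {
        Span best = null;
        for (Span node : nodes) {
            if (node.contains(token) && (best == null || node.length() < best.length())) {
                best = node;
            }
        }
        return best == null ? -1 : best.id;
    }

    public static void main(String[] args) {
        // Hypothetical nodes for "class A {\n  int x;\n}\n".
        List<Span> nodes = List.of(
                new Span(1, 0, 22),    // compilation unit
                new Span(2, 0, 21),    // type declaration
                new Span(3, 12, 18));  // field declaration
        System.out.println(parentOf(new Span(100, 17, 18), nodes)); // ";" -> 3
        System.out.println(parentOf(new Span(101, 0, 5), nodes));   // "class" -> 2
    }
}
```

Returning `-1` when no node contains the token is a choice made for this sketch only.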
For documentation of each ASTNode type, refer to the JDT package `org.eclipse.jdt.core.dom`.
- `OutOfMemoryError`: allocate more memory to the JVM with `java -Xmx2048m -jar …`
- If the build fails while Maven downloads dependencies, your Maven version probably does not support that version of the dependency plugin. Try upgrading Maven or using an older version of the dependency plugin.
- To create a database in PostgreSQL, run `createdb database_name` in a shell or `CREATE DATABASE database_name` in SQL.