A tool/library for mining path-based representations of code. Work in progress.
- Mining of ASTs
- AstMiner is available via Maven Central
- Support for Java and Python
- Mining of path-based representations of code
This is an offspring of an internal utility from our ongoing research project.
Currently it supports extraction of path-based representations from code in Java and Python, but it is designed to be very easily extensible.
The default output format is inspired by code2vec.
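To make the notion of a path-based representation concrete, here is a minimal sketch of extracting code2vec-style path-contexts, i.e. triples of (start token, path through the AST, end token) for every pair of leaves. It is an illustration only and does not use AstMiner's API: the `Node` class, the function names, and the `^`/`_` path notation are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from itertools import combinations
from typing import List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical AST node: a type label, an optional token, and children."""
    label: str
    token: Optional[str] = None
    children: List["Node"] = field(default_factory=list)


def leaf_routes(root: Node) -> List[List[Node]]:
    """Return, for every leaf, the list of nodes from the root down to it."""
    routes: List[List[Node]] = []

    def walk(node: Node, prefix: List[Node]) -> None:
        route = prefix + [node]
        if not node.children:
            routes.append(route)
        for child in node.children:
            walk(child, route)

    walk(root, [])
    return routes


def path_contexts(root: Node) -> List[Tuple[Optional[str], str, Optional[str]]]:
    """Build (start token, path, end token) triples for every pair of leaves."""
    contexts = []
    for left, right in combinations(leaf_routes(root), 2):
        # Count the shared prefix; its last node is the lowest common ancestor.
        shared = 0
        while shared < min(len(left), len(right)) and left[shared] is right[shared]:
            shared += 1
        up = [n.label for n in reversed(left[shared:])]   # climb from the left leaf
        top = left[shared - 1].label                      # lowest common ancestor
        down = [n.label for n in right[shared:]]          # descend to the right leaf
        # Illustrative notation: "^" marks upward steps, "_" marks downward steps.
        path = "^".join(up + [top]) + "".join("_" + d for d in down)
        contexts.append((left[-1].token, path, right[-1].token))
    return contexts


# Hand-built tree for the statement `x = y + 1`.
tree = Node("Assign", children=[
    Node("Name", "x"),
    Node("BinOp", children=[Node("Name", "y"), Node("Num", "1")]),
])

for start, path, end in path_contexts(tree):
    print(f"{start},{path},{end}")
```

Each printed line has the `start,path,end` shape that code2vec uses for a single context; how AstMiner itself encodes tokens and paths in its output files may differ.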
The library is available via the Maven Central repository. You can add the dependency in your build.gradle file:
dependencies {
    compile "io.github.vovak:astminer:0.1"
}
A few simple usage examples can be run with ./gradlew run.
A somewhat more verbose example of usage in Java is available as well.
A new programming language can be supported in a few simple steps:
- Add the corresponding ANTLR4 grammar file to the antlr directory;
- Run the antlr4 Gradle task to generate the parser;
- Implement a very minimal wrapper around the generated parser. See JavaParser or PythonParser for reference.
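The idea behind the wrapper in the last step can be sketched as follows. This is not AstMiner's actual interface (see JavaParser or PythonParser for that): `SimpleNode` and `PythonStandInParser` are hypothetical names, and Python's built-in `ast` module stands in for an ANTLR-generated parser. The point is only that the wrapper parses the source and converts the parser-specific tree into one common node type that the mining code can traverse.

```python
import ast
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SimpleNode:
    """Hypothetical language-independent node consumed by the mining code."""
    type_label: str
    token: Optional[str] = None
    children: List["SimpleNode"] = field(default_factory=list)


class PythonStandInParser:
    """Hypothetical wrapper: parse source text, then convert to SimpleNode.

    The built-in ast module plays the role of the generated parser here.
    """

    def parse(self, source: str) -> SimpleNode:
        return self._convert(ast.parse(source))

    def _convert(self, node: ast.AST) -> SimpleNode:
        # Recursively convert child nodes in field order.
        children = [self._convert(child) for child in ast.iter_child_nodes(node)]
        # Keep a token only for identifiers and literal constants.
        token = None
        if isinstance(node, ast.Name):
            token = node.id
        elif isinstance(node, ast.Constant):
            token = repr(node.value)
        return SimpleNode(type(node).__name__, token, children)


root = PythonStandInParser().parse("x = y + 1")
print(root.type_label, [child.type_label for child in root.children])
```

Keeping the wrapper this thin means that everything language-specific stays in the grammar and the generated parser, while path extraction only ever sees the common node type.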
We believe that, thanks to its extensibility, AstMiner could be valuable to many other researchers. However, our view of potential applications is narrowed by our own work.
Please help make AstMiner easier to use by sharing your potential use cases. We would also appreciate pull requests with code improvements, more usage examples, documentation, etc.