A Clojure app for parsing and tokenizing java source code. Doc2vec embeddings are build for parsed code, to allow comparison and plagiarism detection.
lein antlr
lein run -i path/to/dir_with_java_files -o output_file
It also supports glob expressions for path:
lein run -i "path/to/dir_with_java_files/{subdir1,subdir2,subdir3}" -o output_file
For Doc2vec part please refer to doc2vec-plagiarism.ipynb.
Copyright © 2016 Dmitry Kholodilov
Distributed under BSD license.