- Java 8
Build the project:
$ ./mvnw clean install
[INFO] Scanning for projects...
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Build Order:
[INFO]
[INFO] publication-explorer [pom]
[INFO] publication-explorer-nlp [jar]
...
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] publication-explorer 1.0.0-SNAPSHOT ................ SUCCESS [ 0.398 s]
[INFO] publication-explorer-nlp 1.0.0.-SNAPSHOT ........... SUCCESS [02:02 min]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 02:03 min
[INFO] Finished at: 2019-05-22T10:36:45+02:00
[INFO] ------------------------------------------------------------------------
After the project has been built, you can run it. The jar is a fat jar: it contains all libraries and other resources needed to run it (except the JVM, of course).
$ java -jar ./publication-explorer-nlp/target/publication-explorer-nlp-1.0.0.-SNAPSHOT-shaded.jar
To learn what happens, start digging through the code, beginning with the main class: helt.pubex.Main.
Import the Maven projects into the IDE of your choice and start working.
Create a directory tree with this layout:
* data
  * input
    * pdf <--- put PDF files here
    * txt <--- put *.txt files here
  * output
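
If you would rather not create these directories by hand, the following is a minimal sketch that creates them. It rests on two assumptions that this README does not confirm: that `output` sits directly under `data` next to `input`, and that the tree belongs in a base directory you choose (the current working directory by default). The class name `CreateDataLayout` is hypothetical and not part of the project.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of publication-explorer.
public class CreateDataLayout {

    // Creates data/input/pdf, data/input/txt and data/output below the given base directory.
    // Assumption: "output" is a sibling of "input" directly under "data".
    static void createLayout(Path base) throws IOException {
        Path data = base.resolve("data");
        // createDirectories also creates missing parents and does not fail
        // if a directory already exists.
        Files.createDirectories(data.resolve("input").resolve("pdf"));
        Files.createDirectories(data.resolve("input").resolve("txt"));
        Files.createDirectories(data.resolve("output"));
    }

    public static void main(String[] args) throws IOException {
        // Use the first argument as base directory, or the working directory if none is given.
        Path base = Paths.get(args.length > 0 ? args[0] : ".");
        createLayout(base);
        System.out.println("Created data layout under " + base.toAbsolutePath().normalize());
    }
}
```

The sketch uses only the Java 8 standard library, so it compiles with `javac CreateDataLayout.java` and runs with `java CreateDataLayout <base-dir>`. Running it again is harmless, because existing directories are left untouched.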