The following programs were written to explore machine learning using DL4J and the Java API of Spark.
The model training and inference scripts are benchmarks whose results are used in a research project.
See each folder and script for details.
mvn clean compile install
java -cp target/<jar-file> org.dl4j.benchmarks.BenchMarkInferenceLocalModeHDFS2048
spark-submit --class org.dl4j.benchmarks.BenchMarkInferenceDistributedHDFS8192 --master spark://afog-master:7077 --conf spark.executor.memory=2g --total-executor-cores=12 --executor-cores=4 target/deeplearning4j-example-sample-1.0.0-beta7-bin.jar
- As of the DL4J version in pom.xml, header rows must be removed from CSV datasets before they are read with CSVRecordReader. Skipping lines does not work in this version, which is a bug.
- Configure RAM and cores according to your requirements.
- The -bin.jar file is the uber jar and contains all the dependencies. The non-bin (non-uber) jar files lack them and can be used to run programs in Spark local mode.
- Read Lazy Evaluation Article 1 and Lazy Evaluation Article 2 to understand why actions are inserted into the code when measuring time.
- The distributed training code files do not work on ARM in this release. Follow the issue here for updates.
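
The CSV header note above can be handled with a small preprocessing step before the data reaches CSVRecordReader. The following is a minimal sketch using only the Java standard library. The class name CsvHeaderStripper and the method stripHeader are hypothetical and are not part of this repository or of DL4J. It assumes UTF-8 input, a header that occupies whole lines, and distinct input and output paths.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for illustration; not part of this repository or DL4J.
final class CsvHeaderStripper {

    private CsvHeaderStripper() {}

    /**
     * Copies input to output, dropping the first headerLines lines.
     * Input and output must be different files, because the output is
     * truncated as soon as it is opened.
     *
     * @return the number of data lines written
     */
    static long stripHeader(Path input, Path output, int headerLines) throws IOException {
        if (headerLines < 0) {
            throw new IllegalArgumentException("headerLines must be >= 0");
        }
        long written = 0;
        try (BufferedReader reader = Files.newBufferedReader(input, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
            String line;
            int skipped = 0;
            while ((line = reader.readLine()) != null) {
                if (skipped < headerLines) {
                    skipped++;          // drop header line
                    continue;
                }
                writer.write(line);     // copy data line unchanged
                writer.newLine();
                written++;
            }
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java CsvHeaderStripper <input.csv> <output.csv>");
            System.exit(1);
        }
        // Assumes a single header row.
        long rows = stripHeader(Paths.get(args[0]), Paths.get(args[1]), 1);
        System.out.println("Wrote " + rows + " data rows to " + args[1]);
    }
}
```

The file is streamed line by line, so memory use stays flat for large datasets. For data already on HDFS, the same idea would have to be applied before upload or through the Hadoop file system API, which this sketch does not cover.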