- This is a simple test to check Cascading flow preparation/compilation time. Observation: Cascading job preparation time increases non-linearly as the number of jobs increases.
- This is also an example of running JCascalog as a plain Java program using Maven instead of Leiningen.
- Check out the project.
- Build the project using Maven:
mvn clean install
- Put the src/main/resources/input.txt file in the local /var/tmp directory.
- Run the following command to execute the test:
java -cp cascalog-cascading-test-1.0.1-CR-SNAPSHOT.jar:/usr/local/hadoop/hadoop-core-0.20.2-cdh3u4.jar:/usr/local/hadoop/lib/* com.home.test.CascadingTestInJcascalog [depth] /var/tmp/input.txt
- [depth] is an integer giving the number of self-joins. Cascading creates 2 jobs for each level of depth, so increasing the depth lets us test the preparation time for a growing number of Cascading jobs (depth 5 gives 10 jobs, depth 10 gives 20 jobs).
java -cp cascalog-cascading-test-1.0.1-CR-SNAPSHOT.jar:/usr/local/hadoop/hadoop-core-0.20.2-cdh3u4.jar:/usr/local/hadoop/lib/* com.home.test.CascadingTestInJcascalog 5 /var/tmp/input.txt
Log:
[17/06/2013:14:13:19 IST] [INFO] [cascading.property.AppProps main]: using app.id: FC106638099703F5450E89B08BB7442F
[17/06/2013:14:13:20 IST] [INFO] [cascading.util.Version flow]: Concurrent, Inc - Cascading 2.0.0
[17/06/2013:14:13:20 IST] [INFO] [cascading.flow.Flow flow]: [] starting
......
[17/06/2013:14:13:20 IST] [INFO] [cascading.flow.Flow flow]: [] starting jobs: 10
java -cp cascalog-cascading-test-1.0.1-CR-SNAPSHOT.jar:/usr/local/hadoop/hadoop-core-0.20.2-cdh3u4.jar:/usr/local/hadoop/lib/* com.home.test.CascadingTestInJcascalog 10 /var/tmp/input.txt
Log:
[17/06/2013:14:14:50 IST] [INFO] [cascading.property.AppProps main]: using app.id: 264A79523E9A9AF21EB04D2814FBCF9F
[17/06/2013:14:29:54 IST] [INFO] [cascading.util.Version flow]: Concurrent, Inc - Cascading 2.0.0
[17/06/2013:14:29:54 IST] [INFO] [cascading.flow.Flow flow]: [] starting
.....
[17/06/2013:14:29:54 IST] [INFO] [cascading.flow.Flow flow]: [] starting jobs: 20
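As the log timestamps show, the gap between the app.id line and the first flow line was about 1 second at depth 5 (10 jobs) but about 15 minutes at depth 10 (20 jobs). Doubling the number of jobs increased the preparation time by far more than a factor of two, so Cascading job preparation time grows non-linearly.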
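
The preparation time can be read off the logs by subtracting the timestamp of the "using app.id" line from the timestamp of the first "[] starting" line. The following is a minimal sketch of that arithmetic, not part of the project: the class name PrepTimeFromLog and its methods are hypothetical, and it assumes both timestamps are in the same time zone (IST in the logs above), so the zone name is ignored.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper (not part of the project): computes the gap between two log lines.
class PrepTimeFromLog {
    // Matches the timestamp layout used in the logs above, e.g. 17/06/2013:14:13:19
    private static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("dd/MM/yyyy:HH:mm:ss");

    // Extracts the timestamp from a line like "[17/06/2013:14:13:19 IST] [INFO] ...".
    // Assumption: the zone name after the space is the same on every line and can be skipped.
    static LocalDateTime timestampOf(String logLine) {
        int start = logLine.indexOf('[') + 1;
        int end = logLine.indexOf(' ', start);
        return LocalDateTime.parse(logLine.substring(start, end), FMT);
    }

    // Seconds elapsed between the "using app.id" line and the flow "starting" line.
    static long prepSeconds(String appIdLine, String flowStartLine) {
        return Duration.between(timestampOf(appIdLine), timestampOf(flowStartLine)).getSeconds();
    }

    public static void main(String[] args) {
        long depth5 = prepSeconds(
                "[17/06/2013:14:13:19 IST] [INFO] [cascading.property.AppProps main]: using app.id: FC106638099703F5450E89B08BB7442F",
                "[17/06/2013:14:13:20 IST] [INFO] [cascading.flow.Flow flow]: [] starting");
        long depth10 = prepSeconds(
                "[17/06/2013:14:14:50 IST] [INFO] [cascading.property.AppProps main]: using app.id: 264A79523E9A9AF21EB04D2814FBCF9F",
                "[17/06/2013:14:29:54 IST] [INFO] [cascading.flow.Flow flow]: [] starting");

        System.out.println("depth 5  (10 jobs): " + depth5 + " s");   // 1 s
        System.out.println("depth 10 (20 jobs): " + depth10 + " s");  // 904 s
    }
}
```

With the two runs above this gives 1 second for 10 jobs and 904 seconds for 20 jobs. The log timestamps have one-second resolution, so the depth 5 figure is only approximate.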