
1. JCascalog/Cascalog and Cascading performance test. 2. Creating a Maven project for JCascalog.


cascalog-cascading-test

  • This is a simple test that measures Cascading flow preparation (compilation) time. Observation: the job preparation time grows non-linearly as the number of jobs increases.
  • It is also an example of running JCascalog as a plain Java program built with Maven instead of Leiningen.
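The kind of measurement described above can be sketched as a small wall-clock timer. This is only an illustration: the class name `PrepTimer` and the sleeping stand-in are hypothetical, and the real test would build and plan the Cascading flow where the stand-in sleeps.

```java
import java.util.concurrent.TimeUnit;

// Minimal timing sketch: measures the wall-clock time of a single step.
// PrepTimer is a hypothetical name, not a class from this project.
class PrepTimer {

    // Runs the given step once and returns the elapsed time in milliseconds.
    static long elapsedMillis(Runnable step) {
        long start = System.nanoTime();
        step.run();
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for flow preparation: it only sleeps.
        Runnable fakePreparation = () -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        System.out.println("Preparation took " + elapsedMillis(fakePreparation) + " ms");
    }
}
```

`System.nanoTime()` is used rather than `System.currentTimeMillis()` because it is monotonic and so is not affected by clock adjustments during a run that may last many minutes.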

Execution Steps:

  • Check out the project.

  • Build the project with Maven: mvn clean install

  • Copy the src/main/resources/input.txt file to the local /var/tmp directory.

  • Run the following command to execute the test:

    java -cp cascalog-cascading-test-1.0.1-CR-SNAPSHOT.jar:/usr/local/hadoop/hadoop-core-0.20.2-cdh3u4.jar:/usr/local/hadoop/lib/* com.home.test.CascadingTestInJcascalog [depth] /var/tmp/input.txt

  • [depth] is an integer giving the number of self-joins. Cascading creates 2 jobs for each level of depth, so increasing the depth lets us test the preparation time for a larger number of Cascading jobs.
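The relationship between the depth argument and the number of jobs can be sketched as follows. `DepthArgs` is a hypothetical helper, not the project's `CascadingTestInJcascalog` class, and rejecting depths below 1 is an assumption made for the sketch.

```java
// Hypothetical helper illustrating the two command-line arguments;
// it is not the project's CascadingTestInJcascalog class.
class DepthArgs {

    // Per the description above, each level of depth yields 2 Cascading jobs.
    static int expectedJobs(int depth) {
        if (depth < 1) {
            // Assumption: a depth below 1 is treated as invalid input.
            throw new IllegalArgumentException("depth must be >= 1, got " + depth);
        }
        return 2 * depth;
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("Usage: DepthArgs [depth] [input path]");
            System.exit(1);
        }
        int depth = Integer.parseInt(args[0]);
        String inputPath = args[1];
        System.out.println("depth=" + depth + ", input=" + inputPath
                + ", expected jobs=" + expectedJobs(depth));
    }
}
```

With this rule, a depth of 5 gives 10 jobs and a depth of 10 gives 20 jobs, matching the two runs reported under Performance Stats.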

Performance Stats:

Depth 5, Jobs 10, Preparation time: less than 1 sec

    java -cp cascalog-cascading-test-1.0.1-CR-SNAPSHOT.jar:/usr/local/hadoop/hadoop-core-0.20.2-cdh3u4.jar:/usr/local/hadoop/lib/* com.home.test.CascadingTestInJcascalog 5 /var/tmp/input.txt

Log:

[17/06/2013:14:13:19 IST] [INFO] [cascading.property.AppProps main]: using app.id: FC106638099703F5450E89B08BB7442F
[17/06/2013:14:13:20 IST] [INFO] [cascading.util.Version flow]: Concurrent, Inc - Cascading 2.0.0
[17/06/2013:14:13:20 IST] [INFO] [cascading.flow.Flow flow]: [] starting
......
[17/06/2013:14:13:20 IST] [INFO] [cascading.flow.Flow flow]: []  starting jobs: 10

Depth 10, Jobs 20, Preparation time: 15 mins

    java -cp cascalog-cascading-test-1.0.1-CR-SNAPSHOT.jar:/usr/local/hadoop/hadoop-core-0.20.2-cdh3u4.jar:/usr/local/hadoop/lib/* com.home.test.CascadingTestInJcascalog 10 /var/tmp/input.txt

Log:

[17/06/2013:14:14:50 IST] [INFO] [cascading.property.AppProps main]: using app.id: 264A79523E9A9AF21EB04D2814FBCF9F
[17/06/2013:14:29:54 IST] [INFO] [cascading.util.Version flow]: Concurrent, Inc - Cascading 2.0.0
[17/06/2013:14:29:54 IST] [INFO] [cascading.flow.Flow flow]: [] starting
.....
[17/06/2013:14:29:54 IST] [INFO] [cascading.flow.Flow flow]: []  starting jobs: 20

As these stats show, Cascading's job preparation time grows non-linearly: doubling the depth from 5 to 10 (10 to 20 jobs) raised the preparation time from under 1 second to about 15 minutes.
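The preparation times can be read from the log excerpts by subtracting the timestamp of the first line (app.id) from that of the "starting jobs" line. A minimal sketch, assuming only the date and time part of the log prefix is passed in; the class name `LogPrepTime` is hypothetical:

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper that computes elapsed seconds between two log stamps.
class LogPrepTime {

    // Matches the date and time part of the log prefix, e.g. 17/06/2013:14:13:19.
    // The zone label (IST) is left out because both stamps share it.
    private static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("dd/MM/yyyy:HH:mm:ss");

    static long secondsBetween(String first, String last) {
        LocalDateTime start = LocalDateTime.parse(first, STAMP);
        LocalDateTime end = LocalDateTime.parse(last, STAMP);
        return Duration.between(start, end).getSeconds();
    }

    public static void main(String[] args) {
        // Timestamps copied from the two log excerpts above.
        long depth5 = secondsBetween("17/06/2013:14:13:19", "17/06/2013:14:13:20");
        long depth10 = secondsBetween("17/06/2013:14:14:50", "17/06/2013:14:29:54");
        System.out.println("depth 5:  " + depth5 + " s");   // prints 1 s
        System.out.println("depth 10: " + depth10 + " s");  // prints 904 s
    }
}
```

Because the log timestamps have one-second resolution, the depth 5 figure is only an upper bound of roughly 1 second, while the depth 10 run took 904 seconds (15 minutes 4 seconds).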
