This repository contains the example code for "Apache Mesos (10): Creating Complex Jobs with Chronos". It makes minor changes to the wordcount-example code from Mesos in Action.
This is an example Spark job that reads a copy of Leo Tolstoy's
War and Peace from HDFS, counts the number of times each
word appears, then stores the word counts in a text file (also on HDFS). This
is meant to be used with the Chronos jobs located at ../complex-etl-job.
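The read, count, and store steps described above could be sketched roughly as follows. This is a minimal illustration, not the repository's actual source: the object name WordCountSketch, the tokenize helper, the application name, and the input file name warandpeace.txt are all assumptions made for the example. Only the output location (warandpeace-counts.txt under the base path) comes from this README.

```scala
import org.apache.spark.{SparkConf, SparkContext}
// Pair-RDD implicits; needed for reduceByKey on Spark versions before 1.3.
import org.apache.spark.SparkContext._

object WordCountSketch {
  // Lowercase a line and split it on runs of non-word characters.
  // Note: an empty line yields a single empty-string token.
  def tokenize(line: String): List[String] =
    line.toLowerCase.split("\\W+").toList

  def main(args: Array[String]): Unit = {
    // Base path on HDFS, passed as the first argument (e.g. /tmp/warandpeace).
    val basePath = args(0)
    // The master URL is expected to be supplied by spark-submit.
    val sc = new SparkContext(new SparkConf().setAppName("war-and-peace-wordcount"))
    try {
      // Assumption: the input file name below is hypothetical.
      val lines = sc.textFile(s"$basePath/warandpeace.txt")
      val counts = lines
        .flatMap(line => tokenize(line)) // one record per token
        .map(word => (word, 1))          // pair each token with a count of 1
        .reduceByKey(_ + _)              // sum the counts per distinct token
      // Each (word, count) tuple is written as one text line, e.g. (the,34570).
      counts.saveAsTextFile(s"$basePath/warandpeace-counts.txt")
    } finally {
      sc.stop()
    }
  }
}
```

With a tokenizer like this one, blank lines in the input produce an empty-string token, which is one plausible way an entry with an empty word could end up among the results.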
Clone the repo and package up the example:
$ git clone https://github.com/andyyoung01/spark-wordcount.git
$ cd spark-wordcount/wordcount-example
$ sbt package
Assuming the spark-submit utility is available on the $PATH of your gateway
machine, submit the job by running the following command:
$ spark-submit target/scala-2.10/war-and-peace-wordcount_2.10-0.1.0-SNAPSHOT.jar \
/tmp/warandpeace
The results of the job can then be found on HDFS at
${basepath}/warandpeace-counts.txt, where ${basepath} is the path passed as the
job's argument (/tmp/warandpeace in the command above). You can get the top 10
words in the book by running the following command:
$ hadoop fs -cat /tmp/warandpeace/warandpeace-counts.txt/part-* | sort -t, -rnk2 | head -10
(the,34570)
(and,22159)
(to,16716)
(of,14991)
(,13568)
(a,10521)
(he,9809)
(in,8801)
(his,7967)
(that,7813)
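The same top-10 selection can also be done in Scala instead of with sort and head. The sketch below assumes the result lines have the (word,count) shape shown above and are piped in on standard input; the object name TopWords and its helper functions are hypothetical and not part of this repository.

```scala
import scala.util.Try

object TopWords {
  // Parse one result line such as "(the,34570)" into (word, count).
  // Returns None for lines that do not match that shape.
  def parse(line: String): Option[(String, Int)] = {
    val trimmed = line.trim
    val comma = trimmed.lastIndexOf(',')
    if (trimmed.startsWith("(") && trimmed.endsWith(")") && comma > 0) {
      val word = trimmed.substring(1, comma)
      val count = trimmed.substring(comma + 1, trimmed.length - 1)
      Try(count.toInt).toOption.map(n => (word, n))
    } else None
  }

  // Keep the n entries with the highest counts, in descending order.
  def top(lines: Iterator[String], n: Int): List[(String, Int)] =
    lines.flatMap(l => parse(l)).toList.sortBy(p => -p._2).take(n)

  def main(args: Array[String]): Unit =
    top(scala.io.Source.stdin.getLines(), 10).foreach {
      case (word, count) => println(s"($word,$count)")
    }
}
```

Unlike the shell pipeline, this reads every parsed entry into memory before sorting, so it is only suitable when the set of distinct words is small enough to fit on one machine.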