This is a ripped-off version of [Snowplow's sample scalding project](https://github.com/snowplow/scalding-example-project); many thanks to them for it!
To run the tests, simply run:

```
sbt test
```
To autotest (run the tests continuously, re-running on source changes), run this in a separate terminal session:

```
sbt '~test'
```
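For reference, tests for a Scalding job are typically written with Scalding's `JobTest` harness, which feeds in-memory tuples into the job's sources and asserts on its sinks. A minimal sketch of such a test (the specs2 framework and the `(String, Int)` sink type are assumptions, not necessarily what this repo's tests use):

```scala
import com.twitter.scalding._
import org.specs2.mutable.Specification

// Sketch of a JobTest-based spec for the WordCountJob shipped with this
// project. The test framework (specs2) and the (String, Int) sink type
// are assumptions; adapt them to the job's actual output schema.
class WordCountJobSpec extends Specification {
  "A WordCountJob" should {
    "count each word in the input" in {
      var counts = Map.empty[String, Int]
      JobTest("net.aystech.scalding.WordCountJob")
        .arg("input", "inputFile")
        .arg("output", "outputFile")
        .source(TextLine("inputFile"), List((0, "hack hack hack and hack")))
        .sink[(String, Int)](Tsv("outputFile")) { buffer => counts = buffer.toMap }
        .run
        .finish
      counts must havePair("hack" -> 4)
      counts must havePair("and" -> 1)
    }
  }
}
```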
To run the job on Hadoop, first package the fat jar:

```
sbt assembly
```
Then run it anywhere the `hadoop` command is available:

```
hadoop jar target/scala-2.10/scalding-example-project-assembly-0.0.1.jar net.aystech.scalding.WordCountJob --hdfs --input some/input/file --output some/output/file
```
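For context, `WordCountJob` follows the classic Scalding word-count shape: read lines, split into words, group by word, count. A sketch of such a job, mirroring the canonical Scalding tutorial (the tokenization details may differ from this repo's actual implementation):

```scala
import com.twitter.scalding._

// Canonical Scalding (fields-based API) word count: read text lines,
// tokenize each line into words, group by word, count group sizes,
// and write (word, count) pairs as TSV.
class WordCountJob(args: Args) extends Job(args) {
  TextLine(args("input"))
    .flatMap('line -> 'word) { line: String =>
      line.toLowerCase.split("\\s+").filter(_.nonEmpty)
    }
    .groupBy('word) { _.size }
    .write(Tsv(args("output")))
}
```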
It would be great to extend this with other examples, so feel free to contribute a pull request. Please include simple, concise tests; some test data would also be nice. Ideas:
- Avro file read & write
- Parquet file read & write
- Using GlobHfs
- Run some arbitrary code traversing HDFS for multiple source paths (see the sketch after this list)
- Use Pattern, or any other way, to apply a PMML model within a workflow
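As a starting point for the HDFS-traversal idea above, here is a rough sketch using the standard Hadoop `FileSystem` API to collect input directories and Scalding's `MultipleTextLineFiles` source to read them all. The `input-base` argument and the directory filter are illustrative assumptions:

```scala
import com.twitter.scalding._
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Sketch: walk HDFS for input directories at job-construction time,
// then feed all of them to a single MultipleTextLineFiles source.
// The input-base argument and the isDirectory filter are assumptions.
class MultiPathWordCountJob(args: Args) extends Job(args) {
  val fs = FileSystem.get(new Configuration())
  val inputPaths: Seq[String] =
    fs.listStatus(new Path(args("input-base")))
      .filter(_.isDirectory)
      .map(_.getPath.toString)
      .toSeq

  MultipleTextLineFiles(inputPaths: _*)
    .flatMap('line -> 'word) { line: String => line.split("\\s+") }
    .groupBy('word) { _.size }
    .write(Tsv(args("output")))
}
```

Note that the traversal runs on the submitting machine when the job is constructed, before anything is launched on the cluster, so the resolved path list is fixed at submission time.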