Data Algorithms: Recipes for Scaling up with Hadoop and Spark
Early Release Version | Production Version (June 2015) |
---|---|
An author book signing for "Data Algorithms" will be held at the O'Reilly booth on Thursday, Feb. 19, 2015. Complimentary copies of the book will be provided to the first 25 attendees.
I have started adding bonus chapters.
This repository will host all source code and scripts for the Data Algorithms book. The book provides a set of distributed MapReduce algorithms, which are implemented using:
- Java (JDK7)
- Spark 1.3.0
- MapReduce/Hadoop 2.6.0
Please note that this is a work in progress...
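The map, shuffle, and reduce phases behind these algorithms can be illustrated with a minimal single-process sketch in plain Java. It is written to stay JDK7-compatible (no lambdas) and uses the classic word-count example. It is only an illustration of the data flow: it does not use the Hadoop or Spark APIs, and the names used here (MapReduceSketch, map, shuffle, reduce, wordCount) are hypothetical. None of them come from this repository.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Single-process sketch of the MapReduce data flow: map -> shuffle -> reduce.
// Plain Java only; this is NOT the Hadoop or Spark API.
class MapReduceSketch {

    // Immutable key-value pair emitted by the map phase.
    static final class Pair {
        final String key;
        final int value;
        Pair(String key, int value) { this.key = key; this.value = value; }
    }

    // Map phase: emit (word, 1) for every whitespace-separated token in a line.
    static List<Pair> map(String line) {
        List<Pair> out = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                out.add(new Pair(token.toLowerCase(), 1));
            }
        }
        return out;
    }

    // Shuffle phase: group emitted values by key; a TreeMap keeps keys sorted.
    static Map<String, List<Integer>> shuffle(List<Pair> pairs) {
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (Pair p : pairs) {
            List<Integer> values = groups.get(p.key);
            if (values == null) {
                values = new ArrayList<>();
                groups.put(p.key, values);
            }
            values.add(p.value);
        }
        return groups;
    }

    // Reduce phase: sum all values collected for one key.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Run all three phases over in-memory input lines.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Pair> emitted = new ArrayList<>();
        for (String line : lines) {
            emitted.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffle(emitted).entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("the cat sat", "the cat ran");
        System.out.println(wordCount(input)); // {cat=2, ran=1, sat=1, the=2}
    }
}
```

In Hadoop, the framework runs the map and reduce functions on different nodes and performs the shuffle and sort itself. In Spark's Java API, the same flow is typically expressed with flatMap, mapToPair, and reduceByKey.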
- Title: Data Algorithms
- Author: Mahmoud Parsian
- Publisher: O'Reilly Media
- All source code, libraries, and build scripts are posted here
- Shell scripts are posted for running Spark and MapReduce/Hadoop programs (in progress...)
Software | Version |
---|---|
Java | JDK7 |
Hadoop | 2.6.0 |
Spark | 1.3.0 |
Ant | 1.9.4 |
Name | Description |
---|---|
README.md | The file you are reading now |
README_lib.md | Must read before you build with Ant |
src | Source files for MapReduce/Hadoop/Spark |
scripts | Shell scripts to run MapReduce/Hadoop and Spark programs |
lib | Required jar files for compiling source code |
build.xml | The ant build script |
dist | The ant build's output directory (creates a single JAR file) |
LICENSE | License for using this repository (Apache License, Version 2.0) |
misc | Miscellaneous files for this repository |
setenv | Example of how to set your environment variables before building |
data | Sample data files (such as FASTQ and FASTA) for basic testing purposes |
Also, each chapter has two subfolders:
- org.dataalgorithms.chapNN.spark (for Spark programs)
- org.dataalgorithms.chapNN.mapreduce (for MapReduce/Hadoop programs)

where NN = 00, 01, ..., 31.
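The per-chapter naming convention can be captured in a small helper. This is a hypothetical sketch: ChapterPackages, packageName, and chapterOf are not part of the repository. It assumes that every chapter number from 00 to 31 is valid and that the number is always written with two zero-padded digits.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for the chapter package convention:
//   org.dataalgorithms.chapNN.spark      -> Spark programs
//   org.dataalgorithms.chapNN.mapreduce  -> MapReduce/Hadoop programs
// where NN is a two-digit, zero-padded chapter number from 00 to 31.
class ChapterPackages {

    static final int FIRST_CHAPTER = 0;
    static final int LAST_CHAPTER = 31;
    static final String PREFIX = "org.dataalgorithms.chap";

    // Build the package name for one chapter and one engine ("spark" or "mapreduce").
    static String packageName(int chapter, String engine) {
        if (chapter < FIRST_CHAPTER || chapter > LAST_CHAPTER) {
            throw new IllegalArgumentException("chapter out of range: " + chapter);
        }
        if (!"spark".equals(engine) && !"mapreduce".equals(engine)) {
            throw new IllegalArgumentException("unknown engine: " + engine);
        }
        // Locale.ROOT guarantees ASCII digits in the zero-padded number.
        return String.format(Locale.ROOT, "%s%02d.%s", PREFIX, chapter, engine);
    }

    // Recover the chapter number from a name like "org.dataalgorithms.chap05.spark".
    static int chapterOf(String packageName) {
        if (!packageName.startsWith(PREFIX) || packageName.length() < PREFIX.length() + 2) {
            throw new IllegalArgumentException("not a chapter package: " + packageName);
        }
        return Integer.parseInt(packageName.substring(PREFIX.length(), PREFIX.length() + 2));
    }

    public static void main(String[] args) {
        List<String> all = new ArrayList<>();
        for (int ch = FIRST_CHAPTER; ch <= LAST_CHAPTER; ch++) {
            all.add(packageName(ch, "spark"));
            all.add(packageName(ch, "mapreduce"));
        }
        System.out.println(all.size());  // 64
        System.out.println(all.get(0));  // org.dataalgorithms.chap00.spark
        System.out.println(chapterOf("org.dataalgorithms.chap31.mapreduce")); // 31
    }
}
```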
- How To Run MapReduce/Hadoop Programs
- How To Run Java/Spark Programs in YARN
- How To Run Java/Spark Programs in Spark Cluster
To run the Python programs, call them with spark-submit, followed by the arguments to the program.
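Argument order matters with spark-submit. Its own options come before the Python file, and everything after the file is passed to the program. The hypothetical Java helper below only assembles such a command line and does not run it. The names wordcount.py and /tmp/input.txt are placeholders, and --master local[2] is just an optional example setting.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: assembles a spark-submit command for a Python program.
// spark-submit options go BEFORE the .py file; arguments AFTER it go to the program.
class SparkSubmitCommand {

    static List<String> build(String master, String pythonFile, String... programArgs) {
        List<String> cmd = new ArrayList<>();
        cmd.add("spark-submit");
        if (master != null) {                 // optional spark-submit option
            cmd.add("--master");
            cmd.add(master);
        }
        cmd.add(pythonFile);                  // the Python application to run
        cmd.addAll(Arrays.asList(programArgs)); // passed through to the program
        return cmd;
    }

    // Join the parts with single spaces (JDK7-compatible, no String.join).
    static String join(List<String> parts) {
        StringBuilder sb = new StringBuilder();
        for (String part : parts) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(part);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> cmd = build("local[2]", "wordcount.py", "/tmp/input.txt");
        System.out.println(join(cmd)); // spark-submit --master local[2] wordcount.py /tmp/input.txt
        // To launch it (requires spark-submit on the PATH):
        // Process p = new ProcessBuilder(cmd).inheritIO().start();
    }
}
```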
- View Mahmoud Parsian's profile on LinkedIn
- Please send me an email: mahmoud.parsian@yahoo.com
- Twitter: @mahmoudparsian
Thank you!
Best regards,
Mahmoud Parsian