Code repository for the O'Reilly course 'Integrating Hadoop and Spark'

Getting Started

Clone this repository as follows:

    $ git clone git@github.com:elephantscale/hadoop-spark.git
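
The clone command above uses the SSH form of the URL, which generally requires an SSH key registered with GitHub. If you have no key set up, the HTTPS form may work instead. The sketch below derives it from the SSH URL, assuming GitHub's usual mapping between the two forms. The variable names `ssh_url` and `https_url` are illustrative only and are not part of this repository.

```shell
# SSH URL as given in this README
ssh_url="git@github.com:elephantscale/hadoop-spark.git"

# Rewrite "git@host:owner/repo.git" as "https://host/owner/repo.git"
# (assumption: the host follows GitHub's standard SSH/HTTPS URL mapping)
https_url="$(printf '%s\n' "$ssh_url" | sed -E 's#^git@([^:]+):#https://\1/#')"

echo "$https_url"

# Uncomment to clone over HTTPS:
# git clone "$https_url"
```

Either form should check out the same repository. Only the authentication method differs.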

Lab Order

  1. Dev environment setup
  2. Hadoop setup
  3. Spark Shell
  4. RDDs
  5. DataFrames
  6. Hive and Spark
  7. Spark and YARN
  8. Spark Applications

Resources

Books

Sites

Vendors