This is the code repository for Learning Apache Spark 2, published by Packt. It contains all the supporting project files necessary to work through the book from start to finish.
The Spark juggernaut keeps rolling, gaining more momentum each day. Spark provides its key capabilities in the form of Spark SQL, Spark Streaming, Spark ML, and GraphX, all accessible via Java, Scala, Python, and R. Deploying these capabilities is crucial, whether on a standalone cluster or as part of an existing Hadoop installation, along with configuring Spark for YARN and Mesos.
The next part of the journey after installation is using the key components: the core APIs, clustering, the machine learning APIs, data pipelines, and parallel programming. It is important to understand why each framework component is key, how widely it is used, how stable it is, and what its pertinent use cases are.
All of the code is organized into folders. Each folder is named after the chapter it belongs to, for example, Chapter02.
The code bundle also contains a DataSet folder with the sample data used in the examples.
The code will look like the following:
When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:
[default]
exten => s,1,Dial(Zap/1|30)
exten => s,2,Voicemail(u100)
exten => s,102,Voicemail(b100)
exten => i,1,Voicemail(s0)
You will need Spark 2.0, which you can download from the Apache Spark website. We have used a few different configurations, but you can run most of these examples inside a virtual machine with 4-8 GB of RAM and 10 GB of available disk space.
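As a rough aid, the requirements above can be checked with a short script before working through the examples. This is only a sketch and not part of the book's code bundle: the helper names (`parse_version`, `spark_version_ok`, `free_disk_gb`, `check_environment`) are hypothetical, it assumes Python is available on the machine, and it only checks the Spark version if the `pyspark` package happens to be installed. RAM is not checked, since the Python standard library has no portable way to read it.

```python
import re
import shutil

# Thresholds taken from the requirements stated above.
MIN_SPARK = (2, 0)        # Spark 2.0
MIN_FREE_DISK_GB = 10     # 10 GB of available disk space


def parse_version(version):
    """Return (major, minor) from a version string such as '2.0.2'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def spark_version_ok(version, minimum=MIN_SPARK):
    """True if the given version string is at least the minimum version."""
    return parse_version(version) >= minimum


def free_disk_gb(path="."):
    """Free space, in GiB, on the filesystem that contains `path`."""
    return shutil.disk_usage(path).free / 1024 ** 3


def check_environment(path="."):
    """Report whether the disk and Spark requirements appear to be met.

    'spark_ok' is None when pyspark is not importable, because the
    version cannot be determined from Python in that case.
    """
    report = {"disk_ok": free_disk_gb(path) >= MIN_FREE_DISK_GB}
    try:
        import pyspark  # optional: only present if PySpark is installed
    except ImportError:
        report["spark_ok"] = None
    else:
        report["spark_ok"] = spark_version_ok(pyspark.__version__)
    return report


if __name__ == "__main__":
    print(check_environment())
```

Running the script from the folder where you unpacked the code bundle prints a small dictionary. A `spark_ok` value of `None` only means that PySpark could not be imported from Python, not that Spark is missing from the machine.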