dsx-spark

VSP (Sacramento), 6/29/2017
Assets for IBM's Apache Spark Proof of Technology

Introduction to Apache Spark

Lab environment setup

You will use IBM DSX notebooks and the Apache Spark service on IBM Bluemix Cloud to work through the labs.

PLEASE READ: https://github.com/tliakos/dsx-spark/blob/master/DSX%20workbook_draft1.docx

  1. Set up your Spark service in IBM Bluemix:
    To set up your IBM Bluemix environment, navigate to https://new-console.ng.bluemix.net, register, and create a Spark service.

  2. Log in to IBM Data Science Experience (DSX) to create and run notebooks:
    To set up your IBM DSX environment, navigate to http://datascience.ibm.com and log in using your Bluemix user ID.

A video tutorial on setting up the environment can be viewed here:
https://www.youtube.com/watch?v=yG3tVVDz1uE

Lab topics

To use these notebooks, copy and paste the URLs below when you are creating a new notebook.

  1. Introduction to Spark - Python:
    https://github.com/tliakos/dsx-spark/blob/master/Lab%201-%20Introduction%20to%20Spark-Student.ipynb

  2. Introduction to Spark SQL:
    https://github.com/tliakos/dsx-spark/blob/master/Lab%202:%20Spark%20SQL%20-%20Student.ipynb

  3. Spark Machine Learning - Python:
    https://github.com/tliakos/dsx-spark/blob/master/Lab%203%20-%20Machine%20Learning%20Student.ipynb
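The lab links above are GitHub page ("blob") URLs. If a tool needs the file itself rather than the GitHub page, the raw form of the URL may work instead. The sketch below is only an illustration: the helper name `github_blob_to_raw` is hypothetical, and it assumes the usual `github.com/<owner>/<repo>/blob/<branch>/<path>` layout, which matches the raw data set URL used elsewhere in this README.

```python
from urllib.parse import urlsplit, urlunsplit


def github_blob_to_raw(url):
    """Rewrite a github.com 'blob' page URL as a raw.githubusercontent.com URL.

    Hypothetical helper; assumes the path looks like
    <owner>/<repo>/blob/<branch>/<file path>.
    """
    parts = urlsplit(url)
    segments = parts.path.lstrip("/").split("/")
    if parts.netloc != "github.com" or len(segments) < 5 or segments[2] != "blob":
        raise ValueError("not a GitHub blob URL: " + url)
    owner, repo, _, branch = segments[:4]
    # Drop the "blob" segment and keep the percent-encoded file path as-is.
    raw_path = "/".join([owner, repo, branch] + segments[4:])
    return urlunsplit(("https", "raw.githubusercontent.com", "/" + raw_path, "", ""))


lab3 = ("https://github.com/tliakos/dsx-spark/blob/master/"
        "Lab%203%20-%20Machine%20Learning%20Student.ipynb")
print(github_blob_to_raw(lab3))
```

The path is never decoded, so spaces and other escaped characters in the notebook file names stay percent-encoded.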

Data set URL

https://raw.githubusercontent.com/tliakos/dsx-spark/master/data.csv
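To take a quick look at the data set from a Python notebook before starting the labs, something like the following could be used. It is a minimal sketch using only the standard library. The function names `fetch_text` and `preview_rows` are hypothetical, and it assumes the file is UTF-8 encoded, comma-separated text; the columns are not described here, so none are assumed.

```python
import csv
import io
import itertools
import urllib.request

DATA_URL = "https://raw.githubusercontent.com/tliakos/dsx-spark/master/data.csv"


def fetch_text(url=DATA_URL):
    """Download the file at `url` and return it as text (UTF-8 is assumed)."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def preview_rows(csv_text, limit=5):
    """Return the first `limit` rows of CSV text, each as a list of strings."""
    reader = csv.reader(io.StringIO(csv_text))
    return list(itertools.islice(reader, limit))
```

In a notebook cell with network access, `preview_rows(fetch_text())` returns the first few rows, which is enough to check whether the file has a header line and how many fields each row carries.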

Additional links

Spark Streaming webinar link: https://www.youtube.com/watch?v=_mFm2F7UQgU
Spark Streaming demo code: https://github.com/smatlapudi/spark-streaming-webinar1