Spark ML with pyspark

SparkML

Take your first steps with Spark ML and pyspark. Build your understanding of Spark ML through hands-on exercises in the Spark ML First Steps course!

Getting started:

Make sure you have Docker installed on your machine.

  1. Start Docker.

  2. Run the following command:

    docker run -it -p 8888:8888 jupyter/pyspark-notebook

    You should see output similar to this:

    Executing the command: jupyter notebook
     [I 15:49:48.293 NotebookApp] Writing notebook server cookie secret to /home/jovyan/.local/share/jupyter/runtime/notebook_cookie_secret
     [I 15:49:48.887 NotebookApp] JupyterLab extension loaded from /opt/conda/lib/python3.7/site-packages/jupyterlab
     [I 15:49:48.888 NotebookApp] JupyterLab application directory is /opt/conda/share/jupyter/lab
     [I 15:49:48.891 NotebookApp] Serving notebooks from local directory: /home/jovyan
     [I 15:49:48.891 NotebookApp] The Jupyter Notebook is running at:
     [I 15:49:48.891 NotebookApp] http://0a3437183fee:8888/?token=43143a485357351ef522a1840f8c8c141a1be2bcf5f9b4de
     [I 15:49:48.892 NotebookApp]  or http://127.0.0.1:8888/?token=43143a485357351ef522a1840f8c8c141a1be2bcf5f9b4de
     [I 15:49:48.892 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
     [C 15:49:48.896 NotebookApp]
     To access the notebook, open this file in a browser:
         file:///home/jovyan/.local/share/jupyter/runtime/nbserver-8-open.html
     Or copy and paste one of these URLs:
         http://0a3437183fee:8888/?token=43143a485357351ef522a1840f8c8c141a1be2bcf5f9b4de
      or http://127.0.0.1:8888/?token=43143a485357351ef522a1840f8c8c141a1be2bcf5f9b4de
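  3. Copy the LAST URL, the one starting with http://127.0.0.1:8888/?token=. The other URL uses the container's internal hostname (0a3437183fee in this sample), which the browser on your host usually cannot resolve. Your URL will look something like this, but with your own token:

    http://127.0.0.1:8888/?token=43143a485357351ef522a1840f8c8c141a1be2bcf5f9b4de

    Paste it into your browser. This is your Jupyter work environment for the course.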
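Picking the right URL can also be scripted. The sketch below is a minimal, hypothetical helper (`last_token_url` is not part of Jupyter or Docker) that scans startup log text, for example what `docker logs <container-id>` prints, and returns the last URL carrying a `token=` parameter. It assumes the URLs are formatted like the sample output above.

```python
import re
from typing import Optional

# Hypothetical helper: matches Jupyter URLs that carry an access token,
# such as http://127.0.0.1:8888/?token=<token>
TOKEN_URL = re.compile(r"https?://\S+/\?token=\w+")


def last_token_url(log_text: str) -> Optional[str]:
    """Return the last token URL in the Jupyter startup log, or None if absent."""
    matches = TOKEN_URL.findall(log_text)
    return matches[-1] if matches else None


if __name__ == "__main__":
    # Two lines adapted from the sample startup output (token shortened).
    sample_log = (
        "[I 15:49:48.891 NotebookApp] http://0a3437183fee:8888/?token=43143a48\n"
        "[I 15:49:48.892 NotebookApp]  or http://127.0.0.1:8888/?token=43143a48\n"
    )
    print(last_token_url(sample_log))  # → http://127.0.0.1:8888/?token=43143a48
```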

  4. Clone this repo, or download the zipped files notebook.zip and detecting-twitter-bot-data.zip.

  5. Extract (unzip) the files and upload the Exercise, Solution and detecting-twitter-bot-data files into Jupyter using the Upload button in the Jupyter file browser.

  6. Follow the instructions in the notebooks and share your findings in the chat!
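The extraction step above can also be done from Python, for example in a notebook cell after uploading the .zip archives themselves. The sketch below is a hypothetical helper built on the standard-library `zipfile` module, not something shipped with the course. It assumes the archives are reachable from the current working directory.

```python
import zipfile
from pathlib import Path


def extract_archives(archives, dest="."):
    """Extract each zip archive into `dest` and return all extracted member names.

    Hypothetical helper for illustration; `archives` is a list of .zip paths.
    """
    dest_path = Path(dest)
    dest_path.mkdir(parents=True, exist_ok=True)  # create the target folder if needed
    extracted = []
    for name in archives:
        with zipfile.ZipFile(name) as zf:
            zf.extractall(dest_path)  # unpack every member under dest_path
            extracted.extend(zf.namelist())
    return extracted
```

For example, `extract_archives(["notebook.zip", "detecting-twitter-bot-data.zip"])` would unpack both course archives into the current directory. As the `zipfile` documentation advises, only extract archives from sources you trust.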

Notes:

The Exercise folder contains the exercise chapters, and the Solution folder contains the solutions to the exercises.

It's recommended to have both in your Jupyter environment before the course starts.

License:

This exercise is part of the O'Reilly Online Course: Spark ML First Steps, produced and delivered by Adi Polak.

If you would like to use it outside of the online course, please contact Adi Polak on Twitter.