Step-by-step Deep Learning tutorials on Apache Spark using BigDL. The tutorials are inspired by the Apache Spark examples, the Theano tutorials, and the TensorFlow tutorials.
- RDD
- DataFrame
- Spark SQL
- Structured Streaming
- Forward and backward
- Linear Regression
- Introduction to MNIST
- Logistic Regression
- Feedforward Neural Network
- Convolutional Neural Network
- Recurrent Neural Network
- LSTM
- Bi-directional RNN
- Auto-encoder
- Mac OS / Linux
- Python 2.7 (Required python libraries: numpy, scipy, pandas, scikit-learn, matplotlib. You may want to use pip to install these packages.)
- Apache Spark 2.1.0
- Jupyter Notebook 4.1
- BigDL 0.1.1 (download: linux64, mac)
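
The Python library requirements listed above can be checked before starting the notebook. The following is a minimal sketch, not part of the tutorials: the helper name `find_missing` and the `REQUIRED` table are illustrative assumptions. It only tests that each module can be imported and does not check versions. It avoids Python 3-only syntax so that it should also run under Python 2.7.

```python
from __future__ import print_function
import importlib

# (pip package name, module name to import); scikit-learn is imported as "sklearn".
REQUIRED = [
    ("numpy", "numpy"),
    ("scipy", "scipy"),
    ("pandas", "pandas"),
    ("scikit-learn", "sklearn"),
    ("matplotlib", "matplotlib"),
]


def find_missing(packages=REQUIRED):
    """Return the pip names of the packages whose module cannot be imported."""
    missing = []
    for pip_name, module_name in packages:
        try:
            importlib.import_module(module_name)
        except ImportError:
            missing.append(pip_name)
    return missing


if __name__ == "__main__":
    missing = find_missing()
    if missing:
        print("Missing packages, try: pip install " + " ".join(missing))
    else:
        print("All required Python libraries are importable.")
```

If the script reports missing packages, installing them with pip (as suggested above) and re-running it should produce the success message.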
- Create start_notebook.sh, copy and paste the contents below, and edit SPARK_HOME and BigDL_HOME to match your installation.
#!/bin/bash
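# Set up paths (replace the placeholders with your own locations)
SPARK_HOME=/path/to/spark    # directory where Spark was downloaded and unpacked
BigDL_HOME=/path/to/bigdl    # directory where the BigDL download was unzipped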
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS="notebook --notebook-dir=./ --ip=* --no-browser"
source ${BigDL_HOME}/bin/bigdl.sh
${SPARK_HOME}/bin/pyspark \
--master local[4] \
--driver-memory 4g \
--properties-file ${BigDL_HOME}/conf/spark-bigdl.conf \
--py-files ${BigDL_HOME}/lib/bigdl-0.1.1-python-api.zip \
--jars ${BigDL_HOME}/lib/bigdl-SPARK_2.1-0.1.1-jar-with-dependencies.jar \
--conf spark.driver.extraClassPath=${BigDL_HOME}/lib/bigdl-SPARK_2.1-0.1.1-jar-with-dependencies.jar \
--conf spark.executor.extraClassPath=${BigDL_HOME}/lib/bigdl-SPARK_2.1-0.1.1-jar-with-dependencies.jar
- Run start_notebook.sh in bash. It starts a Jupyter Notebook service and prints the URL to access it.
- Open a browser (Chrome, Firefox, or Safari is recommended).
- Open the notebook client at http://localhost:8888/?token=xxxx (the full URL, including the token, appears in the output of start_notebook.sh), then open the example ipynb files and run them.
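
If start_notebook.sh fails to launch, one likely cause is a wrong SPARK_HOME or BigDL_HOME. The sketch below is a hypothetical preflight check, not something shipped with BigDL: the function names are illustrative, and the file names are taken from the start_notebook.sh contents shown earlier (BigDL 0.1.1 built for Spark 2.1). It assumes both variables have been exported in the shell, which the script itself does not do.

```python
from __future__ import print_function
import os


def expected_bigdl_files(bigdl_home, bigdl_version="0.1.1", spark_version="2.1"):
    """List the BigDL files that start_notebook.sh refers to."""
    api_zip = "bigdl-{0}-python-api.zip".format(bigdl_version)
    jar = "bigdl-SPARK_{0}-{1}-jar-with-dependencies.jar".format(
        spark_version, bigdl_version)
    return [
        os.path.join(bigdl_home, "bin", "bigdl.sh"),
        os.path.join(bigdl_home, "conf", "spark-bigdl.conf"),
        os.path.join(bigdl_home, "lib", api_zip),
        os.path.join(bigdl_home, "lib", jar),
    ]


def missing_files(spark_home, bigdl_home):
    """Return every path used by the launch script that is not an existing file."""
    paths = [os.path.join(spark_home, "bin", "pyspark")]
    paths += expected_bigdl_files(bigdl_home)
    return [p for p in paths if not os.path.isfile(p)]


if __name__ == "__main__":
    # Assumption: SPARK_HOME and BigDL_HOME were exported before running this.
    spark_home = os.environ.get("SPARK_HOME", "")
    bigdl_home = os.environ.get("BigDL_HOME", "")
    problems = missing_files(spark_home, bigdl_home)
    for path in problems:
        print("Not found: " + path)
    if not problems:
        print("All files referenced by start_notebook.sh were found.")
```

Any path reported as not found points to the variable that needs correcting in start_notebook.sh.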