A Text Analysis Using Project Gutenberg
=======================================
Installation (OSX)
- Install the Homebrew package manager
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
- Install Java from: https://java.com/en/download/
- Use Homebrew to install python3, Spark, and Scala
brew install python3 apache-spark scala
- Set environment variables for Spark and Java in your ~/.bash_profile. Java can be installed in many places, so this example locates it with /usr/libexec/java_home:
if which java > /dev/null; then export JAVA_HOME=$(/usr/libexec/java_home); fi
# set up Spark to launch Jupyter for prototyping
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS='notebook'
- Set up a virtual environment
- Install the Python dependencies
pip install -r requirements.txt
- Prototype anything locally and, when ready, run it on a Spark cluster!
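
The virtual environment and dependency steps above could look roughly like the sketch below. The directory name `venv` is an assumption (this README does not name one), and the sketch uses Python's built-in `venv` module, which may differ from the tool the project was developed with.

```shell
# Directory for the virtual environment (assumed name; override with VENV_DIR).
VENV_DIR="${VENV_DIR:-venv}"

# Create the environment only if it does not exist yet.
if [ ! -d "$VENV_DIR" ]; then
    python3 -m venv "$VENV_DIR"
fi

# Activate it for the current shell session.
. "$VENV_DIR/bin/activate"

# Install the project's dependencies when the requirements file is present.
if [ -f requirements.txt ]; then
    pip install -r requirements.txt
fi
```

Run this from the repository root so that `requirements.txt` is found. Running `deactivate` leaves the environment again.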