
Spark with Docker and IntelliJ IDEs

Docker container and setup instructions to run Spark jobs in Docker with IntelliJ IDEs.

Setting up Docker

git clone git@github.com:pranavgade20/spark-docker-intellij.git
cd spark-docker-intellij
# replace YOUR_KEY_HERE in the Dockerfile with your SSH public key
docker build --tag spark-ssh .
docker-compose up -d
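
To sanity-check the build before connecting, you can list what docker-compose started. The service/image name spark-ssh below matches the tag used in the build step; adjust it if your docker-compose.yml names things differently.

# optional: confirm the container is up
docker-compose ps
docker ps --filter ancestor=spark-ssh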

This builds the Docker image and starts Spark. At this point, you can SSH into the container (ssh root@IP) and submit jobs with:

python3 /spark/bin/spark-submit \
            --master ${SPARK_MASTER_URL} \
            ${SPARK_SUBMIT_ARGS} \
            ${SPARK_APPLICATION_PYTHON_LOCATION} ${SPARK_APPLICATION_ARGS}
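
As a quick end-to-end test, the variables in the command above can be pointed at one of the example jobs that ships with Spark. The values below are assumptions based on this container layout (Spark installed under /spark, master on the default port 7077 at the container's IP); substitute whatever your docker-compose.yml actually sets:

# hypothetical values for a smoke test with Spark's bundled Pi estimator
export SPARK_MASTER_URL=spark://172.18.0.2:7077     # assumed container IP and default master port
export SPARK_SUBMIT_ARGS=""                         # no extra flags for this test
export SPARK_APPLICATION_PYTHON_LOCATION=/spark/examples/src/main/python/pi.py
export SPARK_APPLICATION_ARGS=10                    # number of partitions for the Pi example

Then run the spark-submit command above.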

To set up IntelliJ, install the Big Data Tools plugin (published by JetBrains) and create a new configuration as shown in the configuration screenshot. The SSH IP can be obtained by running docker network inspect spark-default and looking for the IP corresponding to the spark-default container.
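
If you would rather not read through the full JSON that docker network inspect prints, a Go-template format filter narrows the output to container names and their addresses (spark-default is the network name from the step above):

# list each container on the network with its IP
docker network inspect -f '{{range .Containers}}{{.Name}}: {{.IPv4Address}}{{"\n"}}{{end}}' spark-default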

To use NumPy or pandas on the server, you first need to install a build toolchain inside the container:

docker exec -it spark-ssh apk add make automake gcc g++ subversion python3-dev
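
The toolchain only provides the compilers; the Python packages themselves still have to be installed inside the container. A minimal sketch, assuming pip is not yet present in the Alpine image (drop the first line if pip3 already exists):

# install pip first (Alpine package), then the libraries
docker exec -it spark-ssh apk add py3-pip
docker exec -it spark-ssh pip3 install numpy pandas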