This section describes the steps used to set up the Neo4j ecosystem.
- Install/configure Apache Spark (for Scala)
- https://intellipaat.com/blog/tutorial/spark-tutorial/downloading-spark-and-getting-started/
- Apache Spark 2.4.5, pre-built for Hadoop 2.7 (ships with Scala 2.11.12)
- Scala 2.11.12 (the GCP Dataproc 1.4-ubuntu18 image uses Spark 2.4.5 with Scala 2.11.12)
- Apache Maven (for compiling Scala): https://docs.cloudera.com/documentation/enterprise/5-5-x/topics/spark_building.html#building
- create the standard directory structure for Maven projects
- update pom.xml to include dependencies (usually an iterative process: re-run the following command after each change)
- run
mvn clean install
- (update) also run
mvn assembly:assembly -DdescriptorId=jar-with-dependencies
to build a jar that bundles the dependencies (specifically, the BigQuery connector; a sketch of reading from BigQuery follows this section); to see which dependencies were included, run
mvn dependency:tree
- use this to copy the jar to a GCP bucket:
gsutil cp {file_to_copy} gs://{bucket_name}/{location_to_save_to}
- Examples: https://github.com/apache/spark/tree/master/examples/src/main/scala/org/apache/spark/examples
- Run standalone cluster: https://supergloo.com/spark-scala/apache-spark-cluster-run-standalone/
- More on standalone: https://spark.apache.org/docs/latest/spark-standalone.html
- Has some info on setting SparkConf: https://mbonaci.github.io/mbo-spark/
spark-submit --class WordCount --master spark://zeus:7077 target/sparkwordcount-0.0.1.jar
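The submit command above names a WordCount class; a minimal sketch of what such a job could look like (hedged: the class body, app name, and input path are illustrative, not the actual course code):

    import org.apache.spark.sql.SparkSession

    // Minimal word-count job matching the spark-submit example above.
    // The input path comes from args(0) and is a placeholder for real data.
    object WordCount {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder.appName("WordCount").getOrCreate()
        val counts = spark.sparkContext
          .textFile(args(0))
          .flatMap(_.split("\\s+"))
          .map(word => (word, 1))
          .reduceByKey(_ + _)
        counts.take(20).foreach(println)
        spark.stop()
      }
    }

Build it with the mvn commands above and pass the resulting jar to spark-submit.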
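And since the assembly jar exists mainly to bundle the BigQuery connector, a hedged sketch of reading a table (the table name is a placeholder, and exact options vary across spark-bigquery-connector versions):

    import org.apache.spark.sql.SparkSession

    // Hedged sketch: read a BigQuery table through the spark-bigquery-connector
    // bundled into the assembly jar. "project.dataset.table" is a placeholder.
    object BigQueryRead {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder.appName("BigQueryRead").getOrCreate()
        val df = spark.read
          .format("bigquery")
          .option("table", "project.dataset.table")
          .load()
        df.show(10)
        spark.stop()
      }
    }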
- Install/configure Docker
- (used Option 1) https://phoenixnap.com/kb/how-to-install-docker-on-ubuntu-18-04
- Add yourself to the docker group so docker can be run without sudo:
sudo usermod -aG docker $USER
(log out and back in for the group change to take effect)
- Use the Neo4j docker run command (I added this to a script)
- https://neo4j.com/developer/docker-run-neo4j/
- {PROJECT_DIR}/scripts/create_neo4j_docker.sh
- Use
docker ps -a
to check the status of the container (running or exited)
- Created start/stop scripts in [course_dir]/scripts
- Access the DB in the browser at localhost:7474
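localhost:7474 serves the Neo4j Browser over HTTP; applications connect over Bolt, default port 7687. A small connectivity check using the Neo4j Java driver from Scala (hedged: the credentials are placeholders for whatever create_neo4j_docker.sh sets, and the import path is for the 4.x driver; the 1.x driver uses org.neo4j.driver.v1):

    import org.neo4j.driver.{AuthTokens, GraphDatabase}

    // Smoke test that the dockerized Neo4j answers over Bolt.
    // User/password are placeholders; match the docker script's settings.
    object Neo4jPing {
      def main(args: Array[String]): Unit = {
        val driver = GraphDatabase.driver(
          "bolt://localhost:7687", AuthTokens.basic("neo4j", "password"))
        val session = driver.session()
        try {
          val ok = session.run("RETURN 1 AS ok").single().get("ok").asInt()
          println(s"Neo4j responded: $ok")
        } finally {
          session.close()
          driver.close()
        }
      }
    }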
- Connect Spark to Neo4j
https://spark.apache.org/docs/latest/
Followed these instructions (a hedged Spark-to-Neo4j connector sketch follows these steps):
- Create a new project and enable the Compute Engine API *
- Set up the local GCP project:
- if configured from env vars (which mine is):
export CLOUDSDK_CORE_PROJECT=eecs-e6895-edu
Note: add this export to ~/.bashrc to persist it
- if configured via gcloud:
gcloud config set project eecs-e6895-edu
- https://neo4j.com/developer/neo4j-cloud-google-image/
- https://neo4j.com/google-cloud-resources/
- https://cloud.google.com/sdk/install
- list neo4j images:
gcloud compute images list --project launcher-public | grep neo4j
reference: https://community.neo4j.com/t/neo4j-3-5-1-added-to-google-cloud-platform-cluster-and-single-node-community-and-enterprise/4174/3
- https://medium.com/neo4j/running-neo4j-on-google-cloud-6592c1b4e4e5
I created scripts in [course_dir]/scripts
- Running the cluster locally: https://spark.apache.org/docs/latest/spark-standalone.html
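The connector sketch promised above: a hedged example of pulling rows out of Neo4j into Spark with the neo4j-spark-connector 2.4.x line (the one built for Spark 2.4 / Scala 2.11). The Cypher query, bolt URL, and password are assumptions, and the config keys changed in later connector versions:

    import org.apache.spark.sql.SparkSession
    import org.neo4j.spark.Neo4j

    // Hedged example: run a Cypher query against the dockerized Neo4j and
    // load the results as a Spark RDD of Rows. Credentials are placeholders.
    object SparkNeo4jRead {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder
          .appName("SparkNeo4jRead")
          .config("spark.neo4j.bolt.url", "bolt://localhost:7687")
          .config("spark.neo4j.bolt.user", "neo4j")
          .config("spark.neo4j.bolt.password", "password")
          .getOrCreate()

        val neo = Neo4j(spark.sparkContext)
        val rows = neo.cypher("MATCH (n) RETURN n.name AS name LIMIT 10").loadRowRdd
        rows.collect().foreach(println)
        spark.stop()
      }
    }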
Next steps:
- Using GraphQL in Cloud Run (docker container)
- Build a custom docker container with our DB
- Self-healing graph DB using clusters?
- Look into caching: https://spark.apache.org/docs/latest/quick-start.html#caching (see the sketch below)
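From the quick-start page: cache() marks a dataset to be kept in memory once the first action computes it, so repeated actions skip re-reading the source. A small sketch (the file path is a placeholder):

    import org.apache.spark.sql.SparkSession

    // cache() is lazy: the data is materialized by the first action (count)
    // and served from memory on the second. The path is a placeholder.
    object CacheDemo {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder.appName("CacheDemo").getOrCreate()
        val lines = spark.read.textFile("data/README.md").cache()
        println(lines.count())
        println(lines.count())
        spark.stop()
      }
    }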
- Psychology: https://ipip.ori.org/newPublications.htm