e6895-project

EECS-E6895 Project repository.

Primary language: HTML. License: MIT.

Instructions

Setting up neo4j for Apache Spark

This section describes the steps used to set up the neo4j ecosystem.

Locally on Linux (tbd)

  1. Install/configure Apache Spark (for scala)

  2. Install/configure docker

  3. Use neo4j docker run command (I added this to a script)

  4. Access the DB

    • http://localhost:7474 (neo4j browser UI)
  5. Connect Spark to neo4j
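
Step 3 above can be sketched roughly as follows. This is not the script mentioned in that step, only a minimal example assuming the official `neo4j` image from Docker Hub. The container name, the data directory, and the `run`/`DRY_RUN` helper are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# Sketch of step 3: start neo4j in docker.
# Not the author's script; the names below are assumptions.
set -eu

NEO4J_IMAGE="${NEO4J_IMAGE:-neo4j:latest}"    # assumption: official image on Docker Hub
NEO4J_DATA="${NEO4J_DATA:-$PWD/neo4j-data}"   # hypothetical host directory for DB files
NEO4J_NAME="${NEO4J_NAME:-neo4j-local}"       # hypothetical container name

# Print the command unless DRY_RUN=0, so the script is safe to try first.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

start_neo4j() {
  # 7474 serves the HTTP browser UI; 7687 is the Bolt port used by drivers.
  run docker run --detach --name "$NEO4J_NAME" \
    --publish 7474:7474 --publish 7687:7687 \
    --volume "$NEO4J_DATA:/data" \
    "$NEO4J_IMAGE"
}

start_neo4j
```

By default the script only prints the `docker run` command. Running it with `DRY_RUN=0` starts the container, after which the browser UI from step 4 should be reachable on port 7474.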

In GCP

https://spark.apache.org/docs/latest/

I followed these instructions:

  1. Create a new project and enable the Compute Engine API

  2. Set up local GCP project:

    • if configured from env vars (which mine is):
      export CLOUDSDK_CORE_PROJECT=eecs-e6895-edu
      Note: add this line to ~/.bashrc so it persists across shells!
    • if configured via gcloud:
      gcloud config set project eecs-e6895-edu
  3. https://neo4j.com/developer/neo4j-cloud-google-image/

  4. https://medium.com/neo4j/running-neo4j-on-google-cloud-6592c1b4e4e5
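
Steps 1 and 2 above can also be done from the command line. The sketch below assumes the `gcloud` CLI is installed and authenticated, and that the project ID is the `eecs-e6895-edu` used above. The `run`/`DRY_RUN` helper is a hypothetical convenience, and enabling the Compute Engine API may additionally require a billing account linked to the project.

```shell
#!/bin/sh
# Sketch of GCP steps 1-2 with the gcloud CLI (assumes gcloud is installed and logged in).
set -eu

GCP_PROJECT="${GCP_PROJECT:-eecs-e6895-edu}"   # project ID taken from the steps above

# Print each command unless DRY_RUN=0 (hypothetical helper for safe previewing).
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

setup_gcp_project() {
  # Step 1: create the project and enable the Compute Engine API.
  run gcloud projects create "$GCP_PROJECT"
  run gcloud config set project "$GCP_PROJECT"
  run gcloud services enable compute.googleapis.com
  # Step 2 check: show which project gcloud will use by default.
  run gcloud config get-value project
}

setup_gcp_project
```

If the project is configured through `CLOUDSDK_CORE_PROJECT` instead, the environment variable takes precedence over the value stored by `gcloud config set project`.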

I created scripts in [course_dir]/scripts

  1. Running a cluster locally: https://spark.apache.org/docs/latest/spark-standalone.html
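
The scripts themselves are not reproduced here. As a rough sketch of what starting a local standalone cluster looks like per the linked page, assuming Spark is unpacked at `SPARK_HOME` (the default path below is hypothetical, as is the `run`/`DRY_RUN` helper):

```shell
#!/bin/sh
# Sketch: start a local Spark standalone master and one worker.
# Not the scripts from [course_dir]/scripts; paths below are assumptions.
set -eu

SPARK_HOME="${SPARK_HOME:-/opt/spark}"        # hypothetical install location
MASTER_HOST="${MASTER_HOST:-localhost}"
MASTER_URL="spark://$MASTER_HOST:7077"        # 7077 is the default master port

# Print each command unless DRY_RUN=0 (hypothetical helper for safe previewing).
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

start_cluster() {
  run "$SPARK_HOME/sbin/start-master.sh" --host "$MASTER_HOST"
  # Older Spark releases ship this script as start-slave.sh instead.
  run "$SPARK_HOME/sbin/start-worker.sh" "$MASTER_URL"
}

start_cluster
```

Once both daemons are up, the master's web UI (port 8080 by default) should list the worker, and jobs can be submitted with `--master spark://localhost:7077`.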

Next steps:

  1. Using GraphQL in Cloud Run (docker container)

  2. Build a custom docker container with our DB

  3. Self-healing graph DB using clusters?

In AWS (tbd)

  1. https://neo4j.com/developer/neo4j-cloud-aws-ec2-ami/

Other stuff

Page Rank

Look into caching: https://spark.apache.org/docs/latest/quick-start.html#caching

Psychology: https://ipip.ori.org/newPublications.htm