Installation

Bootstrap a default version of Amundsen using Docker

The following instructions are for setting up a version of Amundsen using Docker, backed by Neo4j and Elasticsearch.

  1. Create a private fork of this repo.
  2. Clone your fork and its submodules by running (substitute your fork's URL):
    git clone --recursive git@github.com:stemma-ai/amundsen-custom.git
  3. Install Docker and docker-compose, and allocate at least 3GB of memory to Docker.
  4. Enter the cloned directory and run:
    docker-compose up --abort-on-container-exit
  5. Ingest static sample data into Neo4j:
    • In a separate terminal window, cd to the databuilder/upstream submodule.
    • The sample_data_loader.py Python script in the example/scripts/ directory uses the Elasticsearch client, pyhocon, and other libraries. Install the dependencies in a virtual environment and run the script with the commands below:
     python3 -m venv venv
     source venv/bin/activate
     pip3 install --upgrade pip
     pip3 install -r requirements.txt
     python3 setup.py install
     python3 example/scripts/sample_data_loader.py
  6. View the UI at http://localhost:5000 and try searching for test; it should return some results.
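
Steps 4 and 5 run in separate terminals, and the sample data loader writes to Neo4j and Elasticsearch, so it helps to wait until both containers answer before running it. The following is a minimal sketch of such a wait loop, not part of Amundsen: the retry helper, the attempt count, and the delay are hypothetical, and it assumes curl is installed. The ports are the ones listed under Troubleshooting.

```shell
# Hypothetical helper (not part of Amundsen): run a command until it succeeds,
# at most $1 times, sleeping $2 seconds between attempts.
retry() {
    attempts=$1
    delay=$2
    shift 2
    n=1
    while ! "$@"; do
        if [ "$n" -ge "$attempts" ]; then
            return 1    # give up after the last allowed attempt
        fi
        n=$((n + 1))
        sleep "$delay"
    done
    return 0
}

# Example usage (30 attempts and 5 seconds are arbitrary choices):
#   retry 30 5 curl -fsS -o /dev/null http://localhost:7474/browser/ &&
#   retry 30 5 curl -fsS -o /dev/null http://localhost:9200 &&
#   python3 example/scripts/sample_data_loader.py
```

The helper only defines a function, so sourcing it has no side effects; the loader runs only if both endpoints respond within the allowed attempts.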

Verify setup

  1. You can verify that dummy data has been ingested into Neo4j by visiting http://localhost:7474/browser/ and running MATCH (n:Table) RETURN n LIMIT 25 in the query box. You should see two tables:
    1. hive.test_schema.test_table1
    2. dynamo.test_schema.test_table2
  2. You can verify that the data has been loaded into the metadata service by visiting:
    1. http://localhost:5000/table_detail/gold/hive/test_schema/test_table1
    2. http://localhost:5000/table_detail/gold/dynamo/test_schema/test_table2
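
The two table detail URLs above follow one pattern: cluster, database, schema, table. As a rough sketch, they can be built and probed from a terminal. The helper names below are hypothetical and curl is assumed to be installed. The status code only shows that the frontend answered; confirming that the table metadata renders still means opening the page in a browser.

```shell
# Base URL of the frontend, as used in the steps above.
AMUNDSEN_URL="http://localhost:5000"

# Hypothetical helper: build the table detail URL for a sample table.
# Arguments: cluster database schema table
table_detail_url() {
    printf '%s/table_detail/%s/%s/%s/%s\n' "$AMUNDSEN_URL" "$1" "$2" "$3" "$4"
}

# Hypothetical helper: print the HTTP status code returned for a URL (assumes curl).
http_status() {
    curl -s -o /dev/null -w '%{http_code}' "$1"
}

# Example usage:
#   http_status "$(table_detail_url gold hive test_schema test_table1)"
#   http_status "$(table_detail_url gold dynamo test_schema test_table2)"
```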

Troubleshooting

  1. If the Docker container doesn't have enough heap memory for Elasticsearch, es_amundsen will fail with the error es_amundsen | [1]: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]

    1. Raise vm.max_map_count on the host machine, as the error message indicates. On Linux, that means changing the setting on the machine itself. On Mac, that means changing the Docker for Mac configuration. See the detailed instructions from Elastic.
    2. Re-run docker-compose.
  2. If docker-compose stops with an org.elasticsearch.bootstrap.StartupException: java.lang.IllegalStateException: Failed to create node environment message, then es_amundsen cannot write to .local/elasticsearch. A file share mount is established between the Docker container and your host machine, so run this in your terminal:

    1. chown -R 1000:1000 .local/elasticsearch
    2. Re-run docker-compose.
  3. If the ES container crashes with Docker error 137 on the first call from the website (http://localhost:5000/), you are probably using the default Docker engine memory allocation of 2GB. The minimum needed for all the containers to run with the loaded sample data is 3GB. To increase the allocation, go to Docker -> Preferences -> Resources -> Advanced, raise the Memory setting, then restart the Docker engine.

  4. Check with docker ps whether all 5 Amundsen-related containers are running. Can you connect to the Neo4j UI at http://localhost:7474/browser/ and to the raw ES API at http://localhost:9200? Do the Docker logs reveal any notable issues?

  5. Report the issue on this repo. The standard instructions should Just Work for everyone, and we'll gladly help get your install working!
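
Several of the checks above can be run from a terminal before filing an issue. The following is a sketch under stated assumptions: the helper names are hypothetical, sysctl assumes a Linux host, and the two thresholds are the ones quoted in items 1 and 3. The memory figure Docker reports may read slightly below the configured value, so treat a near miss as a hint rather than a failure.

```shell
# Minimums taken from the troubleshooting notes above.
REQUIRED_MAP_COUNT=262144
REQUIRED_MEM_BYTES=$((3 * 1024 * 1024 * 1024))   # 3GB

# Hypothetical helper: succeed if the first number is at least the second.
at_least() {
    [ "$1" -ge "$2" ]
}

# Hypothetical helper: print a line for each limit that is below its minimum.
check_limits() {
    map_count=$(sysctl -n vm.max_map_count)              # Linux host assumed
    mem_bytes=$(docker info --format '{{.MemTotal}}')    # memory visible to Docker, in bytes
    at_least "$map_count" "$REQUIRED_MAP_COUNT" ||
        echo "vm.max_map_count is $map_count, need $REQUIRED_MAP_COUNT"
    at_least "$mem_bytes" "$REQUIRED_MEM_BYTES" ||
        echo "Docker has $mem_bytes bytes of memory, need $REQUIRED_MEM_BYTES"
}

# Example usage:
#   check_limits
#   sudo sysctl -w vm.max_map_count=262144        # Linux: raises the limit until reboot
#   docker ps --format '{{.Names}}: {{.Status}}'  # are all containers up?
#   docker logs --tail 50 es_amundsen             # recent Elasticsearch output
#   curl -s http://localhost:9200                 # raw ES API
```

Including the output of these commands in an issue report makes the problem easier to reproduce.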