The following instructions are for setting up a version of Amundsen using Docker, backed by Neo4j and Elasticsearch.
- Create a private fork of this repo.
- Clone your fork of this repo, including its submodules, by running:
git clone --recursive git@github.com:stemma-ai/amundsen-custom.git
- Install docker and docker-compose, and allocate at least 3GB of memory to Docker.
- Enter the cloned directory and run:
docker-compose up --abort-on-container-exit
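To confirm that Docker actually has the 3GB mentioned above, you can ask the engine how much memory it sees. This is a minimal Python sketch, assuming the docker CLI is on your PATH; it reads the MemTotal field via docker info --format '{{.MemTotal}}' and treats "3GB" as 3 * 10**9 bytes (an assumption). MemTotal can come out slightly below the value configured in Docker Desktop, because the VM's kernel reserves some memory, so treat a borderline result with care.

```python
import subprocess

# "At least 3GB" from the setup step, read here as 3 * 10**9 bytes (an assumption).
MIN_DOCKER_MEMORY_BYTES = 3 * 10**9


def docker_memory_bytes() -> int:
    """Return the total memory the Docker engine reports via `docker info`."""
    result = subprocess.run(
        ["docker", "info", "--format", "{{.MemTotal}}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


def has_enough_memory(mem_bytes: int, minimum: int = MIN_DOCKER_MEMORY_BYTES) -> bool:
    """True if the reported memory meets the minimum."""
    return mem_bytes >= minimum


if __name__ == "__main__":
    mem = docker_memory_bytes()
    gib = mem / 1024**3
    if has_enough_memory(mem):
        print(f"Docker reports {gib:.1f} GiB: OK")
    else:
        print(f"Docker reports only {gib:.1f} GiB; raise the memory allocated to Docker")
```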
- Ingest static sample data into Neo4j:
  - In a separate terminal window, cd to the databuilder/upstream submodule.
  - The sample_data_loader.py Python script in the example/scripts/ directory uses the Elasticsearch client, pyhocon, and other libraries. Install the dependencies in a virtual env and run the script with the following commands:
  python3 -m venv venv
  source venv/bin/activate
  pip3 install --upgrade pip
  pip3 install -r requirements.txt
  python3 setup.py install
  python3 example/scripts/sample_data_loader.py
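If the loader fails on an import, a quick check of the environment can help. This minimal sketch, run with the venv's python3, reports whether a virtual env is active and which of the libraries named above can be found; the module list covers only elasticsearch and pyhocon, since the step mentions "other libraries" without naming them, so extend REQUIRED_MODULES as needed.

```python
import importlib.util
import sys

# Libraries named in the step above; the list is not exhaustive
# ("and other libraries"), so extend it as needed.
REQUIRED_MODULES = ["elasticsearch", "pyhocon"]


def in_virtualenv() -> bool:
    """True when running inside a venv (its prefix differs from the base interpreter's)."""
    return sys.prefix != sys.base_prefix


def missing_modules(names):
    """Return the top-level module names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    if not in_virtualenv():
        print("Warning: no virtual env is active; run `source venv/bin/activate` first")
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("Loader dependencies found")
```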
- View the UI at http://localhost:5000 and try searching for test; it should return some results.
- You can verify that the dummy data has been ingested into Neo4j by visiting http://localhost:7474/browser/ and running MATCH (n:Table) RETURN n LIMIT 25 in the query box. You should see two tables:
- You can verify the data has been loaded into the metadata service by visiting:
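If you prefer to run the Neo4j check from a script instead of the browser, the sketch below sends a variant of the MATCH query over Bolt with the official neo4j Python driver (pip install neo4j). Several details are assumptions, not taken from these instructions: Bolt is published on localhost:7687, the credentials are neo4j/test, and Table nodes carry a name property. Adjust them to match your docker-compose file.

```python
# Assumptions (not from the instructions above): Bolt is published on
# localhost:7687, the credentials are neo4j/test, and Table nodes carry a
# `name` property. Adjust these to match your docker-compose file.
NEO4J_URI = "bolt://localhost:7687"
NEO4J_AUTH = ("neo4j", "test")

# Same pattern as the browser query, but returning only each table's name.
TABLE_QUERY = "MATCH (n:Table) RETURN n.name AS name LIMIT 25"


def fetch_table_names(uri: str = NEO4J_URI, auth=NEO4J_AUTH) -> list:
    """Run TABLE_QUERY over Bolt and return the table names it finds."""
    # Imported lazily so this file can be loaded without the driver installed.
    from neo4j import GraphDatabase

    driver = GraphDatabase.driver(uri, auth=auth)
    try:
        with driver.session() as session:
            # Materialize the records before the session closes.
            return [record["name"] for record in session.run(TABLE_QUERY)]
    finally:
        driver.close()


if __name__ == "__main__":
    names = fetch_table_names()
    print(f"{len(names)} table(s):", names)
```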
- If the Docker host does not allow Elasticsearch enough virtual memory areas, es_amundsen will fail with the error:
es_amundsen | [1]: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]
  - Increase vm.max_map_count (the kernel's limit on memory-mapped areas per process, not the Java heap size) on the host machine. On Linux, that means changing it on your own machine. On Mac, that means changing it in the Docker for Mac configuration. See the detailed instructions from Elastic.
  - Re-run docker-compose.
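On a Linux host you can read the current limit from /proc/sys/vm/max_map_count before restarting; Elastic's documentation raises it with sudo sysctl -w vm.max_map_count=262144, which does not persist across reboots unless also added to /etc/sysctl.conf. This minimal sketch only compares the value against the minimum quoted in the error; on Mac the setting lives inside Docker's VM, so reading /proc on the macOS host will not work.

```python
from pathlib import Path

# Minimum quoted in the es_amundsen error message above.
REQUIRED_MAX_MAP_COUNT = 262144


def read_max_map_count(path: str = "/proc/sys/vm/max_map_count") -> int:
    """Read the current vm.max_map_count (Linux only)."""
    return int(Path(path).read_text().strip())


def max_map_count_ok(value: int, required: int = REQUIRED_MAX_MAP_COUNT) -> bool:
    """True if the limit is high enough for Elasticsearch."""
    return value >= required


if __name__ == "__main__":
    current = read_max_map_count()
    if max_map_count_ok(current):
        print(f"vm.max_map_count = {current}: OK")
    else:
        print(f"vm.max_map_count = {current} is too low; "
              f"try: sudo sysctl -w vm.max_map_count={REQUIRED_MAX_MAP_COUNT}")
```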
- If docker-compose stops with an org.elasticsearch.bootstrap.StartupException: java.lang.IllegalStateException: Failed to create node environment message, then es_amundsen cannot write to .local/elasticsearch. Because there is a file share mount between the Docker container and your host machine, you can fix this by running the following in your terminal (prefix it with sudo if you lack permission to change the owner):
chown -R 1000:1000 .local/elasticsearch
  - Re-run docker-compose.
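To check whether the chown is needed (or whether it took effect), you can walk the mounted directory and list anything not owned by uid 1000. This is a minimal sketch; uid 1000 comes from the chown command above and is presumably the user Elasticsearch runs as inside the container. Symlinked directories are not followed.

```python
import os

# UID from the chown command above (presumably the container's Elasticsearch user).
ES_UID = 1000


def paths_not_owned_by(root: str, uid: int = ES_UID) -> list:
    """Walk `root` and return every directory or file whose owner is not `uid`."""
    if not os.path.isdir(root):
        raise FileNotFoundError(root)
    wrong = []
    for dirpath, _dirnames, filenames in os.walk(root):
        # os.walk yields each directory once as dirpath, so check it here,
        # then check the files inside it.
        for path in [dirpath] + [os.path.join(dirpath, f) for f in filenames]:
            if os.lstat(path).st_uid != uid:
                wrong.append(path)
    return wrong


if __name__ == "__main__":
    bad = paths_not_owned_by(".local/elasticsearch")
    if bad:
        print(f"{len(bad)} path(s) not owned by uid {ES_UID}, e.g. {bad[0]}")
    else:
        print("Ownership looks right")
```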
- If the Elasticsearch container crashes with Docker error 137 on the first request from the website (http://localhost:5000/), you are most likely using the default Docker engine memory allocation of 2GB. All the containers need at least 3GB to run with the loaded sample data. To fix this, go to Docker -> Preferences -> Resources -> Advanced, increase Memory, and restart the Docker engine.
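Exit code 137 follows the shell convention of 128 plus a signal number: 137 - 128 = 9, i.e. the process received SIGKILL, which is what the kernel's out-of-memory killer sends. You can see what Docker recorded with docker inspect --format '{{.State.ExitCode}} {{.State.OOMKilled}}' es_amundsen. This small sketch decodes such exit codes; the __main__ part assumes the container is named es_amundsen, as in the error messages above.

```python
import signal
import subprocess


def describe_exit_code(code: int) -> str:
    """Explain a container exit code using the 128 + signal-number convention."""
    if code > 128:
        try:
            sig = signal.Signals(code - 128)
        except ValueError:
            # Above 128 but not a known signal number: report it as-is.
            return f"exited with status {code}"
        return f"killed by {sig.name} (signal {sig.value})"
    return f"exited with status {code}"


if __name__ == "__main__":
    out = subprocess.run(
        ["docker", "inspect", "--format",
         "{{.State.ExitCode}} {{.State.OOMKilled}}", "es_amundsen"],
        capture_output=True, text=True, check=True,
    ).stdout.split()
    code, oom_killed = int(out[0]), out[1] == "true"
    print(f"es_amundsen {describe_exit_code(code)}; OOMKilled={oom_killed}")
```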
- Are all five Amundsen-related containers running? Check with docker ps. Can you connect to the Neo4j UI at http://localhost:7474/browser/ and to the raw Elasticsearch API at http://localhost:9200? Do the Docker logs (docker logs <container-name>) reveal any notable issues?
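The checks in the previous item can be scripted. This minimal sketch probes the three URLs named there with urllib and counts running containers via docker ps --format '{{.Names}}'. It assumes every Amundsen container name contains "amundsen" (es_amundsen appears in the errors above; the other names are an assumption), so the count may need adjusting for your compose file.

```python
import subprocess
import urllib.request

# Endpoints named in the troubleshooting steps above.
ENDPOINTS = {
    "frontend": "http://localhost:5000",
    "neo4j_browser": "http://localhost:7474/browser/",
    "elasticsearch": "http://localhost:9200",
}
EXPECTED_CONTAINERS = 5


def http_probe(url: str, timeout: float = 5.0) -> bool:
    """True if the URL answers with a non-error HTTP status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except OSError:
        # URLError, HTTPError, refused connections and timeouts are all OSErrors.
        return False


def check_endpoints(endpoints: dict, probe=http_probe) -> dict:
    """Map each endpoint name to whether `probe` could reach its URL."""
    return {name: probe(url) for name, url in endpoints.items()}


def running_container_names() -> list:
    """Names of running containers, as listed by `docker ps`."""
    out = subprocess.run(
        ["docker", "ps", "--format", "{{.Names}}"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [line.strip() for line in out.splitlines() if line.strip()]


def amundsen_containers(names: list) -> list:
    """Assumption: every Amundsen container name contains "amundsen"."""
    return [name for name in names if "amundsen" in name]


if __name__ == "__main__":
    found = amundsen_containers(running_container_names())
    print(f"{len(found)}/{EXPECTED_CONTAINERS} Amundsen containers running: {found}")
    for name, ok in check_endpoints(ENDPOINTS).items():
        print(f"{name}: {'reachable' if ok else 'NOT reachable'}")
```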
- Report the issue on this repo. The standard instructions should Just Work for everyone, and we'll gladly help get your install working!