nomad-coe/nomad

Unclear installation procedure and out of date requirements

Closed this issue · 15 comments

I'm in the ongoing process of trying to get nomad running on my Ubuntu 22.04 Server and am encountering many problems.
For starters the README.md only references how to install the client library and I guess that when you are trying to install the server you are meant to follow the "for developers" link https://nomad-lab.eu/prod/rae/docs/index.html
The documentation there doesn't seem to contain any installation advice until you reach "Developing NOMAD" https://nomad-lab.eu/prod/rae/docs/developers.html
That page seems to be out of date since commit 752b5c6 (Tue Nov 29 16:38:31 2022) since setup.sh got deleted.
The requirements.txt also seems to be at least partially broken, since pip cannot find scipy==1.7.1 (1.7.2 does exist, though). Building the Docker images with docker build . also fails. Most of the methods I tried are probably not supported or not meant to be used that way, so before I spam the bug tracker with a ticket for each of them, I wanted to ask where the documentation for running a NOMAD server of any kind is located (at this point I don't care whether it runs in Docker or in a Python venv).

The setup described in https://nomad-lab.eu/prod/v1/test/docs/oasis.html ends with a compose up step that includes a gui service, which doesn't exist? My main problem is probably that I'm confused by the terminology and don't know where to start. Do I need labs? Oasis? Coe? The container-manager? Should I build it from the nomad-FAIR repo?

The main issue that is underpinning all this is the link in the README.md. It is outdated and does not reflect the current develop branch.

The right documentation for the current develop is deployed to our beta installation here:
https://nomad-lab.eu/prod/v1/staging/docs/index.html

From here, it depends on what you want to do:

Understandably, this is all confusing and not very accessible for outsiders. We are currently building a new project web-site, revised documentation, etc. This will eventually make the terminology and documentation more accessible.

With a few details on what you want to achieve, I can give more help.

Yeah, I got Oasis up and running now as well (I had to take down my existing containers and do a clean setup, but that worked), but it's only a Docker setup. I had hoped to get a clean install up and running without using Docker, and most docs seem to be at least partially out of date. As mentioned above, the Oasis page says at one point to execute docker compose up app worker gui, but the gui service is not defined (I guess gui is either app or north?).

Anyway, thank you again for the reference links. I will try to limit myself to the documentation found under https://nomad-lab.eu/

Edit:

With a few details on what you want to achieve, I can give more help.

I tried to reproduce the Docker images and/or to do a clean setup of NOMAD without using Docker (or with a minimal Docker setup). I wanted to check all dependencies and run the service with the newest libraries possible, to offer a (hopefully) secure and up-to-date instance. Sadly, I couldn't reproduce the official Docker builds, and I couldn't get a Docker-less build running properly, since it was hard to figure out which service depends on which other service, in which way, and even which services are needed at all. (I realize my sentence structure is a bit messed up, so I will try to rephrase as bullet points.)

  • generate my own docker images
    • failed building
  • set up a docker-less (or nearly docker-less) instance
    • had problems installing all the requirements
    • couldn't figure out what I need to install and start in what order (got confused with repos and package/program names)
    • had problems with some dependencies installed from pip (libs depended on older numpy versions and so on)

And yeah, as you mentioned, it was pretty confusing since I was referencing older docs.
Anyway, thank you for taking the time to answer. I will probably post follow-ups while trying to get it running.

I have a question that might be related to the last comment by blacklotus and to this issue.
I want to build a Docker image from my fork of the nomad repository, but so far I have not been successful. My steps:

  • clean docker cache
  • command:
 DOCKER_BUILDKIT=1 docker build --pull --no-cache --label nomad-mdforti -t check-structure-name  https://github.com/mdforti/nomad.git#check_structure_name

and the output:

[+] Building 1.8s (43/59)
 => CACHED [internal] load git source https://github.com/mdforti/nomad.git#check_structure_name                                                                 0.0s
 => [internal] load metadata for docker.io/library/python:3.7-slim                                                                                              0.7s
 => [internal] load metadata for docker.io/library/node:16.15                                                                                                   0.7s
 => CANCELED [base_node 1/1] FROM docker.io/library/node:16.15@sha256:a13d2d2aec7f0dae18a52ca4d38b592e45a45cc4456ffab82e5ff10d8a53d042                          0.2s
 => => resolve docker.io/library/node:16.15@sha256:a13d2d2aec7f0dae18a52ca4d38b592e45a45cc4456ffab82e5ff10d8a53d042                                             0.0s
 => => sha256:a1f665affa21f2b46e476e0cb77d92b83e3713355bd28d026c257b16353c6d90 2.21kB / 2.21kB                                                                  0.0s
 => => sha256:8a014c92148934973210d840dc7cfed53e0afba38d839afaa48ed5150eae19af 0B / 7.86MB                                                                      0.9s
 => => sha256:293ff1be7001d642a624409e2d5f90e7708ef7e6f1a75f4eb7362a9296e18d20 0B / 10.00MB                                                                     0.9s
 => => sha256:a13d2d2aec7f0dae18a52ca4d38b592e45a45cc4456ffab82e5ff10d8a53d042 1.21kB / 1.21kB                                                                  0.0s
 => => sha256:b9f398d30e45c8b56dbe322047e6f21ae56963f4976cbd4081b2e4f1d6fb8344 7.74kB / 7.74kB                                                                  0.0s
 => => sha256:ea267e4631a981caf2841a7e9a1faf29cef9d020c4378fc64845802d17586531 0B / 50.44MB                                                                     0.9s
 => CACHED [base_python 1/1] FROM docker.io/library/python:3.7-slim@sha256:f9b06aae870f02611984695ab7d63a88b78ae86c4ebc1d2aa7d03237f2ac1a63                     0.0s
 => CANCELED [dev_python  1/29] RUN apt-get update  && apt-get install --yes --quiet --no-install-recommends       libgomp1       libmagic-dev       curl       0.9s
 => CANCELED [builder 1/7] RUN apt-get update  && apt-get install --yes --quiet --no-install-recommends       libgomp1       libmagic1       file       gcc     0.9s
 => CANCELED [final  1/10] RUN apt-get update  && apt-get install --yes --quiet --no-install-recommends        libgomp1        libmagic1        curl        zi  0.8s
 => CACHED [dev_python  2/29] WORKDIR /app                                                                                                                      0.0s
 => CACHED [dev_python  3/29] COPY requirements-dev.txt .                                                                                                       0.0s
 => CACHED [dev_python  4/29] RUN pip install build  && pip install --progress-bar off --prefer-binary -r requirements-dev.txt                                  0.0s
 => CACHED [dev_python  5/29] COPY dependencies ./dependencies                                                                                                  0.0s
 => CACHED [dev_python  6/29] COPY docs ./docs                                                                                                                  0.0s
 => CACHED [dev_python  7/29] COPY examples ./examples                                                                                                          0.0s
 => CACHED [dev_python  8/29] COPY nomad ./nomad                                                                                                                0.0s
 => CACHED [dev_python  9/29] COPY scripts ./scripts                                                                                                            0.0s
 => CACHED [dev_python 10/29] COPY tests ./tests                                                                                                                0.0s
 => CACHED [dev_python 11/29] COPY .pylintrc      AUTHORS      LICENSE      MANIFEST.in      mkdocs.yml      pycodestyle.ini      pyproject.toml      pytest.i  0.0s
 => CACHED [dev_python 12/29] COPY ops/docker-compose ./ops/docker-compose                                                                                      0.0s
 => CACHED [dev_python 13/29] COPY gui/src/metainfo.json ./gui/src/metainfo.json                                                                                0.0s
 => CACHED [dev_python 14/29] COPY gui/src/searchQuantities.json ./gui/src/searchQuantities.json                                                                0.0s
 => CACHED [dev_python 15/29] COPY gui/src/toolkitMetadata.json ./gui/src/toolkitMetadata.json                                                                  0.0s
 => CACHED [dev_python 16/29] COPY gui/src/unitsData.js ./gui/src/unitsData.js                                                                                  0.0s
 => CACHED [dev_python 17/29] COPY gui/src/parserMetadata.json ./gui/src/parserMetadata.json                                                                    0.0s
 => CACHED [dev_python 18/29] COPY dependencies/nomad-remote-tools-hub/tools.json ./dependencies/nomad-remote-tools-hub/tools.json                              0.0s
 => CACHED [dev_python 19/29] COPY gui/src/northTools.json ./gui/src/northTools.json                                                                            0.0s
 => CACHED [dev_python 20/29] COPY gui/src/exampleUploads.json ./gui/src/exampleUploads.json                                                                    0.0s
 => CACHED [dev_python 21/29] COPY gui/tests/nomad.yaml ./gui/tests/nomad.yaml                                                                                  0.0s
 => CACHED [dev_python 22/29] COPY gui/tests/env.js ./gui/tests/env.js                                                                                          0.0s
 => CACHED [dev_python 23/29] RUN ./scripts/generate_example_uploads.sh                                                                                         0.0s
 => CACHED [dev_node 1/8] WORKDIR /app/gui                                                                                                                      0.0s
 => CACHED [dev_node 2/8] COPY gui/yarn.lock gui/package.json ./                                                                                                0.0s
 => CACHED [dev_node 3/8] COPY gui/materia ./materia                                                                                                            0.0s
 => CACHED [dev_node 4/8] COPY gui/crystcif-parse ./crystcif-parse                                                                                              0.0s
 => CACHED [dev_node 5/8] RUN yarn --network-timeout 1200000                                                                                                    0.0s
 => CACHED [dev_node 6/8] COPY tests/states/archives/dft.json  /app/tests/states/archives/dft.json                                                              0.0s
 => CACHED [dev_node 7/8] COPY gui .                                                                                                                            0.0s
 => CACHED [dev_node 8/8] RUN yarn run build                                                                                                                    0.0s
 => CACHED [dev_python 24/29] COPY --from=dev_node /app/gui/build nomad/app/static/gui                                                                          0.0s
 => ERROR [dev_python 25/29] RUN --mount=source=.git,target=.git,type=bind pip install ".[parsing,infrastructure,dev]"                                          0.0s
 => CACHED [dev_python 26/29] RUN ./scripts/generate_docs_artifacts.sh  && mkdocs build  && mkdir -p nomad/app/static/docs  && cp -r site/* nomad/app/static/d  0.0s
 => CACHED [dev_python 27/29] RUN echo "git_describe_command = "git describe --tags --long --match \"*[0-9]*\""" >> pyproject.toml                              0.0s
 => ERROR [dev_python 28/29] RUN --mount=source=.git,target=.git,type=bind python -m build --sdist                                                              0.0s
------
 > [dev_python 25/29] RUN --mount=source=.git,target=.git,type=bind pip install ".[parsing,infrastructure,dev]":
------
------
 > [dev_python 28/29] RUN --mount=source=.git,target=.git,type=bind python -m build --sdist:
------
failed to compute cache key: "/.git" not found: not found

Apparently .dockerignore is OK, so I don't know what else to look at.

I am speculating more than I actually know, but docker build --pull https://github.com/some/repo might not clone the project in the traditional way and might not produce a .git folder, which would leave no .git folder in the Docker build context?

For context:

  • the Dockerfile builds the nomad-lab python package
  • this build uses setuptools_scm
  • setuptools_scm produces a version string based on git tags
  • to do so, it calls git to get the necessary information
  • git in turn requires a cloned git repo with its .git folder
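
For illustration, the dependency on .git can be reproduced outside Docker. The following is a minimal sketch, not taken from the NOMAD repository: the throwaway repository, the tag v1.0.0 and the demo identity are made up, and only the git describe arguments are copied from the build log above. It shows the kind of string git can derive from a tag, and that the same call fails once no .git folder is present. The Docker error in the log ("/.git" not found) is raised earlier, when the bind mount is set up, so this only illustrates why the mount is there in the first place.

```shell
#!/bin/sh
# Sketch: a version string can only be derived from git tags if .git exists.
# Repository, tag and identity are hypothetical and exist only for this demo.
set -eu

workdir=$(mktemp -d)
# Stop git from searching parent directories of the scratch area.
export GIT_CEILING_DIRECTORIES="$workdir"

# 1. A normal clone-like directory: .git is present.
git init --quiet "$workdir/repo"
cd "$workdir/repo"
git -c user.name=demo -c user.email=demo@example.invalid -c commit.gpgsign=false \
    commit --quiet --allow-empty -m "initial commit"
git tag v1.0.0

# Same arguments as the git_describe_command in the build log.
describe_in_clone=$(git describe --tags --long --match "*[0-9]*")
echo "with .git:    $describe_in_clone"

# 2. A build context without .git: the same call fails.
mkdir "$workdir/no_git"
cd "$workdir/no_git"
if git describe --tags --long --match "*[0-9]*" >/dev/null 2>&1; then
    describe_without_git="ok"
else
    describe_without_git="failed"
fi
echo "without .git: git describe $describe_without_git"
```

The first call prints the tag followed by a commit count and an abbreviated hash; the second reports a failure because there is no repository to inspect.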

Thank you for the suggestions !

Fortunately I was able to get a successful build with this command:

DOCKER_BUILDKIT=1 docker  build --build-arg BUILDKIT_CONTEXT_KEEP_GIT_DIR=1 --pull --no-cache --label nomad-mdforti -t check_strucuture_name    https://github.com/mdforti/nomad.git#check_structure_name

Apparently docker build removes the .git directory after cloning, and --build-arg BUILDKIT_CONTEXT_KEEP_GIT_DIR=1 keeps it.

Now, the build was successful, although the image is not working. I will probably be able to say more after a few tests. The build process also fills my disk, and that might be interfering.

The following setup gave me the expected results:

  1. In docker-compose.yaml, replace the image entry of the app, worker and north services with:
    build: https://github.com/user/repo#branch
    image: desired_image_name:desired_branch_tag

  2. Build using docker compose, not docker build:
    docker compose build --build-arg BUILDKIT_CONTEXT_KEEP_GIT_DIR=1

This can take a long time (~15 min on my laptop) and requires ~15-20 GB of disk space for build caches and image storage. You can clean the caches afterwards using docker system prune.

  3. Run docker compose pull and docker compose up -d to run NOMAD Oasis with the modifications.

hope this helps !

Thanks a lot, @mdforti. I added a section about building the image to the documentation and included your findings there. It will be available with the next release.

I am closing this issue. Numerous actions have been taken (below). This is the best we can do for now to improve the situation.

  • We updated the http://nomad-lab.eu website and it should make it easier to reach the documentation and git repositories.
  • We updated to Python 3.9. The documentation for developers has been improved.
  • The README.md now links to the documentation deployed in our staging environment. This documentation should be the closest possible match to the default develop branch.

I am sorry for coming back to this. The instructions above are misleading. I see now that docker compose pull is not necessary when using the locally built image. So, the revised steps:

  • modify docker-compose.yaml with build/image tags as described.
  • execute docker compose build --build-arg BUILDKIT_CONTEXT_KEEP_GIT_DIR=1
  • execute docker compose up -d to bring the oasis up.

I have the impression that I am still missing something, as the rabbit, elastic, and mongo images will still need to be pulled?
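
The revised steps can be sketched as a small script. This is a sketch under assumptions, not the documented NOMAD procedure: the fork URL, branch and image name are placeholders, the file name build-override.yaml is made up, and keeping the build/image entries in a second Compose file passed with -f (instead of editing docker-compose.yaml in place) is a choice made here so that the original file stays untouched. By default the script only prints the docker commands; set DRY_RUN=0 and point OASIS_DIR at the directory that contains the Oasis docker-compose.yaml to execute them.

```shell
#!/bin/sh
# Sketch of the three steps: add build/image entries, build, bring the stack up.
# All names below are placeholders or assumptions, not NOMAD defaults.
set -eu

FORK_URL="${FORK_URL:-https://github.com/user/repo.git}"   # placeholder fork
BRANCH="${BRANCH:-branch}"                                 # placeholder branch
IMAGE="${IMAGE:-desired_image_name:desired_branch_tag}"    # placeholder image name
OASIS_DIR="${OASIS_DIR:-$(mktemp -d)}"   # directory holding docker-compose.yaml
DRY_RUN="${DRY_RUN:-1}"                  # 1 = only print the docker commands

override="$OASIS_DIR/build-override.yaml"   # hypothetical file name

# Step 1: give app, worker and north a build context and a local image name.
echo "services:" > "$override"
for service in app worker north; do
    cat >> "$override" <<EOF
  $service:
    build: "${FORK_URL}#${BRANCH}"
    image: "${IMAGE}"
EOF
done

# Print a command, and run it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

cd "$OASIS_DIR"

# Step 2: build with the .git directory kept in the remote build context.
run docker compose -f docker-compose.yaml -f build-override.yaml \
    build --build-arg BUILDKIT_CONTEXT_KEEP_GIT_DIR=1

# Step 3: start the stack with the freshly built images.
run docker compose -f docker-compose.yaml -f build-override.yaml up -d
```

On the open question: docker compose build only acts on services that have a build entry, and docker compose up pulls the images of the remaining services if they are not available locally, so a separate docker compose pull should not be needed for them.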

Can you be more specific on what part of the documentation is misleading and what the problem is that you are consequently experiencing? Is this about this part: https://nomad-lab.eu/prod/v1/staging/docs/develop/setup.html#build-the-docker-image?

I am not sure how docker compose build works. I guess it only builds the services whose docker-compose.yaml entry points to a source with a Dockerfile. This might not include rabbit, elastic and mongo, so they would have to be pulled?

With 'instructions above' I was referring to the instructions in this issue and not the ones in the link you posted.

But now that you mention it, there are also a couple of things there.

I see you have examples using my personal fork of the nomad repo. Could you please replace it with a generic one? This repo is for testing at my institute. I keep it open only for testing purposes, but I wouldn't like others to pull it, as it includes institution visuals etc.

Then, after building with docker directly from the command line, it is not clear to me how to modify the docker compose yaml file so that the containers work with the built image. All the steps I posted 'above' (in this issue) work with docker compose build, but not necessarily with docker build.

I will try to prepare a clearer step-by-step guide for doing the procedure from scratch. A total beginner with Docker (like me a few weeks ago) might find that helpful.

I see you have examples using my personal fork of the nomad repo. Could you please replace it with a generic one? This repo is for testing at my institute. I keep it open only for testing purposes, but I wouldn't like others to pull it, as it includes institution visuals etc.

Yes, of course. That was not intentional. I was too eager with the copy-and-pasting. I will remove this ASAP.

Then, after building with docker directly from the command line, it is not clear to me how to modify the docker compose yaml file so that the containers work with the built image. All the steps I posted 'above' (in this issue) work with docker compose build, but not necessarily with docker build.

For us it is a thin line between providing the necessary information (what we need to do) and writing a docker/docker-compose tutorial (what we do not want to do). It is a compromise and not always perfect for everyone.

I totally understand.

I see you have examples using my personal fork of the nomad repo. Could you please replace it with a generic one? This repo is for testing at my institute. I keep it open only for testing purposes, but I wouldn't like others to pull it, as it includes institution visuals etc.

Yes, of course. That was not intentional. I was too eager with the copy-and-pasting. I will remove this ASAP.

I also understand, no worries.