pkiraly/qa-catalogue

Allow to use Docker volume for Solr files

nichtich opened this issue · 8 comments

The Solr directory of databases (I think /opt/solr/server/solr) should be mountable when starting the Solr container, so that the Solr index can be kept outside of the Docker container.

I am investigating it. The difficulty I see here is that this directory might need some configuration files populated, e.g. the configsets directory.

Attention: files created with RUN in the Dockerfile are ignored when their directory is later exposed as a volume, so the files need to be created when the container is started, not as part of the image.

Phu2 commented

Currently, all Solr data is lost when the container is stopped and removed, e.g. by executing docker compose down.

Phu2 commented

How about using a separate container for Solr, based on the official image, rather than manually installing Solr in the same container? This way it is easier to persist data on the host. The docker-compose.yml would look like:

# Used to start the base image with `docker compose up -d`

version: '2'

services:
  solr:
    image: solr:8.11.3
    ports:
      - "8983:8983"
    volumes:
      # Create the directory up front with the right permissions, e.g.: mkdir ./solr && chown -R 8983:8983 ./solr
      - ./solr:/var/solr
    healthcheck:
      test:
        [
          "CMD-SHELL",
          "curl -s http://localhost:8983",
        ]
      interval: 10s
      timeout: 10s
      retries: 120
    restart: on-failure
    networks:
      - qa-catalogue-backend

  app:
    depends_on:
      solr:
        condition: service_healthy
    container_name: metadata-qa-marc
    # image: ${IMAGE:-pkiraly/metadata-qa-marc:0.7.0}
    # image: ghcr.io/pkiraly/qa-catalogue:main
    build:
      context: .
      dockerfile: Dockerfile
    volumes:
      - ./${INPUT:-input}:/opt/qa-catalogue/input
      - ./${OUTPUT:-output}:/opt/qa-catalogue/output
      - ./catalogues:/opt/qa-catalogue/catalogues
      - ./${WEBBCONFIG:-web-config}:/var/www/html/qa-catalogue/config
    ports:
      - "${WEBPORT:-8000}:80"       # qa-catalogue-web
      # - "${SOLRPORT:-8983}:8983"  # Solr
    networks:
      - qa-catalogue-backend

networks:
  qa-catalogue-backend:
    name: qa-catalogue-backend
    external: true

However, the indexing scripts have to be adapted in order to talk to the Solr container (not localhost).
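One way to adapt the scripts is to read the Solr address from the environment instead of hard-coding localhost. The following is only a minimal sketch: the variable names SOLR_HOST and SOLR_PORT and the helper solr_base_url are hypothetical and not part of qa-catalogue; the host name solr is taken from the service name in the compose file above.

```shell
#!/bin/sh
# Hypothetical helper: SOLR_HOST and SOLR_PORT are assumed variable names,
# not settings that qa-catalogue defines today.
solr_base_url() {
  # Fall back to the old behaviour (Solr on localhost) when nothing is set.
  printf 'http://%s:%s/solr' "${SOLR_HOST:-localhost}" "${SOLR_PORT:-8983}"
}

# Inside the compose network the service name resolves to the Solr container,
# so the app container could be started with SOLR_HOST=solr, for example:
#   SOLR_HOST=solr solr_base_url
```

With such a helper the same script would keep working in the single-container setup, where the defaults still point at localhost.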

Using the official Solr Docker image would also decrease image sizes and thus speed up the build process.

@nichtich @Phu2 Many thanks for the ideas and code! I've started to implement it. I will ping you again when it is testable, at least in a branch. I think there will be three Docker containers:

  • solr
  • the command line interface (backend)
  • the web UI (frontend), based on a dedicated PHP image (php:8.1-apache)

We also need a volume that is shared between the last two containers, and we should add environment variables for the URLs of the components.

@nichtich @Phu2 I ran into a problem and would like your opinion on it.
There are two methods to create a new Solr core (index):

  • use the command line tool, such as bin/solr create_core -c my_core
  • use the CoreAdmin API URL (admin/cores?action=CREATE&name=my_core&instanceDir=path/to/dir&config=solrconfig.xml&dataDir=data), see the documentation

QA catalogue so far uses the latter method, but it does not specify the instanceDir, config and dataDir parameters, so they take their default values. In a standard Solr installation the [solr base dir]/server/solr directory is where the indices live. It contains the individual index directories and a preconfigured configsets directory, which holds configuration file templates. When Solr creates a new core, the configSet parameter can be used to specify the template, which is a subdirectory of configsets (the default is called _default). The [solr base dir]/server/solr directory is the Solr home directory. In the Solr Dockerfile, however, SOLR_HOME is set to /var/solr/data, an empty directory.
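For illustration, a CoreAdmin request that names the template explicitly could be assembled as in the sketch below. The helper name core_create_url and the base URL http://solr:8983/solr are assumptions made for the example; action, name and configSet are the CoreAdmin parameters mentioned above. Naming the configset does not make it available: Solr still has to find it under the configsets directory of its home directory.

```shell
#!/bin/sh
# core_create_url BASE NAME [CONFIGSET]
# Hypothetical helper that only builds the request URL; it does not contact Solr.
core_create_url() {
  printf '%s/admin/cores?action=CREATE&name=%s&configSet=%s' \
    "$1" "$2" "${3:-_default}"
}

# Sending the request needs a running Solr, so it is left commented out:
#   curl -sf "$(core_create_url http://solr:8983/solr qa-catalogue_1)"
```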

If we try to create a new core with the API, it throws an error message and does not complete successfully:

SolrCore 'qa-catalogue_1' is not available due to init failure: Could not load configuration from directory /var/solr/data/configsets/_default

I can see several possible solutions:

  1. Instead of using the API, the tools should use the command line method. This is a bit complicated, because from inside one Docker container we would have to call a command in another Docker container, so the source container must be able to run the docker command and must know the name of the target container.
  2. As a step in the image creation process, we copy /opt/solr/server/solr/configsets to /var/solr/data/. The documentation says that there is a /docker-entrypoint-initdb.d directory that can be mounted from outside, and we can add something in there.
  3. We provide the configsets via the tool and ask users to add them to the mounted directory.
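Option 2 could be sketched as a small init script placed in /docker-entrypoint-initdb.d. This is an untested sketch under several assumptions: the function name seed_configsets and the idea of mounting it as a file such as seed-configsets.sh are hypothetical, and it relies on the official image running shell scripts from that directory before Solr starts, which should be checked against the image documentation. Because it runs at container start, it also works when /var/solr is mounted from the host and starts out empty.

```shell
#!/bin/bash
# Copy the configsets shipped with the image into the Solr home directory,
# but only if the (possibly host-mounted) home does not have them yet.
seed_configsets() {
  local src="$1"    # configsets directory inside the image
  local home="$2"   # Solr home directory
  if [ -d "$src" ] && [ ! -d "$home/configsets" ]; then
    mkdir -p "$home"
    cp -r "$src" "$home/configsets"
  fi
}

# Paths as discussed in this thread; SOLR_HOME falls back to /var/solr/data.
seed_configsets /opt/solr/server/solr/configsets "${SOLR_HOME:-/var/solr/data}"
```

The existence check keeps the script idempotent, so configsets that a user has edited in the mounted directory are not overwritten on the next start.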

The questions:

  • Have you ever run into this problem?
  • Which solution do you see as optimal? My choice would be 2), but maybe you know a better one.