Allow to use Docker volume for Solr files
nichtich opened this issue · 8 comments
The Solr directory of databases (I think /opt/solr/server/solr) should be mountable when starting the solr container so the Solr index can be kept outside of Docker container.
I am investigating it. The difficulty I see here is that this directory might need some configuration files populated, e.g. the configsets directory.
Attention: files created with RUN in the Dockerfile will get ignored when their directory is later exposed as a volume, so the files need to be created when the container is started, not as part of the image.
Currently, all Solr data is lost when the container is stopped and removed, e.g. by executing docker compose down.
How about using a separate container for Solr from the official image rather than manually installing it in the same container? This way it is easier to persist data on the host. The docker-compose.yml would look like:
# Used to start the base image with `docker compose up -d`
version: '2'
services:
  solr:
    image: solr:8.11.3
    ports:
      - "8983:8983"
    volumes:
      # Create directory up front with the right permissions, eg.: mkdir ./solr && chown -R 8983:8983 ./solr
      - ./solr:/var/solr
    healthcheck:
      test:
        [
          "CMD-SHELL",
          "curl -s http://localhost:8983",
        ]
      interval: 10s
      timeout: 10s
      retries: 120
    restart: on-failure
    networks:
      - qa-catalogue-backend
  app:
    depends_on:
      solr:
        condition: service_healthy
    container_name: metadata-qa-marc
    # image: ${IMAGE:-pkiraly/metadata-qa-marc:0.7.0}
    # image: ghcr.io/pkiraly/qa-catalogue:main
    build:
      context: .
      dockerfile: Dockerfile
    volumes:
      - ./${INPUT:-input}:/opt/qa-catalogue/input
      - ./${OUTPUT:-output}:/opt/qa-catalogue/output
      - ./catalogues:/opt/qa-catalogue/catalogues
      - ./${WEBBCONFIG:-web-config}:/var/www/html/qa-catalogue/config
    ports:
      - "${WEBPORT:-8000}:80" # qa-catalogue-web
      # - "${SOLRPORT:-8983}:8983" # Solr
    networks:
      - qa-catalogue-backend
networks:
  qa-catalogue-backend:
    name: qa-catalogue-backend
    external: true
However, the indexing scripts have to be adapted in order to talk to the Solr container (not localhost).
Using the official Solr Docker image would also decrease image sizes and thus speed up the build process.
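To illustrate what "talk to the Solr container (not localhost)" could mean for the indexing scripts, here is a minimal sketch that resolves the Solr base URL from environment variables. The variable names SOLR_HOST and SOLR_PORT and both function names are assumptions made for this example, not existing qa-catalogue settings; the default host `solr` is the service name from the compose file above.

```shell
#!/bin/sh
# Hypothetical helper: build the Solr base URL from the environment.
# SOLR_HOST and SOLR_PORT are assumed names; the defaults match the
# "solr" service and port 8983 used in the compose file.
solr_base_url() {
  printf 'http://%s:%s/solr' "${SOLR_HOST:-solr}" "${SOLR_PORT:-8983}"
}

# Hypothetical helper: the update endpoint of a given core.
solr_update_url() {
  core="$1"
  printf '%s/%s/update' "$(solr_base_url)" "$core"
}
```

With such a helper, running the scripts outside Docker would only require setting SOLR_HOST=localhost, while inside the compose network the defaults would apply.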
@nichtich @Phu2 Many thanks for the ideas and code! I've started to implement it. I will ping you again when it is testable, at least in a branch. I think there will be three Docker containers:
- solr
- the command line interface (backend)
- the web UI (frontend), based on a dedicated PHP image (php:8.1-apache)
We also need a volume that is shared between the last two containers, and we should add environment variables for the URLs of the components.
@nichtich @Phu2 I ran into a problem and would like to ask your opinion about it.
There are two methods to create a new Solr core (index):
- use the command line tool, such as bin/solr create_core -c my_core
- use a URL (admin/cores?action=CREATE&name=my_core&instanceDir=path/to/dir&config=solrconfig.xml&dataDir=data), see documentation
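As a rough sketch of the URL method, the request could be built and sent like this. The function names are made up for this example, and only the action and name parameters are set, so Solr falls back to its defaults for everything else; the base URL is passed in by the caller.

```shell
#!/bin/sh
# Hypothetical helper: build a CoreAdmin CREATE URL with only the
# mandatory "name" parameter, leaving all other parameters at their defaults.
core_create_url() {
  base="$1"   # e.g. http://localhost:8983/solr
  core="$2"   # e.g. my_core
  printf '%s/admin/cores?action=CREATE&name=%s' "$base" "$core"
}

# Hypothetical helper: send the request. -f makes curl return a non-zero
# exit code on HTTP errors, -sS hides progress but still prints errors.
create_core() {
  curl -fsS "$(core_create_url "$1" "$2")"
}
```

Usage would be something like `create_core http://localhost:8983/solr my_core`, as opposed to running `bin/solr create_core -c my_core` on the machine where Solr is installed.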
QA catalogue so far utilizes the latter method, but it does not specify the instanceDir, config and dataDir parameters, so they take their default values. In a standard Solr installation the [solr base dir]/server/solr directory is the location where the indices are stored. It contains the individual index directories and a preconfigured configsets directory that contains configuration file templates. When Solr creates a new core, the configSet parameter can be used to specify the template, which is actually a subdirectory of configsets (the default is called _default). The [solr base dir]/server/solr directory is the Solr home directory. In the Solr Dockerfile SOLR_HOME is specified as /var/solr/data, an empty directory.
If we try to create a new core with the API, it throws an error message and does not complete successfully:
SolrCore 'qa-catalogue_1' is not available due to init failure: Could not load configuration from directory /var/solr/data/configsets/_default
I can see several possible solutions:
- instead of using the API, the tools should use the command line method. But it is a bit complicated, because inside a Docker container we have to call a command of another Docker container, so the source container should be able to run the docker command, and it should know the name of the target container.
- as a step in the image creation process we should copy /opt/solr/server/solr/configsets to /var/solr/data/. The documentation says that there is a /docker-entrypoint-initdb.d directory that can be mounted from outside, and we can add something in there.
- we can provide configsets via the tool, and ask users to add it to the mounted directory.
The questions:
- have you ever run into this problem?
- which solution do you see as optimal? My choice would be 2), but maybe you know a better solution.
Just for later reference here are some links:
- Solr docker image: https://hub.docker.com/_/solr
- documentation: https://solr.apache.org/guide/solr/latest/deployment-guide/solr-in-docker.html
- CoreAdmin API: https://solr.apache.org/guide/solr/latest/configuration-guide/coreadmin-api.html
- Dockerfile: https://github.com/apache/solr-docker/blob/main/9.6/Dockerfile