Add docker volume for scripts under ./catalogues
Phu2 opened this issue · 9 comments
Currently you cannot use the provided scripts (or some own scripts) under the ./catalogues
path with Docker.
In docker-compose.yml
add
volumes:
- ./catalogues:/opt/qa-catalogue/catalogues
Usage eg.:
docker exec -it metadata-qa-marc ./catalogues/test.sh all
@Phu2 I think there are two distinct problem here:
- the Docker instance (and the release.zip) does not contain the catalogue script
- how to run a custom script
I would like to handle them distinctly. The catalogues/ directory should be part of the zipped version and thus the docker container. But I do not think that the users who would like to use docker should have the full source code available in the host. So I suggest the following:
- instead putting your own script to catalogues, put it into some other directory, such as
custom-script
-- otherwise if the localcatalogues
contain only your script it would overwrite the already existing directory within the container - mount this local directory with volume
- use it as
docker exec -it metadata-qa-marc ./custom-script/test.sh all
Do you agree with this plan?
I'd prefer not using the ./catalogue
scripts for your own catalogues because most of these scripts could better be replaced by config files and ./qa-catalogue
should be the main invocation point. Having multiple ways to invoce the application is confusing. At the moment you can
- call
qa-catalogue
with options and arguments - call
qa-catalogue
with environment variables - create a script that bypasses
qa-catalogue
but calls./common-script
Calling common-script
should be discouraged at at as this is an internal script that may change in the future.
By now only method 1 works with Docker. For method 2 calling docker exec with --env-file
should work. We could extend docker/qa-catalogue
to pass environment variables this way. Eventually the env-file can act as config file. Another method is to create a script that calls docker/qa-catalogue
with your configuration.
@nichtich I agree with the principle to have a single point of entry. Does it imply that the scripts in catalogues
should be rewriten to use qa-catalogue
? I am sure we should have scripts, because these records the catalogue specific settings. If I understand correctly the env-file should be on the host, not on the container.
Take this example:
docker exec -it metadata-qa-marc --env-file catalogues/gent.sh ./qa-catalogue all
It will not work, if we have only the docker container, but not the source code.
Another problem I just recognize: docker compose up -d
does not need any downloaded files (and this is one of the reason to use docker), however ./docker/qa-catalogue
needs at least this file to be downloaded. However we do not need any more file, and the docker/
part require to create docker dir, then download the script there - I think this step is unnecessary. It should be clear in the README, such as
# download docker wrapper script
curl https://github.com/pkiraly/qa-catalogue/releases/download/v0.8.0/docker-qa-catalogue \
--output qa-catalogue
# run the analyses
./qa-catalogue \
--params "--marcVersion GENT --alephseq" \
--mask "rug01.export" \
--catalogue gent \
all
supposing the ./docker/qa-catalogue
file is part of the release file list as docker-qa-catalogue
.
My conclusions
- we should have "canonical" catalogue specific scripts
- we also should provide a way to create custom catalogue specific scripts
- these scripts should work both inside and outside docker
- I agree to not use
common-script
directly - we should keep in mind two use cases:
a. the user has the QA catalogue as it is available in the release zip file
b. the user uses only docker and if necessary a minimal number of other files (such as the docker wrapper script)
- we should have "canonical" catalogue specific scripts
I see them more as examples how to call qa-catalogue for known data sources.
- we also should provide a way to create custom catalogue specific scripts
We should not provide a way. People can call ./qa-catalogue
or ./docker/qa-catalogue
however they like. Sure it can be called via another bash script but also via Perl, Jenkins... The recommended way should be to directly call it.
- to 5.
Agreed!
One solution would be:
- remove
. ./setdir.sh
and. ./common-script
lines from the catalogues/*.sh scripts - modify
qa-catalogue
script:- add a new parameter:
-f
,--config-file
- if this parameter is set and the referenced file is existing: run it (before displaying the variables)
- add a new parameter:
- run it with
./qa-catalogue --config-file catalogues/gent.sh all
I tested and it works. Moreover it would work inside docker as well if the script is available inside (which can be achieved by applying the suggested volume
parameter).
First i would like to give some context for this issue. As a first time user of QA catalogue i am trying to figure out how to use this tool. After scanning through the README i decided to give Docker a try. My expectation was that everything would work the same way with Docker. The Usage section doesn't differentiate between the two environments. In fact, the many ways to invoke the tool confuses me a lot. The scripts within the catalogs
directory seemed to be a good start. To my surprise they are not available with Docker. That's why i created this issue. Sorry for the long read :)
./qa-catalogue
as the single point of entry + config file sounds very good to me 👍
You could create a config
directory which is mounted into the container by default. All catalogues/*.sh
scripts can be replaced by config files, right?
I've renamed --config-file
to --env-file
because the same name is use in docker compose. It works with both ./qa-catalogue
and ./docker/qa-catalogue
. It is also possible to use the same config file to start the Docker container and to run qa-catalogue. I've also removed support of setdir.sh
is ./qa-catalogue
to have one preferred type of configuration files.
Fine for me.
The following environment variables are set in any of the catalogue files:
$ grep -oh '^[A-Z_]\+' catalogues/*.sh | sort | uniq -c
19 MARC_DIR
45 MASK
45 NAME
2 SCHEMA
1 TYPE
54 TYPE_PARAMS
1 VERSION
Most files can directly be converted to an .env
file by removal of calling setdir.sh
and common-script.sh
. Possible exceptions are dnb.sh
, k10plus_pica.sh
, k10plus_pica_grouped.sh
. In addition every MARC_DIR=${BASE_INPUT_DIR}/xyz
must be changed to INPUT_DIR=xyz
.