iipc/openwayback

Adding Dockerfile to build and run

ibnesayeed opened this issue · 18 comments

With the introduction of the Multi-Stage Build feature in Docker, it should be very easy to write a Dockerfile that builds OpenWayback from source while still producing a light-weight image for production use.

Additionally, the newly introduced support for arguments in the FROM directive will make it even easier to build images for any combination of Java and Tomcat. This could be very handy for testing.

ldko commented

@ibnesayeed Is this something you want to assign yourself to?

Yes, I can take care of it.

@anjackson, what would be a good place to put the binaries and libraries (contents of bin and lib folders of the built tar file) inside the Docker image?

Currently, I have the following in my Dockerfile, but some binaries such as cdx-indexer won't work because I did not copy the libraries into the final image.

ARG MAVEN_TAG=latest
ARG TOMCAT_TAG=latest
ARG SKIP_TEST=false

# Building stage
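FROM maven:${MAVEN_TAG} AS builder
# An ARG declared before the first FROM must be re-declared to be visible inside a stage
ARG SKIP_TEST
COPY . /src
WORKDIR /src
RUN mvn package -Dmaven.test.skip=${SKIP_TEST}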
RUN tar xvzf dist/target/openwayback.tar.gz -C dist/target \
    && mkdir dist/target/openwayback/ROOT \
    && cd dist/target/openwayback/ROOT \
    && jar -xvf ../*.war

# Image creation stage
FROM tomcat:${TOMCAT_TAG}
LABEL maintainer="Sawood Alam <@ibnesayeed>"

RUN rm -rf /usr/local/tomcat/webapps/*
COPY --from=builder /src/dist/target/openwayback/ROOT /usr/local/tomcat/webapps/ROOT
COPY --from=builder /src/dist/target/openwayback/bin /usr/local/bin/

VOLUME /data

ENV WAYBACK_BASEDIR=/data \
    WAYBACK_URL_SCHEME=http \
    WAYBACK_URL_HOST=localhost \
    WAYBACK_URL_PORT=8080 \
    WAYBACK_URL_PREFIX=http://localhost:8080

The fact that you can use ARGs in the FROM directive is very cool. But it breaks the Dockerfile on the Docker versions shipped with most commonly used Linux distros (the one I'm using included). Should we consider an alternate Dockerfile that's compatible with older versions?

@runderwood, both ARG in FROM and Multi-Stage Build are currently available only in Docker's pre-release and should ship in version 17.05 by next month. The ARG feature gives us the flexibility to rapidly build images with various combinations of Maven, JDK, Tomcat, and JRE versions. Additionally, the Multi-Stage Build feature lets us generate light-weight final images, free from build-time bloat. For example, the following command builds an image named openwayback with the tag minimal: the code is built using Maven 3.5 with JDK 7, and the built artifacts are then packaged in a small Alpine Linux image with Tomcat 7 and JRE 7. If no --build-arg options are provided, the latest cached tags are used for both base images.

$ docker build --build-arg=MAVEN_TAG=3.5-jdk-7 --build-arg=TOMCAT_TAG=7-jre7-alpine -t openwayback:minimal .
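To try several Maven/Tomcat combinations in one pass, a small dry-run loop along these lines could print one build command per pair. This is only a sketch: the `build_cmd` helper, the second tag pair, and the scheme of reusing the Tomcat tag as the image tag are illustrative assumptions, not part of the PR.

```shell
#!/bin/sh
# Dry-run sketch: print one `docker build` command per Maven/Tomcat tag pair.
# build_cmd, the second tag pair, and the image tag naming are assumptions.

# Print the build command for one combination instead of executing it.
build_cmd() {
    maven_tag=$1
    tomcat_tag=$2
    printf 'docker build --build-arg MAVEN_TAG=%s --build-arg TOMCAT_TAG=%s -t openwayback:%s .\n' \
        "$maven_tag" "$tomcat_tag" "$tomcat_tag"
}

# Each input line pairs a Maven image tag with a Tomcat image tag.
while read -r maven_tag tomcat_tag; do
    build_cmd "$maven_tag" "$tomcat_tag"
done <<'EOF'
3.5-jdk-7 7-jre7-alpine
3.5-jdk-8 8-jre8-alpine
EOF
```

Because the script only prints the commands, they can be reviewed first and then piped to `sh` to actually run the builds.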

Achieving something like this with older versions of Docker would require multiple Dockerfiles and custom scripts. That said, in my opinion, the advantages of the two features used here outweigh backward compatibility.

I would also note that these magical features are only necessary at build time. Once the image is built, it can be pushed to a repository, and a container can then be run from it using an older Docker engine.

@ibnesayeed OK. Works for me. Just thought it worth asking.

Any thoughts on where to put the jars from the lib folder of build artifacts?

ldko commented

The jars needed for the cdx-indexer should already exist in the image you are building at /usr/local/tomcat/webapps/ROOT/WEB-INF/lib. The command line scripts in the bin directory are written to put the *.jar files found at $WAYBACK_HOME/lib on your classpath. One way you could run cdx-indexer with what you currently seem to be doing would be to set the WAYBACK_HOME environment variable to /usr/local/tomcat/webapps/ROOT/WEB-INF. I don't have a recent enough version of Docker installed to run your Dockerfile to confirm that though.
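The classpath pattern described above can be illustrated roughly as follows. This is a minimal sketch of the general technique, not the actual content of the OpenWayback bin scripts; the `build_classpath` helper is a hypothetical name, and the default path assumes the image layout from the Dockerfile earlier in this thread.

```shell
#!/bin/sh
# Sketch of building a classpath from $WAYBACK_HOME/lib; this is NOT the
# actual OpenWayback script. build_classpath is a hypothetical helper.

# Join every *.jar under "$1/lib" into a colon-separated classpath string.
build_classpath() {
    classpath=""
    for jar in "$1"/lib/*.jar; do
        [ -e "$jar" ] || continue   # skip the unexpanded glob when no jars exist
        classpath="${classpath:+$classpath:}$jar"
    done
    printf '%s\n' "$classpath"
}

# Assumed default: inside the image, the jars live under WEB-INF/lib.
WAYBACK_HOME=${WAYBACK_HOME:-/usr/local/tomcat/webapps/ROOT/WEB-INF}
build_classpath "$WAYBACK_HOME"
```

Run outside the image, the last line prints an empty classpath because the default directory does not exist; pointing WAYBACK_HOME at any directory that contains a lib folder shows the joined list.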

Thanks @ldko. Java is not a language I work with very often, so there are always certain pieces that I am not sure about. I was also not sure whether the bin directory, as packed inside the built artifact, is on the PATH by default; if not, I will have to set that in the container. Also, I thought the lib and bin directories were placed outside the war file and not inside WEB-INF/lib. However, if placing them there does the trick, then it is the simplest way I can think of. I will experiment with this tonight and report my findings.

When we build with mvn package and extract the dist/target/openwayback.tar.gz file, two lib directories are created: one outside the webapp and one inside the webapp under WEB-INF. The one inside WEB-INF has 79 files, while the one outside has only 62. Here is the comm output with common files removed; the first column shows files unique to the outside lib directory and the second column shows files unique to the inner lib directory.

	antlr-2.7.5.jar
	arq-2.2.jar
	arq-extra-2.2.jar
	commons-cli-1.0.jar
commons-cli-1.2.jar
	concurrent-jena-1.3.2.jar
	foresite-0.9.jar
hadoop-ant-0.20.2-cdh3u4.pom
	icu4j-3.4.4.jar
	iri-0.5.jar
	jdom-1.0.jar
	jena-2.5.5.jar
	jenatest-2.5.5.jar
	json-jena-1.0.jar
	log4j-1.2.12.jar
log4j-1.2.17.jar
	lucene-core-2.2.0.jar
	rome-0.9.jar
stax-api-1.0.1.jar
	stax-api-1.0.jar
	wstx-asl-3.0.0.jar
	xalan-2.7.0.jar
	xercesImpl-2.7.1.jar
	xml-apis-1.0.b2.jar
	xmlParserAPIs-2.0.2.jar

This shows that the outside lib directory has newer versions of commons-cli, log4j, and stax-api. I think the packaging needs an update for consistency, unless there is a reason why it is the way it is.
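For anyone who wants to reproduce a listing like the one above, a comparison along these lines should work. It is a sketch under assumptions: the `lib_diff` helper is a hypothetical name, and the two directory paths are guessed from the layout described in this thread, so adjust them to wherever the tar file and war were actually extracted.

```shell
#!/bin/sh
# Sketch: list the file names that appear in only one of two lib directories.
# Column 1 = only in the first directory; column 2 (tab-indented) = only in the second.
# lib_diff is a hypothetical helper; the paths at the bottom are assumptions.

lib_diff() {
    outer_list=$(mktemp)
    inner_list=$(mktemp)
    # comm needs sorted input; LC_ALL=C keeps sort and comm on the same collation.
    ls "$1" | LC_ALL=C sort > "$outer_list"
    ls "$2" | LC_ALL=C sort > "$inner_list"
    # -3 suppresses the third column (names present in both directories).
    LC_ALL=C comm -3 "$outer_list" "$inner_list"
    rm -f "$outer_list" "$inner_list"
}

# Assumed locations after extracting the tar file and the war as in the Dockerfile.
outer=dist/target/openwayback/lib
inner=dist/target/openwayback/ROOT/WEB-INF/lib
if [ -d "$outer" ] && [ -d "$inner" ]; then
    lib_diff "$outer" "$inner"
fi
```

Temporary files are used instead of process substitution so the sketch stays within plain POSIX sh.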

Thanks @ldko, setting the WAYBACK_HOME environment variable to /usr/local/tomcat/webapps/ROOT/WEB-INF did the trick. You are awesome.

If someone can install the pre-release of Docker and try building the image, that would be great. Any other reviews or comments are welcome on PR #344, which I think is safe to merge as it does not mess with the existing system.

I will try to document its usage in the wiki later.

Docker v17.05.0-ce was released yesterday (no longer a release candidate). Hence it should be easy for anyone to upgrade their Docker Engine to the latest version and test PR #344. I personally think it is safe to merge now, but reviews are welcome.

ldko commented

Thanks for letting us know the compatible Docker has been released. I intend to test it soon--just haven't gotten to it yet.

I have added basic Docker documentation to the wiki. Please feel free to modify it for accuracy, clarity, or expansion.

Closing this as it is implemented in PR #344 and merged.