arXiv/arxiv-base

Improve Docker image for efficiency and size

ibnesayeed opened this issue · 10 comments

I suggest the following changes to improve the Dockerfile of this repo:

  • Remove the /tmp/safe_yum.sh file after it is executed
  • Move environment variable declarations and other commands that do not change often above frequently changing layers for better caching
  • Combine multiple ENV declarations into one to minimize layers
  • Include a .dockerignore file to reduce the size of the context
  • Additionally, add some metadata as labels (not for performance)
  • Also, if Dockerfile-with-git is still needed, rework it to minimize layers and make them atomic.
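A minimal sketch of how several of these suggestions could fit together in one Dockerfile. It assumes a `centos:7` base (the thread below establishes CentOS 7) and a hypothetical invocation of `safe_yum.sh`, since the issue does not say how the script is called. The label values, `APP_HOME`, `/opt/arxiv`, and the `gcc` package are placeholders, not the repo's actual contents.

```dockerfile
# Assumed base image; the thread only establishes CentOS 7.
FROM centos:7

# Metadata labels (informational; they do not affect size or speed).
LABEL org.opencontainers.image.title="arxiv-base" \
      org.opencontainers.image.description="Base image for arXiv services"

# One ENV instruction instead of several. Values are placeholders.
ENV PYTHONUNBUFFERED=1 \
    APP_HOME=/opt/arxiv

# Rarely changing steps come first so their layers stay cached.
COPY safe_yum.sh /tmp/safe_yum.sh
# Hypothetical invocation: adapt to however safe_yum.sh is really called.
# Deleting the script in the same RUN keeps it out of the final filesystem.
RUN chmod +x /tmp/safe_yum.sh \
    && /tmp/safe_yum.sh -y install gcc \
    && rm -f /tmp/safe_yum.sh

# Frequently changing content goes last, so edits only rebuild these layers.
WORKDIR $APP_HOME
COPY . $APP_HOME
```

The `COPY` layer still contains the script even after the `rm`; deleting it mainly keeps it out of the running image rather than saving much space, since the script is small. A `.dockerignore` file would additionally keep unneeded files out of the build context used by the final `COPY . $APP_HOME`.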

Other repositories, such as arxiv-zero, could also benefit from similar rework: reordering and combining layers, replacing ADD with COPY, and removing redundant declarations, such as environment variables already declared in this base image.
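For a derived image, the same ideas might look roughly like this. This is a hypothetical sketch, not arxiv-zero's actual Dockerfile: the `arxiv/base:latest` tag, the `/opt/arxiv` path, and the `app` directory are placeholders, and the install command reuses the pip/pipenv step proposed later in this thread.

```dockerfile
# Placeholder tag for the base image built from this repo.
FROM arxiv/base:latest

# No ENV lines here that repeat values the base image already sets.
WORKDIR /opt/arxiv

# COPY rather than ADD: a plain file copy, without ADD's extra behaviors
# (fetching remote URLs, auto-extracting local tar archives).
# Dependency manifests are copied first, so the install layer below is
# reused until Pipfile or Pipfile.lock changes.
COPY Pipfile Pipfile.lock /opt/arxiv/
RUN pip install -U pip pipenv \
    && pipenv install \
    && rm -rf ~/.cache/pip

# Application code changes most often, so it is copied last.
COPY app /opt/arxiv/app
```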

I'll add to this:

  • Since arxiv-base package requires mysql, we should add mysql mysql-devel to the yum install step.
  • For images that extend arxiv/base (e.g. for a particular service), we should start putting pip-related commands in a single RUN step, ending with && rm -rf ~/.cache/pip. For example:
RUN pip install -U pip pipenv uwsgi \
    && pipenv install \
    && rm -rf ~/.cache/pip

Since arxiv-base package requires mysql, we should add mysql mysql-devel to the yum install step.

Shouldn't we keep MySQL in a separate container (as a microservice), utilizing the official image?

IIRC, the problem is that the mysqlclient Python package expects at least the MySQL client to be installed locally.

It looks like the package requires Python and MySQL development headers and libraries, but not the client itself (unless I missed something).

Yeah, you're right. Since we're on CentOS 7, it should just be mariadb-devel, I think?

... and python3-devel too I guess?

EDIT: I realized it is already added!
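Based on this exchange, compiling the mysqlclient Python package needs the MariaDB and Python development packages rather than a MySQL server. A sketch of a single install step follows. It assumes plain `yum` instead of the repo's `safe_yum.sh` wrapper, treats `python3-devel` as correct for the image's Python build (the thread only guesses this), and adds `gcc` on the assumption that mysqlclient is compiled from source.

```dockerfile
FROM centos:7

# Build dependencies for the mysqlclient Python package:
# mariadb-devel supplies the client headers/libraries and mysql_config.
# No MySQL/MariaDB server is installed in this image.
RUN yum -y install gcc mariadb-devel python3-devel \
    && yum clean all \
    && rm -rf /var/cache/yum
```

Cleaning the yum cache in the same `RUN` keeps it out of that layer, the same idea as the `rm -rf ~/.cache/pip` step proposed above. The database itself can still run in a separate container, as suggested earlier in the thread.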

Want to take a look at #139 and see what you think so far?

It looks like it is taking good shape. I have posted a couple of minor comments/suggestions there.

I have checked the checkboxes in the first post as per #139. Since this is a major rework of the base image, the Dockerfiles of derived images in other repos should be adjusted accordingly and tested thoroughly before rolling it out!

bdc34 commented

Thank you. Closing this issue since this work was merged several years ago.