Remove usage of the `VOLUME` instruction

The VOLUME instruction should be removed from the image build.

Containers already persist state internally until the container is destroyed (for example, when the image is upgraded to a new tag).
External persistence should be explicit.

Lines 183 to 186 in a537d60

    
    ENV PGDATA /var/lib/postgresql/data
    # this 1777 will be replaced by 0700 at runtime (allows semi-arbitrary "--user" values)
    RUN install --verbose --directory --owner postgres --group postgres --mode 1777 "$PGDATA"
    VOLUME /var/lib/postgresql/data

Anonymous volumes are created when an image has a VOLUME instruction. They are initialized by copying any existing content at the mount point (unlike a bind mount, which typically replaces the content at the mount point):

- With docker run, each container instance started creates a new anonymous volume. With --rm the volume is removed together with the container; without --rm, these volumes accumulate pointlessly, and the user may not be aware of this implicit waste on their system.
- Docker Compose behaves differently, with additional logic to preserve the same anonymous volume across instances of the container for a given Compose project.

The VOLUME instruction provides no value to an image. It only causes problems.

For detailed justification, please see my previous write-ups on this subject:

kanidm/kanidm#2948
ory/hydra#3683
caddyserver/caddy-docker#118 (comment)
Example of implicit 2GB disk usage per container instance (not applicable to images with empty VOLUME published, but clearly documents a VOLUME concern)

Additional context (`docker-library` specific)

This issue is being raised across docker-library repos (the only difference being the referenced Dockerfile snippet). It is similar to the past (stale?) docker-library issues for mongo, postgres, mysql, redis.

It is unclear why those older related issues on this topic have remained open for 6-8 years instead of being resolved. If a maintainer could clarify any pragmatic reason to keep VOLUME as a blocker, that would be appreciated. So far the only reason I recall is Docker Compose, but that seems like a weak argument: when persistence actually matters, it should be explicit.

If the length (plus varying discussion chains) and age of those issues contribute to the lack of resolution, the issue I raise here hopefully condenses the justification for removal into a consistent and easier to digest form to drive this change forward.

As these are images maintained officially under the docker-library umbrella, removal would help establish VOLUME as an anti-pattern instead of encouraging its usage.

It's not clear to me what problems these implicit volumes cause, other than needing to occasionally run a prune command to clear dangling volumes.

As for it being an anti-pattern, I think it's quite the opposite. When considering usability and developer experience, and given the choice between potential loss of data or leaving dangling objects, the choice is clearly with not losing data. The loss of data is what needs to be explicit. This behavior is also consistent with other dev tools like git, which keeps dangling objects and requires explicit pruning.

Removing these volumes would lead to a degraded experience for developers who are new to containers, while experienced ones already know how to handle them.

TL;DR: Apologies for the verbosity..

Reasons to be cautious about VOLUME:

- Introduces inconsistency in behaviour for users (differences across Docker, Compose, Kubernetes - as opposed to explicit named volumes or bind mounts)
- Bugs (when a fresh container instance uses an anonymous volume created for a destroyed container)
- Risks leaking sensitive data (when implicitly used with other images)
- Not friendly (see the problem scenarios below) to inexperienced users, yet enforced upon them (Podman offers opt-out)
- Inexperienced users may assume all images implicitly persist with VOLUME for consistency, while experienced users must take extra precaution when intentionally not wanting implicit persistence due to the potential usage of VOLUME.

The postgres image isn't affected by any of the major caveats AFAIK, but I'd still discourage VOLUME usage as I've explained below in response to your statements.

> It's not clear to me what problems these implicit volumes cause, other than needing to occasionally run a prune command to clear dangling volumes.

Sorry if I wasn't clear about this. It is not only the unwanted accumulation of anonymous volumes (and the difficulty of selectively removing them when the container no longer exists).

Consider how common Docker Compose is. With Compose, an anonymous volume created from VOLUME:

- Is tied to the service name, not the image. It'll persist across image upgrades, or image changes (even images that are entirely different but have the same VOLUME path defined). This also risks leaking sensitive data that a bad actor could exploit, all the more so when the user isn't even aware of the implicit persistence.
- Is not replaced with an empty one for a new container instance just by destroying the container (unlike with the Docker CLI). This adds friction to getting a clean slate where the container reflects the immutable image, thus bugs / errors can happen and it may not be immediately obvious why.
- Can accumulate without the user being aware. If each anonymous volume is of a decent size, in some environments this can exhaust disk space and cause much more widespread damage, which in some cases prevents the ability to prune.

So 💯 you can work around and avoid these concerns when you know better. Users often only learn how to, though, because they ran into one of these bad outcomes that VOLUME can cause. My argument is that users shouldn't need to in the first place, just because an image maintainer thought using VOLUME to avoid being explicit with persistence was a good idea (I don't blame them, given that official docs from 2014 still endorse it, but I'm pretty sure it's outdated advice).

Persistence of important data should be explicit with containers; they're meant to be treated as immutable for a reason. Not doing so introduces the inconsistency and potential for problems I've shared here (not that the postgres image is affected by any; this is more of a PSA to discourage usage of VOLUME, as it's effectively legacy from 2014, prior to better persistence options arriving).

NOTE: One project is wary of dropping VOLUME from their image. Even though they agree in hindsight that it's the right thing to do, they are concerned about the impact on existing users who relied on the Docker Compose behaviour and would lose their data. Fortunately, all those users would need to do is explicitly configure the same anonymous volume path on that service to restore it (unless an automated prune or similar action deleted it between the image upgrade and the time spent troubleshooting to learn how to recover it).
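
A minimal sketch of that recovery step, assuming a hypothetical service named app whose image (example/app, also hypothetical) previously declared VOLUME /data and dropped it in a newer tag. The service declares the same container path as an explicit anonymous volume. Whether Compose reattaches the earlier volume is taken here from the behaviour described in this thread, so it is worth confirming against a copy of the data first:

```yaml
services:
  app:
    # Hypothetical image whose newer tag no longer declares VOLUME /data.
    image: example/app:2
    volumes:
      # Container path only (no source): an explicit anonymous volume at the
      # same path the old image's VOLUME instruction used.
      - /data
```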

> When considering usability and developer experience, and given the choice between potential loss of data or leaving dangling objects, the choice is clearly with not losing data.
> The loss of data is what needs to be explicit. This behavior is also consistent with other dev tools like git, which keeps dangling objects and requires explicit pruning.

Consistency?:

- docker run --rm image-name: the anonymous volume is removed afterwards.
- docker compose run --rm service-name: the anonymous volume is removed afterwards.
- docker compose up (with/without --force-recreate) + docker compose down:
  - The volume remains; there is no equivalent that removes anonymous volumes when their container is destroyed.
  - Requires docker volume prune to blindly remove detached anonymous volumes, sometimes prune -a, since an anonymous volume can't be identified without its container. If another anonymous volume holds data worth keeping, this risks data loss.
- Podman matches Docker by default (image volumes are treated as bind mounts), but it supports changing that behaviour to tmpfs or ignore (my preference). Docker lacks such an option despite years of requests to support it.
- Kubernetes has some other issues with image volumes that cause different behaviour from explicit volumes, introducing bugs.

You know what is consistent? No VOLUME usage.

State can persist in a container without any VOLUME, and it will remain until the container is destroyed. Where is your explicit difference here vs VOLUME, where the image author implicitly chooses for the user what should be persisted, even if the user does not want that? (what options does the user have to avoid creating anonymous volumes beyond providing explicit ones?)

Destroying a container is an explicit action to clear the container state and start with a fresh instance from the image. Useful for troubleshooting or learning with disposable container instances.

The expectation is generally if you have an image to make a container from, that container is immutable when created and thus predictable.
This is not the case when you have VOLUME. The user now needs to be aware that the image the container is instanced from has such implicitly, as depending on the container engine the behaviour is inconsistent.

> Removing these volumes would lead to a degraded experience for developers who are new to containers, while experienced ones already know how to handle them.

I strongly disagree with you here. Someone new to containers should learn early on that explicit persistence should be preferred, especially if they're not yet familiar with VOLUME behaviour and how to properly manage it. If anything, VOLUME will complicate their learning experience compared to the understanding that an image is immutable and that containers started from it should reflect that, not carry state implicitly across container instances.

I demonstrate that very clearly here, where I was trying to use that project's container but, due to its VOLUME usage, ran into trouble when I wanted to "reset" to clear my initial config experimentation:

- My next "clean slate" was failing with errors because of an inconsistency (Docker Compose persisted that anonymous volume).
- I am experienced with Docker and its CLI, I knew about VOLUME and its documented removal rules, and I know how to create a fresh container instance with Docker Compose. But I didn't know about the up-specific option --renew-anon-volumes (which was broken from the V2 release until Sep 2024).

That'd all be avoided if there was no VOLUME. It'd have saved me quite a bit of time. I was intentionally not using a volume mount because I wanted a disposable/ephemeral container without any state persistence. Normally I can rely on that... except when VOLUME is in play, which adds a layer of friction.

Please explain how VOLUME contributed positively to my experience, that my expectations and intent were not valid, and how a new inexperienced user would be better off in this situation compared to the omission of VOLUME?

What does the implicit anonymous volume via VOLUME accomplish, when the user cannot rely on that to always be used consistently across images they run? Are you wanting to encourage the belief that the user shouldn't have to think about explicit volumes for persistence, and then this inexperienced user ends up running an image with valuable data lost because "well postgres image persisted my data, why wouldn't this image?"...?

How about backups? If the concern is the volume itself rather than anything within a specific container, is the implicit volume helpful vs a named volume or a bind mount? The container needs to be inspected (if not destroyed yet) to look at its volume metadata and get the hexadecimal volume ID. That's definitely more friction, for the minor convenience of avoiding explicit volumes when persistence matters.
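
For comparison, a minimal sketch of an explicit bind mount, assuming the postgres image from this repo and a hypothetical ./pgdata directory next to compose.yaml. The data then lives at a host path chosen by the user, so no container inspection is needed to locate it:

```yaml
services:
  db:
    image: postgres:17-alpine
    environment:
      POSTGRES_PASSWORD: test
    volumes:
      # Bind mount: host path (hypothetical ./pgdata) -> PGDATA in the container.
      - ./pgdata:/var/lib/postgresql/data
```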

If the above isn't enough to consider as a degraded experience, consider that some images will populate VOLUME in the image and the impact that can have. I cited 2GB of disk usage per container instance from new anonymous volumes as an example of not only accumulated disk waste, but also slower startup from the copy delay. Such behaviour is wasteful and can cause larger problems if not careful (on Windows with WSL2 for example, if this behaviour exhausts disk space, then even when the user has deleted something on the host system, Docker and all WSL2 activity is halted until system restart, and data loss ensues).

Due to the WSL2 freeze from the full disk scenario, while the host itself can still be responsive, you'd not be able to do docker volume prune from WSL2 terminal or Docker Desktop (which relies upon WSL2).

A third example was Docker Compose where it will treat both implicit and explicit anonymous volumes for the same service as all the same. I point out how the official Mongo and Redis images share VOLUME /data as an example, and Docker docs directing users to Compose example configs with a common db service name, but different DB images across the examples...

Is it not a degraded experience when this overlap could very well have the different images interact with foreign data unintentionally by the user? Is VOLUME honestly doing good for the user here? How easy will it be for that inexperienced user to realize what's gone wrong because they switched from Redis to Valkey image and some subtle difference in behaviour or compatibility occurs?

What about sensitive information that is unknowingly persisted via VOLUME, when the image is changed and the next image shares a common /data volume and, unbeknownst to the user, exposes access to that sensitive data? Is it a positive experience instead of degraded? Is it good for the inexperienced user? Changing an image from one software to another, under a common service name with implicit persistence? Could that be abused by bad actors more easily than explicit persistence?

@polarathene anonymous volumes and the VOLUME instruction are not legacy, and removing them from our image is not a viable option.

These edge cases do not amount to a strong enough case to remove the feature altogether. Consider this simple example:

services:
  db:
    image: postgres:17-alpine
    environment:
      POSTGRES_PASSWORD: test

  client:
    image: alpine/psql
    command: -h db -U postgres -c "\\dn"
    environment:
      PGPASSWORD: test
    restart: on-failure
    depends_on:
      db:
        condition: service_started

Running docker compose up creates a new anonymous volume for the db. I can then start and stop the db service and the data is persisted. If I run docker compose down, the anonymous volume is preserved, so the user doesn't lose data if they didn't think of creating an explicit volume. Running docker compose up again creates a new volume, and the user still has access to the data in the other volume. On the other hand, if the user doesn't want to keep the volume, they can simply run docker compose down -v, which will remove the volumes.

Consider this slightly more complex scenario.

services:
  db:
    image: postgres:17.2-alpine
    environment:
      POSTGRES_PASSWORD: test

  client:
    image: alpine/psql
    command: -h db -U postgres -c "\\dn"
    environment:
      PGPASSWORD: test
    restart: on-failure
    depends_on:
      db:
        condition: service_started

The user runs docker compose up, and an anonymous volume is created. Then the user decides they want to update the db version (to resolve a CVE or just to see the impact). They apply this change:

@@ -1,6 +1,6 @@
 services:
   db:
-    image: postgres:17.2-alpine
+    image: postgres:17.3-alpine
     environment:
       POSTGRES_PASSWORD: test

Running docker compose up reuses the volume with the new container which is the desired behavior.

This part is critical. When we talk about the immutability of containers, the data and/or state is usually excluded because it needs to persist across containers when the base container changes.

I also just tried, in the same Docker Compose file, changing the image from postgres to mongo while keeping the service name the same, and it created a new volume; it didn't reuse the existing one.

> @polarathene anonymous volumes and the VOLUME instruction are not legacy, and removing them from our image is not a viable option.

The Docker docs "best practices" page for image builds has a section on VOLUME that's unchanged since introduction in 2014:

> You should use the VOLUME instruction to expose any database storage area, configuration storage, or files and folders created by your Docker container. You are strongly encouraged to use VOLUME for any combination of mutable or user-serviceable parts of your image.

Back then I don't think there was named volume or bind mount support, so please keep that context in mind. The default storage driver for the container's filesystem layer was also much worse; other parts of the docs cite VOLUME (or volumes in general) for performance and persistence. Performance in many cases should not be significantly different these days, while persistence honestly should be explicit (as I've harped on about above).

Considering you cannot rely on all images to have VOLUME consistently, and the various caveats that can occur... there's not a lot going for it. VOLUME has little value today. Yes, you can use it the way you are describing to rely on implicit behaviour, but I have no idea why you choose to endorse that over explicit persistence. Just because you still can, doesn't mean you should 😅

IMO VOLUME is an anti-pattern / legacy. If you disagree, no worries. I'm just seeking to get the related issues that have been left open for 6-8 years resolved.

> If I run docker compose down the anonymous volume is preserved so the user doesn't lose data if they didn't think of creating the explicit volume.

docker compose down is meant to bring down everything, containers are destroyed. If they just want to stop containers they can do so with docker compose stop (or ctrl + c for SIGINT if they have TTY attached).

Consider without compose for a moment what that means. You're describing not only VOLUME but a Docker Compose specific feature (which I'm not sure if the primary maintainer of Compose even agrees with anymore, but cannot drop it as it'd be a breaking change).

Why you'd want to endorse hand-holding a user with implicit persistence, when containers are meant to be ephemeral (try the same without Compose) and immutable when created, I really don't know. You don't need VOLUME for state to persist in a container, but it will be lost when the container is destroyed (eg: replaced by a new image, docker compose down, or explicitly forcing container recreation)... and that's okay... that's why we have explicit persistence in the first place. VOLUME goes against that expectation.

> On the other hand, if the user doesn't want to keep the volume, they can simply run docker compose down -v which will remove the volumes.

Ok great, so apart from the inconsistency with the Docker CLI, Docker Compose supports workarounds for VOLUME with:

- docker compose up --renew-anon-volumes (replaces the attached anonymous volume with a new one, old one remains detached needing docker volume prune -a)
- docker compose run --rm (similar to Docker CLI, the anonymous volume will be removed when the container is destroyed upon stopping)
- docker compose down --volumes (the anonymous volume is removed when destroying the container explicitly)

Now when a user runs into one of those problems I've cited with VOLUME, and has troubleshot it to the point of knowing it's due to an implicit volume, they then need to know about these extra options that vary by command, and use them extensively if they want to avoid such mishaps in future. All of that is due to a decision the image maintainer enforced upon them (there is no way to extend the Dockerfile to remove the VOLUME IIRC, nor an option to ignore implicit volume creation at runtime with the container).

> Consider this slightly more complex scenario.
> Running docker compose up reuses the volume with the new container which is the desired behavior.

I have already pointed out VOLUME with Compose will persist across image upgrades as it is tied to the compose.yaml defined service (there is no magic there to tie it to the same image with a different tag).

If VOLUME never existed, what would your stance be when a user complained that the data persisted in the container but was lost when they upgraded to a new image release tag? Would you be like "you should explicitly add a volume in your compose.yaml when you want to ensure your data persists"? It'd make sense, right?
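
That explicit answer could look like the following minimal sketch, assuming a named volume called pgdata (the name is hypothetical) mounted at the image's PGDATA path:

```yaml
services:
  db:
    image: postgres:17-alpine
    environment:
      POSTGRES_PASSWORD: test
    volumes:
      # Named volume mounted at PGDATA: persists across image tag upgrades.
      - pgdata:/var/lib/postgresql/data

volumes:
  # Declared at the top level so Compose creates and tracks it by name.
  pgdata:
```

With a named volume, docker compose down leaves the data in place, and docker compose down --volumes removes it.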

> This part is critical. When we talk about the immutability of containers, the data and/or state is usually excluded because it needs to persist across containers when the base container changes.

Again, the distinction should be with explicit persistence. We only have the implicit persistence due to VOLUME being a necessity when no better option existed at the time AFAIK, and the container filesystem layer performed poorly with earlier storage drivers too.

With Compose, you'd need docker compose up --force-recreate to ensure a fresh container instance, and normally that should be enough: it doesn't restore a previously stopped container, which can persist modifications in its filesystem layer. VOLUME isn't covered by that option though. The user must know about and use a separate one when that instruction is present in the image, and even then it just makes a new volume (and a potential copy if seeded).

In the end:

- You value VOLUME because users can rely on it implicitly, even though it's wiser to use explicit volumes, and despite the fact that users cannot rely upon all image authors to implement it, so they'll be prone to data loss at some point.
- I oppose VOLUME because of inconsistencies and caveats that I've personally run into and discovered while troubleshooting user reports on various projects.

It doesn't seem like either of us will change our opinions on the matter. As I've mentioned, postgres isn't as badly affected by usage of VOLUME as other images have been, so no worries if you choose to keep it.

> I also just tried, in the same Docker Compose file, changing the image from postgres to mongo while keeping the service name the same, and it created a new volume; it didn't reuse the existing one.

I think you missed the part where I state the images must share the same common anonymous volume path (either explicit or VOLUME, in addition to using the same service name in compose.yaml).

Please refer to this reproduction to see it in action. Since postgres uses VOLUME /var/lib/postgresql/data, your image is far less likely to clash (MySQL and MariaDB both share /var/lib/mysql, while MongoDB, Redis, and Valkey all use /data... which would be more likely to clash with various images).
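
A sketch of the clash scenario (not the linked reproduction itself), assuming the redis and valkey/valkey images, which this thread describes as both using /data. The tags shown are illustrative:

```yaml
services:
  db:
    # Step 1: run `docker compose up -d` with this image, which creates an
    # anonymous volume for /data tied to the `db` service.
    image: redis:7
    # Step 2: swap to the image below and run `docker compose up -d` again.
    # As described above, Compose is expected to attach the same anonymous
    # volume, so the new image starts with the previous image's data.
    # image: valkey/valkey:8
```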

VOLUME is an explicit, machine-readable bit of metadata from the image author to image users saying "if you value your data, this is a/the directory you should save somewhere safe/persistent"

I agree there should be more ways to surface this metadata and to control this behavior, such as a way to disable the creation of automatic (anonymous) volumes from it, but I see that as mostly unrelated to image authors specifying VOLUME paths appropriately, and I fail to see the potential harms outweigh the benefit.

It is on our TODO list to document our (DOI) stance on this better, but we are extremely unlikely to remove VOLUME instructions from any of the images we maintain.

TL;DR: While I strongly disagree about VOLUME doing more good than harm, I appreciate the time you've both taken to respond and I want to avoid draining that further 😓

Feel free to close in favour of pursuing runtime opt-out support.

> I fail to see the potential harms outweigh the benefit.

I think the emphasis is heavily on the potential for it, as the risks can be quite damaging.

Since VOLUME is often cited as intended for less experienced users' benefit, sparing them explicit persistence, it's an odd choice given that this type of user would be most prone to encountering these mishaps.

- A Compose service persisting data implicitly when an image is changed risks bugs and secrets exposure (the users most vulnerable to this will be the ones unaware of VOLUME behaviour, switching a service's image to another publisher's and then misconfiguring the container, which is common with port publishing and PHP from reports I've seen).
- The user must know about VOLUME usage or always rely upon the appropriate flag (--rm / --volumes / --renew-anon-volumes), as --force-recreate alone cannot be relied upon for a clean slate when a container fails to start due to bad data in the anonymous volume.
- Implicit disk usage can take out all WSL2 instances (and Docker Desktop itself) if disk space is exhausted, or negatively impact other parts of the OS when there is no disk space left to write to on the host.

While the disk usage concern isn't specific to VOLUME, it can be unexpected vs pulling an image which is more explicit on the impact of disk usage. In WSL2's case I've also had a Go container build that used 3GB+ RAM which WSL2 also allocated to disk usage despite not being under memory pressure. VOLUME usage where the image or entrypoint seeds an anonymous volume implicitly with enough data for that to be a concern is perhaps uncommon, but I've been an unexpected victim of such in the past (10GB disk left prior).

So it seems to be more the case of hand waving those concerns away as niche, in favor of avoiding:

docker run -v /var/lib/postgresql/data

volumes:
  - /var/lib/postgresql/data

What do the valid scenarios for defaulting to implicit anonymous volumes look like? (with Compose, since that's where the magic is happening; docker run doesn't support such AFAIK):

- Saving some keystrokes/clicks and not having to read up on the volume path, for an image the user plans to deploy long enough to hold data they don't want to lose, by relying on implicit persistence that may or may not exist for that image 🤔 (doesn't seem very pragmatic?)
- Someone assessing a new image could just as quickly evaluate it without any volume involved and then, once they're confident with the image and its configuration, provide an explicit volume. Presently the DX would require knowing the appropriate flag for the command to replace the anonymous volume being relied upon, or some image-specific supported interaction (command / UI action) to manage the data.

Evaluating an image and experimenting before going back to a clean slate is a more pragmatic scenario, one where VOLUME only adds friction to the DX. I'm sure there's a pragmatic scenario where you would endorse implicit volumes for important data vs explicit?

I'm just blanking on when a user would intentionally want to do so, other than image authors hand-holding until the user loses data on an image without VOLUME because of that experience (is that the user's fault, or the fault of the image without VOLUME? What happens if a breaking change requires the volume path to change?)

> VOLUME is an explicit, machine-readable bit of metadata from the image author to image users saying "if you value your data, this is a/the directory you should save somewhere safe/persistent"

I am all for it to be more like EXPOSE where it's informative only by default.

There's support in tooling to publish ports from EXPOSE IIRC. Similar functionality for VOLUME would be better than the current scenario but I assume it's too late for such a change.

Generally images do well to document examples of the volume mounts for persistence, but as with images that add a non-root user, this information isn't always documented (nor can USER be used for such AFAIK when the entrypoint needs to run as root before switching).

My impression with VOLUME is that the user doesn't have to think about persistence or grok what anonymous volumes are. Many users, I'm sure, just use the image and don't give any thought to the possibility of VOLUME being defined for it, only explicitly adding a volume when they actually want to think / care about persistence.

Yet image maintainers often express the value of VOLUME, implying it's relied on by their users quite a bit to persist data (somewhat unnerving since it's often in relation to a feature specific to Compose that doesn't need VOLUME to work).

All VOLUME is doing here is teaching bad habits to rely on image authors implicitly handling data persistence for them? That doesn't benefit them when they use images that fail to do that 🤷‍♂ It just seems like a rather bad DX (great while everything is working fine, until you run into the problems that wouldn't have otherwise happened without it, quickly learning to be explicit when you consistently want persistence).

> I agree there should be more ways to surface this metadata and to control this behavior, such as a way to disable the creation of automatic (anonymous) volumes from it

> It is on our TODO list to document our (DOI) stance on this better, but we are extremely unlikely to remove VOLUME instructions from any of the images we maintain.

No worries, thanks for the response! 😁

I just needed to sanity check if there was any actual valid reason for VOLUME in images beyond implicit convenience, as it honestly seems like an anti-pattern rather than a best practice (far more cons than pros today, and I think we all can agree explicit volumes should be encouraged instead).

When closing the long-standing issues on this topic across the docker-library org, it'd be good if each issue could link to a comment / discussion for reference on the decision (since removing VOLUME won't be happening), or I assume if daemon/runtime support to opt-out lands that'd be sufficient too 💪

Closing in favor of anonymous volume opt-out in Docker (moby/moby#43190). VOLUME is the way for image authors to designate which directories should be persisted when users want persistence.
