moby/moby

docker kill leaves directories behind.

simonjohansson opened this issue · 21 comments

Doing a docker kill UUID has left some directories behind in /var/lib/docker/containers.
Doing an ls reveals:

$ ls /var/lib/docker/containers/0a50ba2e6217fe8234fe6a29f84e97b541631697777515f92259f276d7f83d3e/
ls: cannot access /var/lib/docker/containers/0a50ba2e6217fe8234fe6a29f84e97b541631697777515f92259f276d7f83d3e/rootfs: Stale NFS file handle
rootfs

I am running docker inside a rather slow VirtualBox VM (Ubuntu 12.04, 3.5.0-23-generic). Right now I have 7 of these directories: two of them come from containers where I made big changes (apt-get update); the other five were only "echo hello world" containers.

Relevant IRC-chat

23:11 < DinMamma> Ah, this is interesting, when looking into the cointaners in /var/lib/docker/containers I get "ls: cannot access 
                  rootfs: Stale NFS file handle"
23:11 < DinMamma> So I wonder if this is a issue with my system rather than docker.
23:11 <@shykes> DinMamma: no, this is a known issue with aufs, which we thought we had neutralized
23:12 <@shykes> basically aufs umount is asynchronous
23:12 <@shykes> it does background cleanup
23:12 <@shykes> if you remove the mountpoint too quickly before aufs is done with cleanup, it gets stuck
23:12 <@shykes> and you get that error message
23:13 < DinMamma> I should say that I am running my tests inside a rather slow virtualbox-vm.
23:13 <@shykes> I'm surprised that you hit this. We have a workaround which includes checking the stat() on the mountpoint in a loop, 
                until its inode changes
23:19 <@shykes> DinMamma: so am I :)
23:19 <@shykes> mmm that could be it
23:20 <@shykes> DinMamma: did one of these containers have a lot of filesystem changes on them?
23:20 <@shykes> like a big apt-get, or something like that?
23:20 < DinMamma> Yep
23:20 < DinMamma> Two of them.
23:20 <@shykes> maybe slow machine + lots of data on the aufs rw layer means -> our workaround timed out, and gave up waiting for aufs
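
For readers who want to picture the workaround described in the chat (stat() the mountpoint in a loop until its inode changes, and give up after a while), here is a minimal Python sketch. It is not docker's actual code. The function name `wait_for_inode_change`, the timeout and interval defaults, the injectable `stat` parameter, and the decision to treat a vanished path as "done" are all assumptions made for illustration.

```python
import os
import time


def wait_for_inode_change(path, original_inode, timeout=5.0, interval=0.05,
                          stat=os.stat):
    """Poll `path` until its inode differs from `original_inode`.

    Returns True if the inode changed before the timeout expired, and
    False if we gave up waiting (the failure mode guessed at in the chat).
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            if stat(path).st_ino != original_inode:
                return True
        except FileNotFoundError:
            # Assumption: a missing mountpoint means nothing is mounted there.
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A caller would record `os.stat(mountpoint).st_ino` before requesting the unmount and only remove the mountpoint directory if the function returns True. On a slow machine with a large rw layer, a fixed timeout like this can expire before the background cleanup finishes, which matches the theory above.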

Just an extra comment: it is normal for 'docker kill' to leave the container directory. By default all containers are stored, so you can inspect their filesystem state, commit them into images, restart them etc.

But of course it is not normal to see "stale NFS handle" errors :)

I can't reproduce.

My host is Ubuntu 12.10 and I used the base as guest.
Can anybody reproduce?

Is there a way to manually repair the directory so I can delete the directories without rebooting the host?

Not that I know of. Note that there is no known side-effect outside the
scope of that container.

On Monday, April 15, 2013, Thomas Hansen wrote:

Is there a way to manually repair the directory so I can delete the
directories without rebooting the host?



As discussed earlier, this is probably due to the asynchronous nature of aufs unmount.

I'm downgrading this to a minor bug, since:

a) it occurs very rarely (1 known occurrence so far),
b) it has no impact on the behavior of docker or the system,
c) it's very hard to reproduce.

+1 on a fix for this, since I just bumped into it:

~# docker rm 5cbb64c3279a
Error: Error destroying container 5cbb64c3279a: stat /var/lib/docker/containers/5cbb64c3279a76acaac4769e4a6c57c39a7fff6027b51d14ecff08040d252d13/rootfs: stale NFS file handle

@simonjohansson Have you seen the error again since #816?

Hi guys, sorry I didn't see this until now. I have some holiday coming up in the next couple of days, so I'll make sure to check whether #816 fixed the issue!

Just encountered the same issue:

root@dscape:~# docker ps -a | grep 'Exit' |  awk '{print $1}' | xargs docker rm
Error: Error destroying container 38b561af34e1: stat /var/lib/docker/containers/38b561af34e1bb0b3e92d7b1fe734aeabf223d6a5c36757be8925514e28e8b45/rootfs: stale NFS file handle

Error: Error destroying container 112a0c0b9c95: stat /var/lib/docker/containers/112a0c0b9c9546697f20dd7ed21899b789f981eb5195d189b1503ab1893184e4/rootfs: stale NFS file handle

Error: Error destroying container ef13c73b64a9: stat /var/lib/docker/containers/ef13c73b64a991e2b937fbcb1fae412d7b6404dcb67ae105c06ebd5b62926f35/rootfs: stale NFS file handle

Error: Error destroying container e0178615f6d8: stat /var/lib/docker/containers/e0178615f6d8be7ca343c89c398536713542413fa7ac04d172bb268f626a252a/rootfs: stale NFS file handle

Error: Error destroying container 3c8659a041c9: stat /var/lib/docker/containers/3c8659a041c9217e35c056e96da0fe5dc9d5eae43f37874ff372190ed8867277/rootfs: stale NFS file handle

Error: Error destroying container 99dee8e5a486: stat /var/lib/docker/containers/99dee8e5a486b8eeff3855e6750e1dee90ec4c8af022ed9a43304edda411b507/rootfs: stale NFS file handle

Error: Error destroying container b7ac0d3f3f79: stat /var/lib/docker/containers/b7ac0d3f3f79ae35883d09e796332726322e56bdd715e5484210bf84099cc513/rootfs: stale NFS file handle

Error: Error destroying container 7329c9be9795: stat /var/lib/docker/containers/7329c9be97957b187cdb6cbb825ab506e3a8610c01b4055ad5cc64fc58a6e985/rootfs: stale NFS file handle
root@dscape:~# docker version
Client version: 0.4.8
Server version: 0.4.8
Git commit: ??
Go version: go1.1.1

I cannot reproduce anymore.

Client version: 0.5.0
Server version: 0.5.0
Git commit: 51f6c4a
Go version: go1.1.1

GG :)

@dscape can you try again with docker 0.5.1?

I keep seeing this issue over and over when using docker inside VirtualBox. I usually run docker rm $(docker ps -a |cut -d " " -f 1) to remove all containers, but many of them fail with stale NFS file handle.

Just to add, I tried removing the directories of such containers by brute force. After that, trying to remove them via docker rm still prints the same message.

I managed to remove them after restarting the docker host.
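
To see which container directories are affected before attempting docker rm, one option is to stat each rootfs and look for ESTALE, the errno behind the "stale NFS file handle" message. The sketch below assumes the /var/lib/docker/containers/<id>/rootfs layout shown in the error messages in this thread. The function name `find_stale_rootfs` and the injectable `stat` parameter are hypothetical, and the code only reports affected directories without repairing anything.

```python
import errno
import os


def find_stale_rootfs(containers_dir='/var/lib/docker/containers', stat=os.stat):
    """Return the container IDs whose rootfs stat() fails with ESTALE."""
    stale = []
    for container_id in sorted(os.listdir(containers_dir)):
        rootfs = os.path.join(containers_dir, container_id, 'rootfs')
        try:
            stat(rootfs)
        except OSError as exc:
            if exc.errno == errno.ESTALE:
                stale.append(container_id)
            # Any other error (for example a missing rootfs) is ignored here.
    return stale
```

Running `print(find_stale_rootfs())` as root lists the affected IDs, which can then be skipped when removing containers in bulk.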

This seems fixed to me.
Using:

# docker version
Client version: 0.5.3
Server version: 0.5.3
Git commit: 5d25f32
Go version: go1.1.1

Also make sure you have no bash running inside the container path.

Was the asynchronous unmount theory ever proven? I wonder if this is the "deleted a container's image while the container is running" bug:

# Pane 1
$ docker run -i -t foo /bin/bash
root@d6d23b36b613:/#

# Pane 2
$ docker rmi foo
Untagged: 1cfaa4fe8724
Deleted: 1cfaa4fe8724
$

# Pane 1
root@d6d23b36b613:/# exit
$ docker rm `docker ps -l -q`
Error: Error destroying container d6d23b36b613: stat /var/lib/docker/containers/d6d23b36b613337b8e8bbc2ee90af11da3c5fab78a07a01a43ba7262359292ca/rootfs: stale NFS file handle

$

@dsissitka I think that is exactly what it is. It happened to me too.

 $ docker version
Go version (client): go1.1.1
Go version (server): go1.1.1
Last stable version: 0.6.3

How can the container be removed now?

The original issue is resolved in 0.7 because kill does not do an umount anymore. Containers are unmounted when the daemon is stopped.

In case anyone has a /var/lib/docker/volumes directory full of orphaned volumes, feel free to use the following Python script (make sure to understand what it does before executing it):

#!/usr/bin/env python3
# Delete volume directories under /var/lib/docker/volumes whose owning
# container no longer exists. Run as a user that can read that directory.

import json
import os
import re
import shutil
import subprocess

dockerdir = '/var/lib/docker'
volumesdir = os.path.join(dockerdir, 'volumes')

# Full (untruncated) IDs of every container, running or stopped.
# Newer Docker releases spell this flag --no-trunc.
output = subprocess.check_output('docker ps -a -q -notrunc', shell=True)
containers = set(output.decode().split())

# Only the immediate subdirectories of the volumes directory.
volumes = next(os.walk(volumesdir))[1]
for volume in volumes:
    # Anchor the pattern so longer names do not pass as volume IDs.
    if not re.match('[0-9a-f]{64}$', volume):
        print(volume + ' is not a valid volume identifier, skipping...')
        continue
    with open(os.path.join(volumesdir, volume, 'json')) as f:
        volume_metadata = json.load(f)
    container_id = volume_metadata['container']
    if container_id in containers:
        print('Container ' + container_id[:12] + ' does still exist, not clearing up volume ' + volume)
        continue
    print('Deleting volume ' + volume + ' (container: ' + container_id[:12] + ')')
    volumepath = os.path.join(volumesdir, volume)
    print('Volumepath: ' + volumepath)
    shutil.rmtree(volumepath)

Thanks for the script! I fixed the indentation and a small bug:

container_id = volume_metadata['id'] # (not container anymore)

https://gist.github.com/mindreframer/7787702

Thanks! No idea why the indentation was messed up in my post, edited + fixed it.

I used volume_metadata['container'] because I was still on 0.6.6 when I wrote the script, but anyone using 0.7.0 (or later) should use your changes.
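
Since the metadata key differs between versions, it may be safer to compute the list of candidates first and review it before deleting anything. The sketch below is a dry-run helper along those lines. The function name `orphaned_volumes` and its `owner_key` parameter are hypothetical. The two key names ('container' for 0.6.6, 'id' for 0.7.0 or later) are taken only from the comments above, and it is worth checking on your own version that the chosen key really holds the owning container's ID.

```python
import re

VOLUME_ID = re.compile(r'[0-9a-f]{64}')


def orphaned_volumes(volume_metadata, live_containers, owner_key='container'):
    """Return volume IDs whose owning container is not in live_containers.

    volume_metadata maps a volume ID to the parsed contents of its 'json'
    file. Nothing is deleted; the caller decides what to do with the result.
    """
    orphans = []
    for volume_id, metadata in sorted(volume_metadata.items()):
        if not VOLUME_ID.fullmatch(volume_id):
            continue  # not a volume identifier
        owner = metadata.get(owner_key)
        if owner is None:
            continue  # unknown metadata layout: be conservative and skip
        if owner not in live_containers:
            orphans.append(volume_id)
    return orphans
```

Skipping volumes whose metadata lacks the expected key is a deliberate choice here: with the wrong `owner_key`, the helper reports nothing instead of marking every volume as orphaned.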