rfjakob/gocryptfs

Running out of file descriptors when switching from encfs

lechner opened this issue

Hi,

When logging into a home directory that I switched over from encfs, X runs out of file descriptors. (Terminal logins are okay.) Does gocryptfs consume more file descriptors than encfs?

The encrypted directory is mounted via kerberized NFSv4. Thank you!

Best regards,
Felix

Hmm. How do you mount the directory? Through PAM?

No PAM yet. I log onto a terminal as root and use 'su -p' and kinit to get user privileges on NFSv4. Then I drop back to root and run 'gocryptfs -allow_other'. Finally, I switch to the graphical terminal and log in. (With EncFS, I use a similar procedure.)

I just tried to reproduce the issue, but couldn't. The gocryptfs process had about 160 files open after the login completed, which is far from the default ulimit of 1024 (check ulimit -n for what limit you have on your system).

Can you, maybe from another terminal, monitor the number of open files? It works like this:

  1. Find the gocryptfs PID, for example: ps auxwww | grep gocryptfs
  2. Record the number of open file descriptors every half second:

     while true; do
         ls /proc/PID/fd | wc -l >> /tmp/files.log
         sleep 0.5
     done

  3. Log in through X
  4. Check /tmp/files.log

Here you go. For comparison I also included the numbers for EncFS.

The problem occurs only on a particular home directory with hundreds of fonts that fontconfig may try to cache. My soft ulimit is 1024. Thank you for investigating!

gocryptfs.open-files.txt
encfs.open-files.txt

Wow, interesting! Looks like EncFS only needs 1/2 of the file descriptors. I'll check what happens here.

The quick workaround is to increase the limit:

ulimit -n 10000

Thank you for your help. Setting ulimit -n 65536 does the trick.

Attached are the new numbers. EncFS seems to be about five times as efficient with file descriptors.

gocryptfs-ulimit.open-files.txt

It's probably this trick in libfuse: https://github.com/libfuse/libfuse/blob/master/example/passthrough_ll.c#L421
Nice idea.

Edit: No, it's something else. But there is a mechanism that reuses file descriptors: a file that is opened N times uses only one file descriptor in EncFS, but N file descriptors in gocryptfs.

If I'm reading it correctly, the reuse of file descriptors is done by this mechanism in EncFS:

When a file is opened, the node representing it is stored in a map indexed by the file path: https://github.com/vgough/encfs/blob/master/encfs/Context.h#L76
This FileNode eventually encapsulates a RawFileIO that does the I/O with the underlying filesystem. If a file descriptor was already opened in a suitable read/write mode, it is recycled between different callers: https://github.com/vgough/encfs/blob/master/encfs/RawFileIO.cpp#L120

What do you think about implementing a similar system in gocryptfs? We could maybe evolve the code in fusefrontend/write_lock.go to keep the file descriptors in the DevInoStruct, which uniquely identifies each file, and recycle them between different consumers of the same file... What is your opinion?

@danim7 Thanks for the EncFS analysis! Yes, I want to implement something similar, and I think the DevIno map you mention is a good place to hook into. As the first step I have renamed the write_lock.go file to open_file_table.go.

As the whole thing scares me a little, I'd like to do this only for read-only file descriptors for now. I ran lsof on my home dir, and it looks like most of the fds are read-only, so we should still get significant gains.
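
To make the idea concrete, here is a minimal sketch of a read-only FD sharing table keyed by device+inode. This is not gocryptfs code; the package, type, and function names are made up for illustration, and real code would also have to deal with files re-opened with different flags.

// Hypothetical sketch of read-only FD sharing keyed by device+inode.
// Not the actual gocryptfs implementation; all names are illustrative.
package fdshare

import (
	"sync"
	"syscall"
)

// DevIno identifies a file uniquely on one machine.
type DevIno struct {
	Dev uint64
	Ino uint64
}

type sharedFd struct {
	fd       int
	refCount int
}

type Table struct {
	mu      sync.Mutex
	entries map[DevIno]*sharedFd
}

func NewTable() *Table {
	return &Table{entries: make(map[DevIno]*sharedFd)}
}

// OpenRO returns a read-only fd for path, reusing an already-open fd
// if the same device+inode is in the table.
func (t *Table) OpenRO(path string) (int, DevIno, error) {
	fd, err := syscall.Open(path, syscall.O_RDONLY, 0)
	if err != nil {
		return -1, DevIno{}, err
	}
	var st syscall.Stat_t
	if err := syscall.Fstat(fd, &st); err != nil {
		syscall.Close(fd)
		return -1, DevIno{}, err
	}
	id := DevIno{Dev: uint64(st.Dev), Ino: st.Ino}

	t.mu.Lock()
	defer t.mu.Unlock()
	if e, ok := t.entries[id]; ok {
		// Someone already has this file open read-only: reuse their fd
		// and close the one we just opened.
		syscall.Close(fd)
		e.refCount++
		return e.fd, id, nil
	}
	t.entries[id] = &sharedFd{fd: fd, refCount: 1}
	return fd, id, nil
}

// Release drops one reference; the fd is closed when the last user is gone.
func (t *Table) Release(id DevIno) {
	t.mu.Lock()
	defer t.mu.Unlock()
	e, ok := t.entries[id]
	if !ok {
		return
	}
	e.refCount--
	if e.refCount == 0 {
		syscall.Close(e.fd)
		delete(t.entries, id)
	}
}

In this sketch every caller still performs one open() to learn the device/inode, but the duplicate fd is closed immediately, so only one descriptor per file stays open.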

I'm concerned that adopting this behavior for files on which exclusive locks can be requested would require breaking flock() support. EncFS has already chosen to disable flock() -- see https://bugzilla.redhat.com/show_bug.cgi?id=440483 and the associated launchpad ticket https://bugs.launchpad.net/encfs/+bug/200685.

@charles-dyfis-net Good point, that's gonna be a problem if we want to propagate the locks to the backing files (#39). Currently (in EncFS and gocryptfs) the locks are only stored inside the kernel.

Just brainstorming: Is it possible to recover the PID of the processes calling us? If so, when asked for a lock, we could register the PID in open_file_table.go along with the shared file descriptor, and only allow that process to work with the file until the lock is released. That way, both features would remain compatible...

@danim7, unfortunately, it's not that easy. If a process inherits an FD from its parent, any lock or unlock action applies to both processes -- and this is very widely used functionality; witness the use of flock(1) when it is passed a file descriptor number from the shell.

Thanks for pointing that out @charles-dyfis-net, I keep brainstorming ;)

Would something like this be possible? A duplicate-on-lock mechanism, somewhat similar to the copy-on-write technique (https://en.wikipedia.org/wiki/Copy-on-write):

  • By default, we reuse one FD per device+inode among all consumers of the file.
  • If a consumer asks for a lock, we create a new FD and propagate the lock to it. The other FDs are kept around, but they would wait on the FD holding the lock.
  • When the lock is released, we can either go back to a single FD for that file, or keep the duplicated FDs for that inode until all of them are closed and the node can be deleted from memory.
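
A very rough sketch of the bookkeeping such a duplicate-on-lock scheme might need (hypothetical names, not gocryptfs code; the coordination between waiters is omitted):

// Hypothetical sketch of duplicate-on-lock; not real gocryptfs code.
package lockdup

import "syscall"

// fileEntry tracks one device+inode in a hypothetical open-file table.
type fileEntry struct {
	path     string
	sharedFd int // fd reused by all non-locking consumers
	lockedFd int // -1 until some consumer requests a lock
}

// lockFd opens a private fd for the locking consumer and takes an
// exclusive flock() on it, leaving the shared fd untouched.
func (e *fileEntry) lockFd() (int, error) {
	fd, err := syscall.Open(e.path, syscall.O_RDWR, 0)
	if err != nil {
		return -1, err
	}
	if err := syscall.Flock(fd, syscall.LOCK_EX); err != nil {
		syscall.Close(fd)
		return -1, err
	}
	e.lockedFd = fd
	return fd, nil
}

// unlockFd releases the lock and returns to the single shared fd.
func (e *fileEntry) unlockFd() {
	if e.lockedFd == -1 {
		return
	}
	syscall.Flock(e.lockedFd, syscall.LOCK_UN)
	syscall.Close(e.lockedFd)
	e.lockedFd = -1
}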

Yeah, good idea for a "step two" of the implementation. I think for "step one" I would just add a command line option like -propagate_locks that enables lock propagation but disables FD multiplexing.

2d43288 should mitigate this for now.

Actually, I think 2d43288 is probably good enough. 4096 would have been high enough for the values @lechner has seen (which are far higher than anything I have seen). Not implementing FD multiplexing saves us a lot of headaches.
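
For reference, raising a process's own soft fd limit from Go looks roughly like this; this is a sketch of the general technique, not the exact code in 2d43288:

// Sketch: raise our own RLIMIT_NOFILE soft limit to 4096 if it is lower.
// Illustrates the general technique only, not the code in 2d43288.
package main

import (
	"log"
	"syscall"
)

func raiseNofileLimit(want uint64) {
	var lim syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &lim); err != nil {
		log.Printf("Getrlimit failed: %v", err)
		return
	}
	if lim.Cur >= want {
		return // already high enough
	}
	lim.Cur = want
	if lim.Max < want {
		lim.Cur = lim.Max // cannot exceed the hard limit without privileges
	}
	if err := syscall.Setrlimit(syscall.RLIMIT_NOFILE, &lim); err != nil {
		log.Printf("Setrlimit failed: %v", err)
	}
}

func main() {
	raiseNofileLimit(4096)
}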

For the case that somebody still hits the limit, I have added explicit logging for "too many open files" errors in c52e1ab. If somebody hits it again, I will reconsider.
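
Detecting that condition boils down to checking for EMFILE on open(); a sketch along these lines (not the actual code in c52e1ab):

// Sketch: surface EMFILE ("too many open files") clearly in the log.
// Not the actual code from c52e1ab; openLoud is a made-up helper.
package main

import (
	"log"
	"syscall"
)

func openLoud(path string) (int, error) {
	fd, err := syscall.Open(path, syscall.O_RDONLY, 0)
	if err == syscall.EMFILE {
		log.Printf("open %q: too many open files - try raising the limit with ulimit -n", path)
	}
	return fd, err
}

func main() {
	if _, err := openLoud("/etc/hostname"); err != nil {
		log.Fatal(err)
	}
}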

Looks great! Debian's hard limit is 65536. You provide fabulous customer service. Thank you!