Running out of file descriptors when switching from encfs
lechner opened this issue · 18 comments
Hi,
When logging into a home directory that I switched over from encfs, X runs out of file descriptors. (Terminal logins are okay.) Does gocryptfs consume more file descriptors than encfs?
The encrypted directory is mounted via kerberized NFSv4. Thank you!
Best regards,
Felix
Hmm. How do you mount the directory? Through PAM?
No PAM yet. I log onto a terminal as root and use 'su -p' and kinit to get user privileges on NFSv4. Then I drop back to root and run 'gocryptfs -allow_other'. Finally I switch to the graphical terminal and log in. (On EncFS, I use a similar procedure.)
I just tried to reproduce the issue, but couldn't. The gocryptfs process had about 160 files open after the login completed, which is far below the default ulimit of 1024 (check ulimit -n to see what limit you have on your system).
Can you, maybe from another terminal, monitor the number of open files? It works like this:

- Find the gocryptfs PID, for example: ps auxwww | grep gocryptfs
- Log the number of open fds in a loop (replace PID with the PID found above):

```sh
while true; do
    ls /proc/PID/fd | wc -l >> /tmp/files.log
    sleep 0.5
done
```

- Log in through X
- Check /tmp/files.log
Here you go. For comparison I also included the numbers for EncFS.
The problem occurs only on a particular home directory with hundreds of fonts that fontconfig may try to cache. My soft ulimit is 1024. Thank you for investigating!
Wow, interesting! It looks like EncFS only needs half as many file descriptors. I'll check what happens here.
The quick workaround is to increase the limit:
ulimit -n 10000
Thank you for your help. Setting ulimit -n 65536 does the trick.
Attached are the new numbers. EncFS may be five times as efficient.
It's probably this trick in libfuse: https://github.com/libfuse/libfuse/blob/master/example/passthrough_ll.c#L421
Nice idea.
Edit: No, it's something else. But there is a mechanism that reuses file descriptors. This means that one file that is opened N times uses only one file descriptor in EncFS, but N file descriptors in gocryptfs.
If I'm reading it correctly, the reuse of file descriptors is done by this mechanism in EncFS:
When a file is opened, the node representing it is stored in a map indexed by the file path: https://github.com/vgough/encfs/blob/master/encfs/Context.h#L76
This FileNode eventually encapsulates a RawFileIO that does the I/O with the underlying filesystem. If a file descriptor was already opened in a suitable read/write mode, it is recycled between different callers: https://github.com/vgough/encfs/blob/master/encfs/RawFileIO.cpp#L120
What do you think about implementing a similar system in gocryptfs? We could maybe evolve the code in fusefrontend/write_lock.go to keep the file descriptors in the DevInoStruct, which uniquely identifies each file, and recycle them between different consumers of the same file. What is your opinion?
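To make the idea concrete, here is a rough sketch of such a device+inode keyed table with reference-counted, read-only fds. All names here (OpenShared, ReleaseShared, the DevIno type) are hypothetical and the error handling is simplified; it only illustrates the recycling idea, not actual gocryptfs code:

```go
package openfiletable

import (
	"sync"
	"syscall"
)

// DevIno identifies a file on the backing filesystem.
// (Hypothetical; gocryptfs's real DevInoStruct may look different.)
type DevIno struct {
	Dev uint64
	Ino uint64
}

// sharedFd is one backing read-only fd plus a reference count.
type sharedFd struct {
	fd   int
	refs int
}

var (
	mu    sync.Mutex
	table = make(map[DevIno]*sharedFd)
)

// OpenShared opens path read-only, but reuses an existing fd if the
// same device+inode is already in the table.
func OpenShared(path string) (int, DevIno, error) {
	fd, err := syscall.Open(path, syscall.O_RDONLY, 0)
	if err != nil {
		return -1, DevIno{}, err
	}
	var st syscall.Stat_t
	if err := syscall.Fstat(fd, &st); err != nil {
		syscall.Close(fd)
		return -1, DevIno{}, err
	}
	key := DevIno{Dev: st.Dev, Ino: st.Ino}

	mu.Lock()
	defer mu.Unlock()
	if e, ok := table[key]; ok {
		// Same inode already open: close our duplicate, share the old fd.
		syscall.Close(fd)
		e.refs++
		return e.fd, key, nil
	}
	table[key] = &sharedFd{fd: fd, refs: 1}
	return fd, key, nil
}

// ReleaseShared drops one reference; the last one closes the fd.
func ReleaseShared(key DevIno) {
	mu.Lock()
	defer mu.Unlock()
	if e, ok := table[key]; ok {
		e.refs--
		if e.refs == 0 {
			syscall.Close(e.fd)
			delete(table, key)
		}
	}
}
```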
@danim7 Thanks for the EncFS analysis! Yes, I want to implement something similar, and I think the DevIno map you mention is a good place to hook into. As a first step, I have renamed the write_lock.go file to open_file_table.go.
As the whole thing scares me a little, I'd like to only do this for read-only file descriptors for now. I ran lsof on my home dir, and it looks like most of the fds are read-only, so we should still get significant gains.
I'm concerned that adopting this behavior for files on which exclusive locks can be requested would require breaking flock() support. EncFS has already chosen to disable flock() -- see https://bugzilla.redhat.com/show_bug.cgi?id=440483 and the associated Launchpad ticket https://bugs.launchpad.net/encfs/+bug/200685.
@charles-dyfis-net Good point, that's gonna be a problem if we want to propagate the locks to the backing files (#39). Currently (in EncFS and gocryptfs) the locks are only stored inside the kernel.
Just brainstorming: is it possible to recover the PID of the process calling us? If so, when asked for a lock, we could register the PID in open_file_table.go along with the shared file descriptor, and only allow that process to work with the file until the lock is released. That way, both evolutions would be compatible...
@danim7, unfortunately, it's not that easy. If a process inherits an FD from its parent, any lock or unlock actions apply to both processes -- and this is very widely used functionality; witness the use of flock(1) when it is passed a file descriptor number from the shell.
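To illustrate: flock() locks belong to the open file description, not the file descriptor number, so a dup()ed (or fork()-inherited) descriptor shares the lock, and an unlock through either descriptor releases it for both. A small Linux-only demonstration (the /tmp path is just an example):

```go
package main

import (
	"fmt"
	"syscall"
)

func main() {
	// Open a scratch file and take an exclusive lock on it.
	fd1, err := syscall.Open("/tmp/flock-demo", syscall.O_RDWR|syscall.O_CREAT, 0600)
	if err != nil {
		panic(err)
	}
	if err := syscall.Flock(fd1, syscall.LOCK_EX); err != nil {
		panic(err)
	}

	// dup() creates a second fd for the SAME open file description.
	fd2, err := syscall.Dup(fd1)
	if err != nil {
		panic(err)
	}

	// Unlocking through the duplicate releases the lock for fd1 too --
	// which is what makes per-process lock bookkeeping unreliable.
	if err := syscall.Flock(fd2, syscall.LOCK_UN); err != nil {
		panic(err)
	}
	fmt.Println("lock taken on fd1 was released via fd2")
}
```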
Thanks for pointing that out @charles-dyfis-net, I keep brainstorming ;)
Would something like this be possible? A duplicate-on-lock mechanism, somewhat similar to the copy-on-write technique (https://en.wikipedia.org/wiki/Copy-on-write); a rough sketch follows the list.
- By default, we re-use one FD for a given device+inode among the file's consumers.
- If a consumer asks for a lock, we create a new FD and propagate the lock to it. The other FDs are kept around, but they wait on the FD holding the lock.
- When the lock is released, we may return to the previous situation of having a single FD for that file, or stay with the duplicated FDs for that inode until all FDs are closed and we can delete the node from memory.
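The duplicate-on-lock step could look roughly like this, under the assumption that we re-open the file to get an independent open file description (a dup() would share flock() state with the multiplexed fd). lockPrivate is a hypothetical helper, not real gocryptfs code:

```go
package locksketch

import "syscall"

// lockPrivate gives the locking consumer its own open file
// description before calling flock(), so the lock does not leak to
// the other users of the shared fd. Re-opening by path is racy if
// the file was renamed; a real implementation could re-open via
// /proc/self/fd/N instead.
func lockPrivate(path string, how int) (int, error) {
	// A fresh open() creates a NEW open file description; a dup()
	// would share flock() state with the multiplexed fd.
	fd, err := syscall.Open(path, syscall.O_RDWR, 0)
	if err != nil {
		return -1, err
	}
	if err := syscall.Flock(fd, how); err != nil {
		syscall.Close(fd)
		return -1, err
	}
	return fd, nil
}
```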
Yeah, good idea for a "step two" of the implementation. I think for "step one" I would just add a command-line option like -propagate_locks that enables lock propagation but disables FD multiplexing.
Actually, I think 2d43288 is probably good enough. 4096 would have been high enough for the values @lechner has seen (which are far higher than what I have seen). Not implementing FD multiplexing saves us a lot of headaches.
For the case that somebody still hits the limit, I have added explicit logging for "too many open files" errors in c52e1ab. If somebody hits it again, I will reconsider.
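For reference, raising the process's own soft fd limit towards the hard limit looks roughly like this in Go. This is a sketch of the general technique, not necessarily the exact code in 2d43288:

```go
package main

import (
	"fmt"
	"syscall"
)

func main() {
	var rlim syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &rlim); err != nil {
		panic(err)
	}
	// Raise the soft limit to 4096, capped at the hard limit.
	want := uint64(4096)
	if want > rlim.Max {
		want = rlim.Max
	}
	if rlim.Cur < want {
		rlim.Cur = want
		if err := syscall.Setrlimit(syscall.RLIMIT_NOFILE, &rlim); err != nil {
			panic(err)
		}
	}
	fmt.Printf("RLIMIT_NOFILE soft limit: %d\n", rlim.Cur)
}
```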
Looks great! Debian's hard limit is 65536. You provide fabulous customer service. Thank you!