cms-dev/cms

Sandbox does not set file size limit

veluca93 opened this issue · 18 comments

The sandbox does not set a limit on the size of the files created by the solution. This can lead to various problems.

  • Causing an out-of-memory situation for the worker, if the sandboxes are kept in a big enough tmpfs (which happens in a lot of distributions that mount a tmpfs in /tmp).
#include <unistd.h>
#include <sys/stat.h>
#include <fcntl.h>

#define BS 8192

int main() {
    int fd = open("output.txt", O_CREAT | O_WRONLY | O_TRUNC,
                                S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH);
    char buf[BS] = {};
    while(1)
        write(fd, buf, BS);
}
  • Allowing the user to somewhat bypass memory limits by creating a big file, then mmapping parts of it and using it as extra memory.
  • Leaving a very big output.txt file, causing the worker to fail if there is no space left on the device, or to eat up a lot of memory (~4 times the size of the file). Using fallocate allows a program to allocate a lot of space in a very short time. For example, the following program
#include <unistd.h>
#include <sys/stat.h>
#include <fcntl.h>

int main() {
    int fd = open("output.txt", O_CREAT | O_WRONLY | O_TRUNC,
                                S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH);
    fallocate(fd, 0, 0, 10*1024*1024*1024l);
}

takes only two seconds to complete on my computer and creates a 10GB file.

It is a very bad idea to publish details on security issues when there are no ready solutions. You should have communicated them privately. At the moment, I don't have time to check how difficult it is to implement file system quota in isolate.

As pointed out by Stefano, there is a --quota option in isolate, which should fix the problem. I don't have the time to check it out now; if you're running a contest with this version of CMS, please consider the possibility to modify Sandbox.build_box_options() in file cms/grading/Sandbox.py, forcibly adding a --quota option with a reasonable value. Please note that this solution has not been tested so far.

I haven't been able to make it work at the moment, but I'm confident it
should be our solution. I can't work on it anymore now, so I'm giving some
pointers:

  • --quota must be set at isolate --init, not at isolate --run (this is
    mildly inconvenient for us as our option parsing works only for --run, so
    we need change it a little.
  • when using --quota, I get a ESRCH errno (which is confusely rendered as
    "No such process"), which by quotactl man page means "No disk quota is
    found for the indicated user. Quotas have not been turned on for this
    filesystem.";
  • you need to have CONFIG_QUOTA=y in your kernel;
  • not sure of the correct mount options for the filesystem holding /tmp,
    maybe (as suggested by Gio) quota,usrquota,grpquota? [mount -o
    remount,quota,usrquota,grpquota /device/with/tmp]
  • I think that even after enabling these options you need to create some
    file by hand, but I stopped here.

On Wed Apr 30 2014 at 8:47:50 AM, Giovanni Mascellani <
notifications@github.com> wrote:

As pointed out by Stefano, there is a --quota option in isolate, which
should fix the problem. I don't have the time to check it out now; if
you're running a contest with this version of CMS, please consider the
possibility to modify Sandbox.build_box_options() in file
cms/grading/Sandbox.py, forcibly adding a --quota option with a
reasonable value. Please note that this solution has not been tested so far.


Reply to this email directly or view it on GitHubhttps://github.com//issues/309#issuecomment-41769262
.

Another solution could be to set RLIMIT_NOFILE to an appropriate value (5, letting contestants open two files beside standard input/output/error) and RLIMIT_FSIZE to the same value as RLIMIT_AS.

Here http://pastebin.com/7kTW9bs4 is a patch that implements the fix suggested in the last comment.

lw commented

I don't think your proposed fix is advisable, as some languages (probably Java, Python, etc.) need the interpreter to access files on disk to load parts of the standard library. Therefore a global constant limit on the number of open file descriptors isn't feasible.

True, that would be a problem...

I made some experiments with the quota option, and the result is that it seems to work - you have to enable quota in the kernel, enable options usrquota,grpquota for the filesystem holding /tmp, and then run quotacheck -vguma as root to create the quota files. The problem is that tmpfs does not support quotas, so that will break running cms on a system that mounts /tmp on a tmpfs.

Yet another method to do the check could be enabling RLIMIT_FSIZE and then providing isolate with a whitelist of paths that may be opened for writing: using inotify, we find out if a file different from the ones in the whitelist is opened for writing and in that case we kill the user process.

I've tried these steps with Ubuntu 13.10 Server. Ubuntu doesn't mount tmpfs
to /tmp, so I've remounted the root with usrquota,grpquota. And simple
remounting didn't help - I had to reboot to make isolate working.

Also I'm not sure whether it is expected or not, but seems that quota is
counted for the whole /tmp, not just single submission. So if, for example,
I set keep_sandbox=true, after some solutions evaluated, I get quota limit
exceeded even during compilation. Apparently, we do not use keep_sandbox
for real contest, but I don't know the internals good enough to be sure
that this doesn't affect normal solutions too.

On Wed, Apr 30, 2014 at 5:27 PM, veluca93 notifications@github.com wrote:

I made some experiments with the quota option, and the result is that it
seems to work - you have to enable quota in the kernel, enable options
usrquota,grpquota for the filesystem holding /tmp, and then run quotacheck
-vguma as root to create the quota files. The problem is that tmpfs does
not support quotas, so that will break running cms on a system that
mounts /tmp on a tmpfs.

Yet another method to do the check could be enabling RLIMIT_FSIZE and then
providing isolate with a whitelist of paths that may be opened for writing:
using inotify, we find out if a file different from the ones in the
whitelist is opened for writing and in that case we kill the user process.


Reply to this email directly or view it on GitHubhttps://github.com//issues/309#issuecomment-41785977
.

Artem Iglikov

Well, that's a problem as far as I know: if for whatever reason a sandbox does not get deleted (for example, the worker crashes/raises an exception) it could fill up the user quota.

The simplest solution so far would be to deny write permissions on a sandbox folder, then create empty files with write permissions inside it (eg. output.txt)

http://pastebin.com/jui88R3r
This patch "upgrades" the previous fix, setting RLIMIT_FSIZE to the memory limit and allowing TaskTypes to specify a whitelist of files that are allowed to be written to.

lw commented

Copied to cms-dev/isolate#6 (but not closing, since it also involves how CMS uses isolate).

Whitelist of files is implemented and submitted (at least during evaluation).

One nice solution would be to mount a dedicated tmpfs filesystem for each Worker job, setting at mount time its maximum size. This way different job do not interfere and there is no need to set up quota (and no problems if it is not supported).

The problem is that this could be too memory expensive in some cases: if there are big input and output files, at some point they have to be in RAM twice; one copy in the process running the solution and one copy in the tmpfs filesystem. Memory consumption and resource availability vary a lot between contests and tasks, but I think in some cases that would be too much.

Another solution would be to create, format and loop-mount a dedicated image for each job. This would have the same advantages of the tmpfs solution, it would not depend on large RAM, but it would introduce some overhead (probably not too much) and it would be a bit complicated to manage (in particular, it would definitely require root capabilities).

All in all, it seems to be a reasonable alternative for me and I could try to investigate. Let me know what do you think.

@veluca93 Why in your patch you swap the calls to setup_credentials() and setup_fds()?

I have no idea - possibly I had just swapped around some lines around there and in the end i left those in a different order...

Restricting file size does not really help. You can create as many files as you wish. Even if you try to use RLIMIT_NOFILE to cap the number of available file descriptors, you can still close one file and open another whenever you wish.

The only correct solution I see is to use proper filesystem quotas instead, which (as already mentioned) is supported by isolate.

A tmpfs on /tmp/ should not be an issue any longer since recent version of isolate do not use /tmp/ anyway for security reasons.

I see no way how to evade the memory limit using mmap. Without cgroups, isolate limit the total size of virtual memory available to the process, which includes mmap'ed files. In cgroup mode, it limits the total memory available to the cgroup, which also includes cached pages of files (yes, you can mmap a file larger than the available memory, but then the contents of the file will not be present in the memory at once, so it will be terribly slow).

So the only remaining task is to teach CMS how to ask isolate for a quota.

Perfect, seems like a good solution to me.