rust-lang/rust

std::process::Command no way to handle command-line length limits


The arg method doesn't track the total resulting command-line length and has no way of indicating to clients that the resulting command-line length would exceed the OS's underlying maximum length. This is fine for launching subprocesses with dozens of arguments, but renders it impossible to implement xargs or similar functionality.

Can I suggest a new method:

fn try_add_arg<S: AsRef<OsStr>>(&mut self, arg: S) -> Option<S>

with documentation saying that it's only preferable to arg when you need multi-kilobyte command lines?

try_add_arg would keep track of the number of args already added, and if the number of args or the resulting length would exceed the OS limits, then it ignores the argument and returns it back to the client. Otherwise it acts like the normal arg method and returns None.
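To make the proposal concrete, here is a minimal sketch of those semantics as a wrapper around Command. Everything named here (LimitedCommand, max_bytes, max_args, into_inner) is hypothetical and not part of std. The cost model (bytes plus one terminator per argument) is an assumption that ignores the environment and any Windows quoting.

```rust
use std::ffi::OsStr;
use std::process::Command;

/// Hypothetical wrapper (not a std API) that tracks an approximate budget.
pub struct LimitedCommand {
    inner: Command,
    used_bytes: usize, // estimated bytes consumed so far
    max_bytes: usize,  // caller-supplied byte budget (assumption)
    arg_count: usize,  // arguments accepted so far
    max_args: usize,   // caller-supplied argument-count budget (assumption)
}

impl LimitedCommand {
    pub fn new<S: AsRef<OsStr>>(program: S, max_bytes: usize, max_args: usize) -> Self {
        // The program name occupies argv[0], so it counts against the budget.
        let used_bytes = program.as_ref().len() + 1;
        LimitedCommand {
            inner: Command::new(program),
            used_bytes,
            max_bytes,
            arg_count: 0,
            max_args,
        }
    }

    /// Adds `arg` and returns None if it fits; otherwise leaves the command
    /// unchanged and hands the argument back to the caller.
    pub fn try_add_arg<S: AsRef<OsStr>>(&mut self, arg: S) -> Option<S> {
        let cost = arg.as_ref().len() + 1; // bytes plus a terminator
        if self.arg_count + 1 > self.max_args || self.used_bytes + cost > self.max_bytes {
            return Some(arg);
        }
        self.inner.arg(arg.as_ref());
        self.used_bytes += cost;
        self.arg_count += 1;
        None
    }

    /// Gives back the underlying Command so it can be spawned as usual.
    pub fn into_inner(self) -> Command {
        self.inner
    }
}
```

A caller would keep calling try_add_arg until it returns Some, spawn the command, and start the next batch with the returned argument.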

Easier to ask for forgiveness than permission.

See this. While I link the python documentation here, the idea is just as applicable to Rust. In this case it is more sensible to check for argument length by inspecting the error side of the Result.

@nagisa I'm aware of that idiom and of the "Time of check to time of use" errors that it avoids, but it doesn't really apply in this case:

  1. as far as I can tell, there's no OS-independent way of determining that my spawn call failed because the command line was too long.
  2. I'm performing an xargs-like operation where I get a stream of strings and want to batch them up into subprocess calls (e.g. "rm -rf string1 string2 string3 string4 string5 ... string256", then "rm -rf string257 string258 string259 ... string640") with each batch being as large as possible.
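The batching in point 2 could look roughly like the sketch below. The function name batch_by_bytes and the budget parameter are made up for illustration, and the cost of an item is assumed to be its byte length plus one terminator, which is only an approximation of what the OS actually counts.

```rust
/// Greedily groups `items` into batches whose estimated size stays within
/// `budget` bytes. An item that exceeds the budget on its own still gets a
/// batch to itself, so no input is silently dropped.
fn batch_by_bytes<'a>(items: &[&'a str], budget: usize) -> Vec<Vec<&'a str>> {
    let mut batches = Vec::new();
    let mut current: Vec<&'a str> = Vec::new();
    let mut used = 0usize;

    for &item in items {
        let cost = item.len() + 1; // bytes plus a terminator (assumed cost model)
        if !current.is_empty() && used + cost > budget {
            // The current batch is as large as it can get; start a new one.
            batches.push(std::mem::take(&mut current));
            used = 0;
        }
        current.push(item);
        used += cost;
    }
    if !current.is_empty() {
        batches.push(current);
    }
    batches
}
```

Each inner Vec would then become the trailing arguments of one "rm -rf ..." invocation. The hard part remains choosing a budget that is correct for the platform.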

The only way I can think of to achieve this is

  • something like the above method
  • some horrible "increase args exponentially and then back off after it fails" kludge - and even that requires a way to tell that my spawn failed due to overly large command-line (and a guarantee that the OS will fail with an error rather than just silently truncating for instance).
  • putting in some hard-coded limit and some generic "calculate the probable command-line length" code in my program and hoping that OS-specific things like quoting don't make it hopelessly inaccurate...
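For what it's worth, on Unix-like systems the failure can be recognised from the raw OS error, which is exactly the OS-specific part. A rough sketch follows, assuming E2BIG has the value 7 (true on Linux, macOS and the BSDs as far as I know). The helper names are hypothetical, and this says nothing about what Windows reports.

```rust
use std::io;
use std::process::Command;

/// Assumed value of E2BIG ("Argument list too long") on Linux, macOS and the
/// BSDs. It is hard-coded here to avoid depending on the libc crate.
const E2BIG: i32 = 7;

/// Returns true if `err` carries the raw OS error code E2BIG.
/// Only meaningful on Unix-like targets; error code 7 means something else on Windows.
fn is_arg_list_too_long(err: &io::Error) -> bool {
    err.raw_os_error() == Some(E2BIG)
}

/// Runs the command to completion. Ok(true) means the child ran, Ok(false)
/// means the OS rejected the argument list as too long, and any other
/// failure is passed through unchanged.
fn run_unless_too_long(cmd: &mut Command) -> io::Result<bool> {
    match cmd.status() {
        Ok(_) => Ok(true),
        Err(e) if is_arg_list_too_long(&e) => Ok(false),
        Err(e) => Err(e),
    }
}
```

Even with this check, the "grow and back off" approach still relies on the OS failing cleanly rather than truncating.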

The command line for Windows is flattened into one big UTF-16 string, the length of which, including the null terminator, is limited to 32,768 UTF-16 code units.
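As a small illustration of counting in UTF-16 code units rather than bytes or chars, here is a sketch that checks an already flattened command-line string against that limit. The function and constant names are invented for the example, and it assumes the quoting and escaping of individual arguments has already been applied to the string.

```rust
/// Limit quoted above: 32,768 UTF-16 code units including the terminator.
const WINDOWS_CMDLINE_MAX_UNITS: usize = 32_768;

/// Number of UTF-16 code units in `cmdline`, plus one for the null terminator.
fn utf16_units_with_nul(cmdline: &str) -> usize {
    cmdline.encode_utf16().count() + 1
}

/// Returns true if the flattened command line fits within the limit.
fn fits_windows_limit(cmdline: &str) -> bool {
    utf16_units_with_nul(cmdline) <= WINDOWS_CMDLINE_MAX_UNITS
}
```

Note that a character outside the Basic Multilingual Plane (most emoji, for instance) takes two code units, so neither the byte length nor the char count gives the right answer.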

Erm, would it be safe to behave like xargs -n 100 <...>, that is, defaulting to a batch size of 100 or so arguments, which most OSes can reasonably be expected to handle?
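A fixed-count approach like that is easy to write against the existing API. Here is a minimal sketch, where build_batches and its parameters are hypothetical names:

```rust
use std::process::Command;

/// Builds one Command per chunk of at most `batch_size` items, each starting
/// with the same `fixed` leading arguments (for example "-rf").
fn build_batches(
    program: &str,
    fixed: &[&str],
    items: &[String],
    batch_size: usize,
) -> Vec<Command> {
    items
        .chunks(batch_size.max(1)) // chunks() panics on a size of 0
        .map(|chunk| {
            let mut cmd = Command::new(program);
            cmd.args(fixed).args(chunk);
            cmd
        })
        .collect()
}
```

This bounds only the number of arguments, not their total length, so 100 very long paths could still exceed the OS limit.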

I agree that xargs-style batching needs to be supported somehow. I would be open to considering a well-tested cross-platform implementation of this in a PR.

We can first start with calculating the size of argv and envp in Unix, and the size of the cmdline in Windows. Then we can have methods that make use of this is-it-full information.

  • For Unix (sys/unix/process/process_common) this should be a sum of argv and envp sizes. The former size is easy to calculate as the sum of (length of each C string + 1). The latter is unspecified, but common sense dictates something similar (that's what GNU xargs does anyways.) With an extra headroom of 2048 bytes we should be fine. Another issue is the dynamic size of ARG_MAX: I suggest having a cached value for each instance of Command that can be explicitly reset.
    • VxWorks seems to have sysconf and ARG_MAX. Try using it in that process_common too?
  • For Windows we need to ask make_command_line about it. The obvious way is to get the .len() of the Vec it spits out, but we also need a way to build the Vec in smaller pieces so we don't do the copying over and over. (Accidentally quadratic otherwise.) The allowed size is fixed at 32768 and env does not matter in the Unicode API we use.
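The Unix-side accounting described above might be sketched like this. The function names are invented, arg_max is taken as a parameter because std has no portable way to query it (the libc crate's sysconf would be the usual route), and the envp cost model of KEY=VALUE plus a NUL is the "something similar" guess rather than a specified rule.

```rust
use std::ffi::{OsStr, OsString};

/// Extra headroom suggested above, in bytes.
const HEADROOM: usize = 2048;

/// Sum of (length of each argument + 1 for the NUL terminator).
fn argv_size<S: AsRef<OsStr>>(args: &[S]) -> usize {
    args.iter().map(|a| a.as_ref().len() + 1).sum()
}

/// Same idea for the environment: "KEY=VALUE" plus a NUL per entry
/// (an assumed cost model, since the real accounting is unspecified).
fn envp_size(env: &[(OsString, OsString)]) -> usize {
    env.iter().map(|(k, v)| k.len() + 1 + v.len() + 1).sum()
}

/// Returns true if argv plus envp plus headroom stays within `arg_max`.
fn fits_arg_max<S: AsRef<OsStr>>(
    args: &[S],
    env: &[(OsString, OsString)],
    arg_max: usize,
) -> bool {
    argv_size(args) + envp_size(env) + HEADROOM <= arg_max
}
```

The current environment could be fed in with std::env::vars_os().collect::<Vec<_>>(), on the assumption that the child inherits it unchanged.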

On Linux at least, you'll need to calculate the length of each string (including NUL byte) plus the size of the actual arrays (e.g. (argc + 1) * sizeof(char *)). And the allocation is done in pages, so round up to a multiple of the page size.
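As a rough sketch of that arithmetic: the names below are hypothetical, the page size is passed in by the caller since std does not expose it, and counting both the argv and envp pointer arrays is my reading of "the actual arrays".

```rust
use std::mem::size_of;

/// Rounds `n` up to the next multiple of `page_size` (which must be non-zero).
fn round_up(n: usize, page_size: usize) -> usize {
    (n + page_size - 1) / page_size * page_size
}

/// Estimated space for an exec call: every string with its NUL byte, plus the
/// NULL-terminated argv and envp pointer arrays, rounded up to whole pages.
/// `env` holds entries already formatted as "KEY=VALUE".
fn linux_exec_estimate(args: &[&str], env: &[&str], page_size: usize) -> usize {
    let strings: usize = args.iter().chain(env.iter()).map(|s| s.len() + 1).sum();
    let ptr = size_of::<*const u8>(); // stands in for sizeof(char *)
    let arrays = (args.len() + 1) * ptr + (env.len() + 1) * ptr;
    round_up(strings + arrays, page_size)
}
```

The result would then be compared against whatever limit the system reports, which this sketch does not attempt to look up.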

The latest release of the argmax crate is complete enough to implement length-limited command lines in fd. I'd encourage anyone who's encountering this issue to give it a try and report any issues/missing features.

If people like the API I can try to make an implementation for the standard library.