std::process::Command no way to handle command-line length limits

Question

Opened this issue 7 years ago · 9 comments

The arg method doesn't track the total resulting command-line length and has no way of indicating to clients that the resulting command-line length would exceed the OS's underlying maximum length. This is fine for launching subprocesses with dozens of arguments, but renders it impossible to implement xargs or similar functionality.

Can I suggest a new method

fn try_add_arg<S: AsRef<OsStr>>(&mut self, arg: S) -> Option<S>

with documentation saying that it's only preferable to arg when you're wanting multi-kilobyte command-lines?

try_add_arg would keep track of the number of args already added, and if the number of args or the resulting length would exceed the OS limits, then it ignores the argument and returns it back to the client. Otherwise it acts like the normal arg method and returns None.
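A minimal sketch of how the proposed method could behave, written as a wrapper around `Command` rather than as a change to std. The `LimitedCommand` type, its `limit` parameter, and `into_inner` are hypothetical names for illustration only. The byte budget is supplied by the caller, not queried from the OS, and the per-argument cost is only an approximation (on Windows, `OsStr::len` does not count UTF-16 code units).

```rust
use std::ffi::OsStr;
use std::process::Command;

/// Hypothetical wrapper (not part of std) that tracks an approximate
/// command-line size against a caller-chosen budget.
pub struct LimitedCommand {
    inner: Command,
    used: usize,  // estimated bytes consumed so far
    limit: usize, // assumed budget in bytes, chosen by the caller
}

impl LimitedCommand {
    pub fn new<S: AsRef<OsStr>>(program: S, limit: usize) -> Self {
        // Count the program name plus one byte for its terminator.
        let used = program.as_ref().len() + 1;
        LimitedCommand { inner: Command::new(program), used, limit }
    }

    /// Adds `arg` and returns `None` if it fits within the budget.
    /// Otherwise leaves the command unchanged and hands `arg` back.
    pub fn try_add_arg<S: AsRef<OsStr>>(&mut self, arg: S) -> Option<S> {
        let cost = arg.as_ref().len() + 1; // bytes plus terminator or separator
        if self.used + cost > self.limit {
            return Some(arg);
        }
        self.used += cost;
        self.inner.arg(arg);
        None
    }

    /// Gives back the underlying `Command` so it can be spawned.
    pub fn into_inner(self) -> Command {
        self.inner
    }
}
```

A caller would keep feeding arguments until `try_add_arg` returns `Some`, spawn the command, and start a new one with the returned argument.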

Answer 1 · 2017-03-09T13:32:40.000Z

Easier to ask for forgiveness than permission.

See this. While I link the Python documentation here, the idea is just as applicable to Rust. In this case it is more sensible to detect an over-long argument list by inspecting the error side of the Result.

Answer 2 · 2017-03-09T13:55:55.000Z

@nagisa I'm aware of that idiom and of the "Time of check to time of use" errors that it avoids, but it doesn't really apply in this case:

as far as I can tell, there's no OS-independent way of determining that the reason my spawn call failed was because the command-line length was too long.
I'm performing an xargs-like operation where I get a stream of strings and want to batch them up into subprocess calls (e.g. "rm -rf string1 string2 string3 string4 string5 ... string256", then "rm -rf string257 string258 string259 ... string640") with each batch being as large as possible.

The only way I can think of to achieve this is

something like the above method
some horrible "increase args exponentially and then back off after it fails" kludge - and even that requires a way to tell that my spawn failed due to overly large command-line (and a guarantee that the OS will fail with an error rather than just silently truncating for instance).
putting in some hard-coded limit and some generic "calculate the probable command-line length" code in my program and hoping that OS-specific things like quoting don't make it hopelessly inaccurate...
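For illustration, the third option (a hard-coded limit plus an estimated length) might look roughly like the sketch below. The function name `batch_by_budget` and the parameters `base` and `budget` are made up for this example. The cost model (bytes plus one per argument) ignores OS-specific quoting, which is exactly the inaccuracy described above.

```rust
/// Splits `items` into batches whose estimated size stays within `budget`.
/// `base` is the estimated cost of the fixed prefix (e.g. "rm -rf").
/// Both numbers are assumptions supplied by the caller.
pub fn batch_by_budget<I>(items: I, base: usize, budget: usize) -> Vec<Vec<String>>
where
    I: IntoIterator<Item = String>,
{
    let mut batches = Vec::new();
    let mut current: Vec<String> = Vec::new();
    let mut used = base;

    for item in items {
        let cost = item.len() + 1; // bytes plus a separator or terminator
        // Close the current batch if this item would push it over budget.
        // An item that is too large on its own still gets a batch to itself.
        if !current.is_empty() && used + cost > budget {
            batches.push(std::mem::take(&mut current));
            used = base;
        }
        used += cost;
        current.push(item);
    }
    if !current.is_empty() {
        batches.push(current);
    }
    batches
}
```

Each returned batch would then be passed to one `Command::new("rm").arg("-rf").args(batch)` call.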

Answer 3 · 2017-03-09T14:19:22.000Z

The command line for Windows is flattened into one big UTF-16 string, the length of which, including the null terminator, is limited to 32,768 UTF-16 code units.
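As a rough illustration of counting in UTF-16 code units rather than bytes, here is a small sketch. The helper name `utf16_len_with_nul` is hypothetical. It joins the parts with single spaces and deliberately ignores the quoting and escaping that real flattening adds, so it gives a lower bound on the flattened length.

```rust
/// Counts the UTF-16 code units of `parts` joined by single spaces,
/// plus one for the null terminator. Quoting and escaping are ignored,
/// so the real flattened command line can be longer.
pub fn utf16_len_with_nul(parts: &[&str]) -> usize {
    let units: usize = parts.iter().map(|p| p.encode_utf16().count()).sum();
    let spaces = parts.len().saturating_sub(1);
    units + spaces + 1
}
```

Note that characters outside the Basic Multilingual Plane count as two code units, so byte length and character count are both poor proxies here.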

Answer 4 · 2017-08-11T03:13:35.000Z

Erm, would it be safe to behave like xargs -n 100 <...>, that is, defaulting to a batch size of 100 or so arguments, which most OS's can reasonably be expected to handle?
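A sketch of what that fixed-size batching could look like with today's `Command` API. The function name `fixed_batches` and its parameters are hypothetical, and nothing here checks the resulting length, so it only limits the argument count in the way `xargs -n` does.

```rust
use std::process::Command;

/// Builds one `Command` per chunk of at most `n` items, each starting
/// with the same fixed arguments (e.g. "-rf"). Length is not checked.
pub fn fixed_batches(program: &str, fixed: &[&str], items: &[String], n: usize) -> Vec<Command> {
    items
        .chunks(n.max(1)) // `chunks(0)` would panic, so use at least 1
        .map(|chunk| {
            let mut cmd = Command::new(program);
            cmd.args(fixed).args(chunk);
            cmd
        })
        .collect()
}
```

The caller would then spawn each returned command in turn and check its exit status.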

Answer 5 · 2017-08-11T10:35:14.000Z

It wouldn't be 100% safe: I could always come up with an unlikely scenario where any number of arguments would result in a too-long command line. More importantly, the lower the "-n" value, the safer it becomes, but at the cost of lower performance. I wouldn't want to guess what the best compromise between performance and safety is: adding a "try_add_argument" type method means that I could easily be both safe and maximally performant. Mark

Answer 6 · 2017-11-19T00:37:48.000Z

I agree that xargs-style batching needs to be supported somehow. I would be open to considering a well-tested cross-platform implementation of this in a PR.

Answer 7 · 2020-05-03T20:44:42.000Z

We can first start with calculating the size of argv and envp in Unix, and the size of the cmdline in Windows. Then we can have methods that make use of this is-it-full information.

For Unix (sys/unix/process/process_common) this should be a sum of argv and envp sizes. The former size is easy to calculate as the sum of (length of each C string + 1). The latter is unspecified, but common sense dictates something similar (that's what GNU xargs does anyways.) With an extra headroom of 2048 bytes we should be fine. Another issue is the dynamic size of ARG_MAX: I suggest having a cached value for each instance of Command that can be explicitly reset.
- VxWorks seems to have sysconf and ARG_MAX. Try using it in that process_common too?
For Windows we need to ask make_command_line about it. The obvious way is to get the .len() of the Vec it spits out, but we also need a way to build the Vec in smaller pieces so we don't do the copying over and over. (Accidentally quadratic otherwise.) The allowed size is fixed at 32768 and env does not matter in the Unicode API we use.
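The Unix half of this proposal could be sketched as below. The function names `unix_estimate` and `fits` are hypothetical. The `arg_max` value is passed in by the caller (it would normally come from `sysconf`, which needs a libc binding that this sketch avoids), and the `KEY=VALUE` cost model for the environment is an assumption, since that size is unspecified.

```rust
use std::ffi::OsStr;

/// Extra headroom in bytes, as suggested in the comment above.
pub const HEADROOM: usize = 2048;

/// Estimates the bytes taken by argv and envp: each C string plus its NUL.
/// Environment entries are counted as "KEY=VALUE\0" (an assumption).
pub fn unix_estimate<'a, A, E>(argv: A, envp: E) -> usize
where
    A: IntoIterator<Item = &'a OsStr>,
    E: IntoIterator<Item = (&'a OsStr, &'a OsStr)>,
{
    let args: usize = argv.into_iter().map(|a| a.len() + 1).sum();
    let env: usize = envp
        .into_iter()
        .map(|(k, v)| k.len() + 1 + v.len() + 1) // key, '=', value, NUL
        .sum();
    args + env
}

/// Returns true if the estimate plus headroom stays within `arg_max`.
pub fn fits(estimate: usize, arg_max: usize) -> bool {
    estimate + HEADROOM <= arg_max
}
```

A `Command`-level API could call something like this incrementally, adding each new argument's cost to a running total instead of re-summing every time.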

Answer 8 · 2020-05-03T22:16:46.000Z

On Linux at least, you'll need to calculate the length of each string (including NUL byte) plus the size of the actual arrays (e.g. (argc + 1) * sizeof(char *)). And the allocation is done in pages, so round up to a multiple of the page size.
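That refinement might be expressed as in the sketch below, for a single array of strings (the same calculation would apply to envp). The function name `linux_estimate` is hypothetical, and `page_size` is supplied by the caller as an assumption rather than queried from the system.

```rust
use std::mem::size_of;

/// Estimates the space for one NULL-terminated array of C strings:
/// string bytes (with NULs) plus the pointer array, rounded up to a
/// whole number of pages. `page_size` must be non-zero.
pub fn linux_estimate(strings: &[&str], page_size: usize) -> usize {
    // Each string plus its terminating NUL byte.
    let bytes: usize = strings.iter().map(|s| s.len() + 1).sum();
    // (argc + 1) pointers: one per string plus the terminating NULL.
    let pointers = (strings.len() + 1) * size_of::<*const u8>();
    let total = bytes + pointers;
    // Round up to the next multiple of the page size.
    (total + page_size - 1) / page_size * page_size
}
```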

Answer 9 · 2022-05-25T17:57:12.000Z

The latest release of the argmax crate is complete enough to implement length-limited command lines in fd. I'd encourage anyone who's encountering this issue to give it a try and report any issues/missing features.

If people like the API I can try to make an implementation for the standard library.