async_process_results: $3 contains both stdout and stderr output
zmwangx opened this issue · 30 comments
The current README claims that the third argument to the callback function is

$3: resulting (stdout) output from job execution
However, inspecting the source code of _async_job shows that the output mixes both stdout and stderr:
local out
out=$(eval "$@" 2>&1)
local ret=$?
A simple experiment also confirms the impression:
async_init
async_start_worker foo -n
function echo_result { echo $3; }
async_register_callback foo echo_result
async_job foo mv 2>/dev/null
The output is
mv: missing file operand
Try 'mv --help' for more information.
on stdout, which shouldn't happen if $3 is only collecting stdout.
So it seems to me that at the very least the documentation is somewhat misleading.
On a more fundamental level, though, I wonder if mixing stdout and stderr is the right thing to do. Maybe it's better to separate the two streams? Often, littering stdout with stderr can totally trash the output.
Thanks for spotting the error in the documentation!
The behavior is as intended though. But if you have any ideas on how to separate the stdout and stderr without using temporary files I'm open to suggestions. For now, the error code should be checked to know whether the output can be trusted or not.
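As a hedged sketch of that check (the callback name and argument layout here are illustrative, following the README's description of $1/$2/$3; this is not code from the library):

```shell
# Hypothetical callback: since stdout and stderr arrive merged in $3,
# treat it as trustworthy stdout only when the return code ($2) is zero.
completed_callback() {
  # $1 = job name, $2 = return code, $3 = combined output (assumed layout)
  if [ "$2" -eq 0 ]; then
    printf 'ok: %s\n' "$3"
  else
    printf 'job %s failed (%s): %s\n' "$1" "$2" "$3" >&2
  fi
}

completed_callback date 0 "Mon Jan 1"
```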
if you have any ideas on how to separate the stdout and stderr without using temporary files
Unfortunately not.
Actually there's a way after all.
unset stdout stderr ret
eval "$( { eval "$@"; } \
> >(stdout=$(cat); typeset -p stdout) \
2> >(stderr=$(cat); typeset -p stderr)
ret=$?; typeset -p ret )"
Courtesy http://stackoverflow.com/a/18086548/1944784.
I'm not entirely convinced that the output of typeset -p stdout and typeset -p stderr won't be mixed, though.
Cool, your suggestion seems to work just fine (doesn't mix output). It does seem a bit hackish though and I'm not sure I fully understand what's going on, but it got me thinking.
typeset stdout stderr ret
stdout=$(eval "$@") ret=$? 2>&1 | read stderr
This seems to do the trick as well and is easier to comprehend for my puny brain :D.
Committed a change that should resolve this (updated docs as well), thanks a bunch for your help. And if you find anything lacking still in the solution, please tell me!
stdout=$(eval "$@") ret=$? 2>&1 | read stderr
Wow that's cool! Though, would you please explain what's going on here? I've seen assignments used that way before, but I never understood why the pipe can be used like that. This is not even a pipeline, since $pipestatus only gets a single element.
By the way, I think read stderr is a bit sloppy and should be replaced by read -r -d '' stderr. Test cases:
> stdout=$(print 'hello\nworld' >&2 ) ret=$? 2>&1 | read stderr; echo $stderr
hello
> stdout=$(print '\\' >&2 ) ret=$? 2>&1 | read stderr; echo $stderr
> stdout=$(print 'hello\nworld' >&2 ) ret=$? 2>&1 | read -r -d '' stderr; echo $stderr
hello
world
> stdout=$(print '\\' >&2 ) ret=$? 2>&1 | read -r -d '' stderr; echo $stderr
\
In fact, even read -r -d '' won't help when you have NULs in the output.
> stdout=$(find /usr -maxdepth 1 -print0 >&2) ret=$? 2>&1 | read -r -d '' stderr
> echo -nE $stderr | xargs -0 ls -d
/usr
> unset stdout stderr ret
> eval "$( { find /usr -maxdepth 1 -print0 >&2; } > >(stdout=$(cat); typeset -p stdout) 2> >(stderr=$(cat); typeset -p stderr); ret=$?; typeset -p ret )"
> echo -nE $stderr | xargs -0 ls -d
/usr /usr/X11R6 /usr/include /usr/libexec /usr/sbin /usr/standalone
/usr/X11 /usr/bin /usr/lib /usr/local /usr/share /usr/texbin
It does seem a bit hackish though and I'm not sure I fully understand what's going on.
Actually it's quite simple. typeset -p prints a parameter and its value as a typeset-plus-assignment statement which can be evaluated; so although we can't assign to parameters from a subshell and affect the parent shell, we can certainly typeset -p in the subshells and evaluate the result back in the parent shell with eval.
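A minimal toy example of that round trip (illustrative, not from the library):

```shell
# The subshell cannot modify the parent's variables directly, but it can
# print an evaluable assignment with typeset -p...
unset msg
assignment=$( ( msg="set in a subshell"; typeset -p msg ) )
# ...which the parent then evals to "receive" the value.
eval "$assignment"
printf '%s\n' "$msg"
```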
Regarding the read, another thing I can think of is to read line by line using a known separator (e.g., $'\n') and assemble the lines manually.
stderr=; stdout=$(eval "$@") ret=$? 2>&1 | while read -r -d $'\n' line; do stderr+=$line$'\n'; done; stderr+=$line
Slightly better coding style:
local sep=$'\n'
stderr=; stdout=$(eval "$@") ret=$? 2>&1 | while read -r -d $sep line; do stderr+=$line$sep; done; stderr+=$line
Test:
> stderr=; stdout=$(find /usr -maxdepth 1 -print0 >&2) ret=$? 2>&1 | while read -r -d $'\n' line; do stderr+=$line$'\n'; done; stderr+=$line
> echo -nE $stderr | xargs -0 ls -d
/usr /usr/X11R6 /usr/include /usr/libexec /usr/sbin /usr/standalone
/usr/X11 /usr/bin /usr/lib /usr/local /usr/share /usr/texbin
Wow that's cool! Though, would you please explain what's going on here? I've seen assignments used that way before, but I never understood why the pipe can be used like that. This is not even a pipeline, since $pipestatus only gets a single element.
I guess the reasoning is something along the lines of: the subshell is only capturing stdout and the stderr is leaking out, which means we are piping stdout=$(eval "$@") instead of whatever is run in the subshell. But as you already noticed, read is a bit limited here.
In fact, even read -r -d '' won't help when you have NULs in the output.
Yeah, unfortunately, and read is also prone to leave the command hanging if it gets no input. Also, NULs are very problematic since I rely wholly on null characters to separate the parameters from the asynchronous jobs.
Actually it's quite simple.
You're right, I was unfamiliar with what typeset -p does; now it makes a lot more sense :)
Regarding the read, another thing I can think of is to read line by line using a known separator (e.g., $'\n'), and assemble them up manually.
Something like that might work, yeah. But I ended up fixing it according to your first suggestion; it seems more robust. Thanks!
To further mitigate the possibility of mixed output, this patch could be applied:
diff --git a/async.zsh b/async.zsh
index 0090f65..857e206 100644
--- a/async.zsh
+++ b/async.zsh
@@ -26,8 +26,8 @@ _async_job() {
unset stdout stderr ret
eval "$(
( eval "$@" ) \
- > >(stdout=$(cat); typeset -p stdout) \
- 2> >(stderr=$(cat); typeset -p stderr)
+ > >(stdout=$(cat); read -ep >/dev/null; typeset -p stdout; print -p "t") \
+ 2> >(stderr=$(cat); read -ep >/dev/null; typeset -p stderr; print -p "t")
ret=$?; typeset -p ret
)"
But not sure it's necessary.
For reference, here's an interesting solution using coprocs. It does mask the return code for some reason though when trying to pick it out :P ...or not, that was just me being silly and using too-large error codes!
diff --git a/async.zsh b/async.zsh
index 0090f65..0ec3453 100644
--- a/async.zsh
+++ b/async.zsh
@@ -14,22 +14,14 @@ _async_job() {
local duration=$EPOCHREALTIME
# Run the command
- #
- # What is happening here is that we are assigning stdout, stderr and ret to
- # variables, and then we are printing out the variable assignment through
- # typeset -p. This way when we run eval we get something along the lines of:
- # eval "
- # typeset stdout=' M async.test.sh\n M async.zsh'
- # typeset stderr=''
- # typeset ret=0
- # "
- unset stdout stderr ret
- eval "$(
- ( eval "$@" ) \
- > >(stdout=$(cat); typeset -p stdout) \
- 2> >(stderr=$(cat); typeset -p stderr)
- ret=$?; typeset -p ret
- )"
+ local stdout stderr ret
+ coproc cat
+ exec {p}<&p # Copy the <&p descriptor
+ stdout=$( (eval "$@") 2>&p)
+ ret=$?
+ coproc exit # Close both <&p and >&p descriptors
+ stderr="$(cat <&$p)" # Read from copy descriptor
+ exec {p}<&- # Close descriptor copy
# Calculate duration
duration=$(( $EPOCHREALTIME - $duration ))
EDIT: Confirmed that this breaks the coproc I use for a "mutex" to prevent jobs from printing at the same time, so that's not an option.
With the latest on master, I'm now intermittently getting "typeset" on stdout when calling git_status. I was able to replicate it using the async.test.sh script included in the repo.
git:
On branch master
Your branch is up-to-date with 'origin/master'.
nothing to commit, working directory clean
Compelted job: 'git_status'
Return code: 0
Duration: 0.010901212692260742 seconds
Stdout: 'typeset'
Stderr: ''
Compelted job: 'git_status'
Return code: 0
Duration: 0.010807037353515625 seconds
Stdout: ''
Stderr: ''
Like I said before, this doesn't happen every time, and does appear to be a symptom of the changes made in the last couple of commits. But when using this with Pure prompt, the git status shows as "typeset" intermittently.
EDIT: Here are some screenshots illustrating what it looks like in Pure.
Previous behavior:
New behavior:
To further mitigate the potential possibility of mixed output, this patch could be applied
I'm not sure about the patch. Can you read and write to the coproc from a subshell? I suppose you would get errors like read: -p: no coprocess and print: -p: no coprocess? But then if you start a fresh coproc within the subshell, you lose the locking ability, right?
@brandon099 Sorry for the breakage. Unfortunately I can't reproduce though. Is your test run from inside a git repo or not? If it is, is the repo dirty or not?
Also, would you please post the output of
(git_status) > >(stdout=$(cat); typeset -p stdout) 2> >(stderr=$(cat); typeset -p stderr); ret=$?; typeset -p ret
and maybe run it a few times? Thanks.
P.S. I seriously hope this isn't the first sighting of mixed typeset -p stdout and typeset -p stderr in the wild. Fingers crossed.
Ah forgot to say:
Also, nulls are very problematic since I rely wholly on null-characters to separate the parameters from the asynchronous jobs.
Yeah, now that I have read async_process_results, it seems to me that this is the only reasonable thing to do, which would of course break down when job output contains NULs.
Since this seems unavoidable, maybe it's better to note that in the documentation?
Edit: Apparently Limitations is a good place for this.
I'm not sure about the patch. Can you read and write to the coproc from a subshell? I suppose you would get errors like read: -p: no coprocess and print: -p: no coprocess? But then if you start a fresh coproc within the subshell, you lose the locking ability, right?
Here's a gist with a sample patch and output that shows the process is working (since the token is changing) https://gist.github.com/mafredri/0979074a45478121a25b.
@brandon099: So sorry about the breakage. What ZSH version are you running? Also, could you try out the patch in the gist I posted above?
Edit: Apparently Limitations is a good place for this.
Fair enough! For robustness I could strip all null characters from command output to avoid any potential problems when they eventually do appear. Maybe replace with newlines?
Maybe replace with newlines?
I would rather just strip them out and replace with nothing. Coming from a Bash background, I would note that Bash strings can't even hold nulls:
> bash -c 'output=$(find /usr -maxdepth 1 -print0) && echo -n "$output" | xargs -0 ls -d'
ls: cannot access /usr/usr/bin/usr/include/usr/lib/usr/libexec/usr/local/usr/sbin/usr/share/usr/standalone/usr/texbin/usr/X11/usr/X11R6: No such file or directory
So nulls stripped when assigned to a string is what I intuitively expect...
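A minimal sketch of the stripping approach (assuming tr is the tool used for it; the library may well do this differently):

```shell
# Drop NUL bytes from captured output before packing it into a
# NUL-delimited message, so the payload cannot corrupt the protocol.
out=$(printf 'a\0b\0c' | tr -d '\0')
printf '%s\n' "$out"
```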
Here's a gist with a sample patch and output that shows the process is working.
Thanks, I'll take a look later.
P.S. Anecdotal fact: other shells are equally bad at handling NULs:

- dash on Ubuntu 14.04 LTS:
  > sh -c 'output=$(find /usr -maxdepth 1 -print0) && echo -n "$output" | xargs -0 ls -d'
  ls: cannot access /usr/usr/include/usr/x86_64-linux-gnu/usr/share/usr/src/usr/bin/usr/lib32/usr/class/usr/libx32/usr/lib/usr/local/usr/games/usr/sbin: No such file or directory
- ksh on Ubuntu 14.04 LTS (@(#)MIRBSD KSH R46 2013/05/02):
  > ksh -c 'output=$(find /usr -maxdepth 1 -print0) && echo -n "$output" | xargs -0 ls -d'
  ls: cannot access /usr/usr/include/usr/x86_64-linux-gnu/usr/share/usr/src/usr/bin/usr/lib32/usr/class/usr/libx32/usr/lib/usr/local/usr/games/usr/sbin: No such file or directory
- ksh on OS X 10.10.4 (sh (AT&T Research) 93u+ 2012-08-01):
  > ksh -c 'output=$(find /usr -maxdepth 1 -print0) && echo -n "$output" | xargs -0 ls -d'
  /usr
I still haven't got a chance to play with the locking patch, but I think I just came up with a (hopefully) completely race-condition-free solution that does the same thing, just sequentially:
diff --git a/async.zsh b/async.zsh
old mode 100644
new mode 100755
index 28ddf73..3195974
--- a/async.zsh
+++ b/async.zsh
@@ -25,11 +25,12 @@ _async_job() {
# "
unset stdout stderr ret
eval "$(
- ( eval "$@" ) \
- > >(stdout=$(cat); typeset -p stdout) \
- 2> >(stderr=$(cat); typeset -p stderr)
- ret=$?; typeset -p ret
- )"
+ (
+ stdout=$(eval "$@")
+ ret=$?
+ typeset -p stdout ret
+ ) 2> >(stderr=$(cat); typeset -p stderr)
+ )"
# Calculate duration
duration=$(( EPOCHREALTIME - duration ))
I haven't updated the comments, but hopefully it is easy enough to understand. The idea is that stderr=$(cat) only finishes after the subshell containing typeset -p stdout ret returns and closes the pipe, so typeset -p stderr must happen strictly after typeset -p stdout ret. The side benefit of this patch is that we actually save a few subshells, hence a few forks ;)
@zmwangx I think that just might work! And I think we can decrease the subshells even further.
diff --git a/async.zsh b/async.zsh
index 28ddf73..fe36038 100644
--- a/async.zsh
+++ b/async.zsh
@@ -25,10 +25,9 @@ _async_job() {
# "
unset stdout stderr ret
eval "$(
- ( eval "$@" ) \
- > >(stdout=$(cat); typeset -p stdout) \
- 2> >(stderr=$(cat); typeset -p stderr)
- ret=$?; typeset -p ret
+ stdout=$(eval "$@") 2> >(stderr=$(cat); typeset -p stderr)
+ ret=$?
+ typeset -p stdout ret
)"
# Calculate duration
That was my initial idea actually, but it doesn't have the nice property of being sequential. Just try
stdout=$(echo stdout; echo stderr >&2) 2> >(stderr=$(cat); sleep 5; typeset -p stderr); ret=$?; typeset -p stdout ret
Yes, I see what you're getting at. Generally I don't think it's a problem if it's not sequential, but just to be safe and avoid any potential "printing on top of each other", let's go with your version.
PS. Do you think replacing the surrounding () with {} will have any positive effect? Are there any performance differences between code blocks and subshells?
Generally I don't think it's a problem if it's not sequential,
Me neither. But just to be safe...
Do you think replacing the surrounding () with {} will have any positive effect, are there any performance differences between code blocks and subshells?
Good call, I think we should go with
eval "$( { stdout=$(eval "$@"); ret=$?; typeset -p stdout ret; } 2> >(stderr=$(cat); typeset -p stderr) )"
instead. There's no functional difference, and we eliminate one subshell, at least theoretically (and practically, assuming Zsh doesn't have something like tail call optimization). I doubt there's much real-world gain (Guido taught me not to worry about performance too much), but why not? Theoretical gains are also nice.
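For completeness, here is a self-contained run of that final form with a toy command that writes to both streams (works in zsh, and in bash, where typeset is a declare synonym):

```shell
# stdout and the return code are declared from inside the brace group;
# stderr is collected by the process substitution, whose cat only
# finishes once the group closes its side of the pipe.
unset stdout stderr ret
eval "$( {
  stdout=$( { echo out; echo err >&2; } )
  ret=$?
  typeset -p stdout ret
} 2> >(stderr=$(cat); typeset -p stderr) )"
printf '%s|%s|%s\n' "$stdout" "$stderr" "$ret"
```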
Good call, I think we should go with instead.
Let's!
There's no functional difference, and we eliminate one subshell, at least theoretically (and practically, assuming Zsh doesn't have something like tail call optimization). I doubt there's much real-world gain (Guido taught me not to worry about performance too much), but why not? Theoretical gains are also nice.
Hehe, true that! The whole point of this library is to not care about performance ^^.
@brandon099 would you mind testing the latest master and see if you can still reproduce your bug? I'm hoping it's fixed now :)!
Yep, I haven't been able to replicate the issue with the latest master (ee4eea1) -- great work!