async_process_results: $3 contains both stdout and stderr output
zmwangx opened this issue · 30 comments
The current README claims that the third argument to the callback function is

$3: resulting (stdout) output from job execution
However, inspecting the source code of _async_job shows that the output mixes both stdout and stderr:
local out
out=$(eval "$@" 2>&1)
local ret=$?
A simple experiment also confirms the impression:
async_init
async_start_worker foo -n
function echo_result { echo $3; }
async_register_callback foo echo_result
async_job foo mv 2>/dev/null
The output is
mv: missing file operand
Try 'mv --help' for more information.
on stdout, which shouldn't happen if $3 is only collecting stdout.
So it seems to me that at the very least the documentation is somewhat misleading.
On a more fundamental level, though, I wonder if mixing stdout and stderr is the right thing to do. Maybe it's better to separate the two streams? Often, littering stdout with stderr can totally trash the output.
Thanks for spotting the error in the documentation!
The behavior is as intended though. But if you have any ideas on how to separate the stdout and stderr without using temporary files I'm open to suggestions. For now, the error code should be checked to know whether the output can be trusted or not.
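As a hedged sketch of that check (the callback name and argument layout here are illustrative, following the README's description of $1/$2/$3; this is not code from the library):

```shell
# Hypothetical callback: since stdout and stderr arrive merged in $3,
# treat it as trustworthy stdout only when the return code ($2) is zero.
completed_callback() {
  # $1 = job name, $2 = return code, $3 = combined output (assumed layout)
  if [ "$2" -eq 0 ]; then
    printf 'ok: %s\n' "$3"
  else
    printf 'job %s failed (%s): %s\n' "$1" "$2" "$3" >&2
  fi
}

completed_callback date 0 "Mon Jan 1"
```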
if you have any ideas on how to separate the stdout and stderr without using temporary files
Unfortunately not.
Actually there's a way after all.
unset stdout stderr ret
eval "$( { eval "$@"; } \
> >(stdout=$(cat); typeset -p stdout) \
2> >(stderr=$(cat); typeset -p stderr)
ret=$?; typeset -p ret )"
Courtesy http://stackoverflow.com/a/18086548/1944784.
I'm not entirely convinced that the output of typeset -p stdout and typeset -p stderr won't be mixed, though.
Cool, your suggestion seems to work just fine (doesn't mix output). It does seem a bit hackish though and I'm not sure I fully understand what's going on, but it got me thinking.
typeset stdout stderr ret
stdout=$(eval "$@") ret=$? 2>&1 | read stderr
This seems to do the trick as well and is easier to comprehend for my puny brain :D.
Committed a change that should resolve this (updated docs as well), thanks a bunch for your help. And if you find anything lacking still in the solution, please tell me!
stdout=$(eval "$@") ret=$? 2>&1 | read stderr
Wow that's cool! Though, would you please explain what's going on here? I've seen assignments used that way before, but I never understood why the pipe can be used like that. This is not even a pipeline, since $pipestatus only gets a single element.
By the way, I think read stderr is a bit sloppy and should be replaced by read -r -d '' stderr. Test cases:
> stdout=$(print 'hello\nworld' >&2 ) ret=$? 2>&1 | read stderr; echo $stderr
hello
> stdout=$(print '\\' >&2 ) ret=$? 2>&1 | read stderr; echo $stderr
> stdout=$(print 'hello\nworld' >&2 ) ret=$? 2>&1 | read -r -d '' stderr; echo $stderr
hello
world
> stdout=$(print '\\' >&2 ) ret=$? 2>&1 | read -r -d '' stderr; echo $stderr
\
In fact, even read -r -d '' won't help when you have NULs in the output.
> stdout=$(find /usr -maxdepth 1 -print0 >&2) ret=$? 2>&1 | read -r -d '' stderr
> echo -nE $stderr | xargs -0 ls -d
/usr
> unset stdout stderr ret
> eval "$( { find /usr -maxdepth 1 -print0 >&2; } > >(stdout=$(cat); typeset -p stdout) 2> >(stderr=$(cat); typeset -p stderr); ret=$?; typeset -p ret )"
> echo -nE $stderr | xargs -0 ls -d
/usr /usr/X11R6 /usr/include /usr/libexec /usr/sbin /usr/standalone
/usr/X11 /usr/bin /usr/lib /usr/local /usr/share /usr/texbin
It does seem a bit hackish though and I'm not sure I fully understand what's going on.
Actually it's quite simple. typeset -p prints a parameter and its value as a typeset-plus-assignment statement which can be evaluated; so although we can't assign to parameters from a subshell and affect the parent shell, we can certainly typeset -p in the subshells and evaluate the result back in the parent shell with eval.
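A minimal toy example of that round trip (illustrative, not from the library):

```shell
# The subshell cannot modify the parent's variables directly, but it can
# print an evaluable assignment with typeset -p...
unset msg
assignment=$( ( msg="set in a subshell"; typeset -p msg ) )
# ...which the parent then evals to "receive" the value.
eval "$assignment"
printf '%s\n' "$msg"
```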
Regarding the read, another thing I can think of is to read line by line using a known separator (e.g., $'\n') and assemble the lines manually.
stderr=; stdout=$(eval "$@") ret=$? 2>&1 | while read -r -d $'\n' line; do stderr+=$line$'\n'; done; stderr+=$line
Slightly better coding style:
local sep=$'\n'
stderr=; stdout=$(eval "$@") ret=$? 2>&1 | while read -r -d $sep line; do stderr+=$line$sep; done; stderr+=$line
Test:
> stderr=; stdout=$(find /usr -maxdepth 1 -print0 >&2) ret=$? 2>&1 | while read -r -d $'\n' line; do stderr+=$line$'\n'; done; stderr+=$line
> echo -nE $stderr | xargs -0 ls -d
/usr /usr/X11R6 /usr/include /usr/libexec /usr/sbin /usr/standalone
/usr/X11 /usr/bin /usr/lib /usr/local /usr/share /usr/texbin
Wow that's cool! Though, would you please explain what's going on here? I've seen assignments used that way before, but I never understood why the pipe can be used like that. This is not even a pipeline, since $pipestatus only gets a single element.
I guess the reasoning is something along the lines of: the subshell is only capturing stdout and the stderr is leaking out, which means we are piping stdout=$(eval "$@") instead of whatever is run in the subshell. But as you already noticed, read is a bit limited here.
In fact, even read -r -d '' won't help when you have NULs in the output.
Yeah, unfortunately, and read is also prone to leave the command hanging if it gets no input. Also, NULs are very problematic since I rely wholly on null characters to separate the parameters from the asynchronous jobs.
Actually it's quite simple.
You're right, I was unfamiliar with what typeset -p does; now it makes a lot more sense :)
Regarding the read, another thing I can think of is to read line by line using a known separator (e.g., $'\n'), and assemble them up manually.
Something like that might work, yeah. But I ended up fixing it according to your first suggestion; it seems more robust. Thanks!
To further mitigate the possibility of mixed output, this patch could be applied:
diff --git a/async.zsh b/async.zsh
index 0090f65..857e206 100644
--- a/async.zsh
+++ b/async.zsh
@@ -26,8 +26,8 @@ _async_job() {
unset stdout stderr ret
eval "$(
( eval "$@" ) \
- > >(stdout=$(cat); typeset -p stdout) \
- 2> >(stderr=$(cat); typeset -p stderr)
+ > >(stdout=$(cat); read -ep >/dev/null; typeset -p stdout; print -p "t") \
+ 2> >(stderr=$(cat); read -ep >/dev/null; typeset -p stderr; print -p "t")
ret=$?; typeset -p ret
)"
But not sure it's necessary.
For reference, here's an interesting solution using coprocs. It does mask the return code for some reason though when trying to pick it out :P ...or not, that was just me being silly and using too-large error codes!
diff --git a/async.zsh b/async.zsh
index 0090f65..0ec3453 100644
--- a/async.zsh
+++ b/async.zsh
@@ -14,22 +14,14 @@ _async_job() {
local duration=$EPOCHREALTIME
# Run the command
- #
- # What is happening here is that we are assigning stdout, stderr and ret to
- # variables, and then we are printing out the variable assignment through
- # typeset -p. This way when we run eval we get something along the lines of:
- # eval "
- # typeset stdout=' M async.test.sh\n M async.zsh'
- # typeset stderr=''
- # typeset ret=0
- # "
- unset stdout stderr ret
- eval "$(
- ( eval "$@" ) \
- > >(stdout=$(cat); typeset -p stdout) \
- 2> >(stderr=$(cat); typeset -p stderr)
- ret=$?; typeset -p ret
- )"
+ local stdout stderr ret
+ coproc cat
+ exec {p}<&p # Copy the <&p descriptor
+ stdout=$( (eval "$@") 2>&p)
+ ret=$?
+ coproc exit # Close both <&p and >&p descriptors
+ stderr="$(cat <&$p)" # Read from copy descriptor
+ exec {p}<&- # Close descriptor copy
# Calculate duration
duration=$(( $EPOCHREALTIME - $duration ))
EDIT: Confirmed that this breaks the coproc I use for a "mutex" to prevent jobs from printing at the same time, so that's not an option.
With the latest on master, I'm now intermittently getting "typeset" on stdout when calling git_status. I was able to replicate it using the async.test.sh script included in the repo.
git:
On branch master
Your branch is up-to-date with 'origin/master'.
nothing to commit, working directory clean
Compelted job: 'git_status'
Return code: 0
Duration: 0.010901212692260742 seconds
Stdout: 'typeset'
Stderr: ''
Compelted job: 'git_status'
Return code: 0
Duration: 0.010807037353515625 seconds
Stdout: ''
Stderr: ''
Like I said before, this doesn't happen every time, and does appear to be a symptom of the changes made in the last couple of commits. But when using this with Pure prompt, the git status shows as "typeset" intermittently.
EDIT: Here are some screenshots illustrating what it looks like in Pure.
Previous behavior:
New behavior:
To further mitigate the potential possibility of mixed output, this patch could be applied
I'm not sure about the patch. Can you read and write to the coproc from a subshell? I suppose you would get errors like read: -p: no coprocess and print: -p: no coprocess? But then if you start a fresh coproc within the subshell, you lose the locking ability, right?
@brandon099 Sorry for the breakage. Unfortunately I can't reproduce though. Is your test run from inside a git repo or not? If it is, is the repo dirty or not?
Also, would you please post the output of
(git_status) > >(stdout=$(cat); typeset -p stdout) 2> >(stderr=$(cat); typeset -p stderr); ret=$?; typeset -p ret
and maybe run it a few times? Thanks.
P.S. I seriously hope this isn't the first sighting of mixed typeset -p stdout and typeset -p stderr in the wild. Fingers crossed.
Ah forgot to say:
Also, nulls are very problematic since I rely wholly on null-characters to separate the parameters from the asynchronous jobs.
Yeah, now that I have read async_process_results, it seems to me that this is the only reasonable thing to do, which would of course break down when job output contains NULs.
Since this seems unavoidable, maybe it's better to note that in the documentation?
Edit: Apparently Limitations is a good place for this.
I'm not sure about the patch. Can you read and write to the coproc from a subshell? I suppose you would get errors like read: -p: no coprocess and print: -p: no coprocess? But then if you start a fresh coproc within the subshell, you lose the locking ability, right?
Here's a gist with a sample patch and output that shows the process is working (since the token is changing) https://gist.github.com/mafredri/0979074a45478121a25b.
@brandon099: So sorry about the breakage. What ZSH version are you running? Also, could you try out the patch in the gist I posted above?
Edit: Apparently Limitations is a good place for this.
Fair enough! For robustness I could strip all null characters from command output to avoid any potential problems when they eventually do appear. Maybe replace with newlines?
Maybe replace with newlines?
I would rather just strip them out and replace with nothing. Coming from a Bash background, I would note that Bash strings can't even hold nulls:
> bash -c 'output=$(find /usr -maxdepth 1 -print0) && echo -n "$output" | xargs -0 ls -d'
ls: cannot access /usr/usr/bin/usr/include/usr/lib/usr/libexec/usr/local/usr/sbin/usr/share/usr/standalone/usr/texbin/usr/X11/usr/X11R6: No such file or directory
So nulls stripped when assigned to a string is what I intuitively expect...
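A minimal sketch of the stripping approach (assuming tr is the tool used for it; the library may well do this differently):

```shell
# Drop NUL bytes from captured output before packing it into a
# NUL-delimited message, so the payload cannot corrupt the protocol.
out=$(printf 'a\0b\0c' | tr -d '\0')
printf '%s\n' "$out"
```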
Here's a gist with a sample patch and output that shows the process is working.
Thanks, I'll take a look later.
P.S. Anecdotal fact: other shells are equally bad at handling NULs:

- dash on Ubuntu 14.04 LTS:
  > sh -c 'output=$(find /usr -maxdepth 1 -print0) && echo -n "$output" | xargs -0 ls -d'
  ls: cannot access /usr/usr/include/usr/x86_64-linux-gnu/usr/share/usr/src/usr/bin/usr/lib32/usr/class/usr/libx32/usr/lib/usr/local/usr/games/usr/sbin: No such file or directory
- ksh on Ubuntu 14.04 LTS (@(#)MIRBSD KSH R46 2013/05/02):
  > ksh -c 'output=$(find /usr -maxdepth 1 -print0) && echo -n "$output" | xargs -0 ls -d'
  ls: cannot access /usr/usr/include/usr/x86_64-linux-gnu/usr/share/usr/src/usr/bin/usr/lib32/usr/class/usr/libx32/usr/lib/usr/local/usr/games/usr/sbin: No such file or directory
- ksh on OS X 10.10.4 (sh (AT&T Research) 93u+ 2012-08-01):
  > ksh -c 'output=$(find /usr -maxdepth 1 -print0) && echo -n "$output" | xargs -0 ls -d'
  /usr
I still haven't got a chance to play with the locking patch, but I think I just came up with a (hopefully) completely race-condition-free solution that does the same thing, just sequentially:
diff --git a/async.zsh b/async.zsh
old mode 100644
new mode 100755
index 28ddf73..3195974
--- a/async.zsh
+++ b/async.zsh
@@ -25,11 +25,12 @@ _async_job() {
# "
unset stdout stderr ret
eval "$(
- ( eval "$@" ) \
- > >(stdout=$(cat); typeset -p stdout) \
- 2> >(stderr=$(cat); typeset -p stderr)
- ret=$?; typeset -p ret
- )"
+ (
+ stdout=$(eval "$@")
+ ret=$?
+ typeset -p stdout ret
+ ) 2> >(stderr=$(cat); typeset -p stderr)
+ )"
# Calculate duration
duration=$(( EPOCHREALTIME - duration ))
I haven't updated the comments, but hopefully it is easy enough to understand. The idea is that stderr=$(cat) only finishes after the subshell containing typeset -p stdout ret returns and closes the pipe, so typeset -p stderr must happen strictly after typeset -p stdout ret. The side benefit of this patch is that we actually save a few subshells, hence a few forks ;)
@zmwangx I think that just might work! And I think we can decrease the subshells even further.
diff --git a/async.zsh b/async.zsh
index 28ddf73..fe36038 100644
--- a/async.zsh
+++ b/async.zsh
@@ -25,10 +25,9 @@ _async_job() {
# "
unset stdout stderr ret
eval "$(
- ( eval "$@" ) \
- > >(stdout=$(cat); typeset -p stdout) \
- 2> >(stderr=$(cat); typeset -p stderr)
- ret=$?; typeset -p ret
+ stdout=$(eval "$@") 2> >(stderr=$(cat); typeset -p stderr)
+ ret=$?
+ typeset -p stdout ret
)"
# Calculate duration
That was my initial idea actually, but it doesn't have the nice property of being sequential. Just try
stdout=$(echo stdout; echo stderr >&2) 2> >(stderr=$(cat); sleep 5; typeset -p stderr); ret=$?; typeset -p stdout ret
Yes, I see what you're getting at. Generally I don't think it's a problem if it's not sequential, but just to be safe and avoid any potential "printing on top of each other", let's go with your version.
PS. Do you think replacing the surrounding () with {} will have any positive effect? Are there any performance differences between code blocks and subshells?
Generally I don't think it's a problem if it's not sequential,
Me neither. But just to be safe...
Do you think replacing the surrounding () with {} will have any positive effect, are there any performance differences between code blocks and subshells?
Good call, I think we should go with
eval "$( { stdout=$(eval "$@"); ret=$?; typeset -p stdout ret; } 2> >(stderr=$(cat); typeset -p stderr) )"
instead. There's no functional difference, and we eliminate one subshell, at least theoretically (and practically, assuming Zsh doesn't have something like tail call optimization). I doubt there's much real-world gain (Guido taught me not to worry about performance too much), but why not? Theoretical gains are also nice.
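For completeness, here is a self-contained run of that final form with a toy command that writes to both streams (works in zsh, and in bash, where typeset is a declare synonym):

```shell
# stdout and the return code are declared from inside the brace group;
# stderr is collected by the process substitution, whose cat only
# finishes once the group closes its side of the pipe.
unset stdout stderr ret
eval "$( {
  stdout=$( { echo out; echo err >&2; } )
  ret=$?
  typeset -p stdout ret
} 2> >(stderr=$(cat); typeset -p stderr) )"
printf '%s|%s|%s\n' "$stdout" "$stderr" "$ret"
```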
Good call, I think we should go with instead.
Let's!
There's no functional difference, and we eliminate one subshell, at least theoretically (and practically, assuming Zsh doesn't have something like tail call optimization). I doubt there's much real-world gain (Guido taught me not to worry about performance too much), but why not? Theoretical gains are also nice.
Hehe, true that! The whole point of this library is to not care about performance ^^.
@brandon099 would you mind testing the latest master and see if you can still reproduce your bug? I'm hoping it's fixed now :)!
Yep, I haven't been able to replicate the issue with the latest master (ee4eea1) -- great work!