Getting non-blocking live output for `stdout` and `stderr`
brendan-simon-indt opened this issue · 27 comments
Is there a way to get live output for both stdout and stderr at the same time?
If there is a long-running process that outputs to stdout with some errors on stderr interspersed, how can I read them in real time (so that I can display both stdout and stderr as they happen)?
So far, in my tests on Windows, I get both stdout and stderr output when using live_output=True.
Test schema:
Create a test.ps1 file containing:
Write-Output "BEGIN"
sleep 1
Write-Output "1SEC"
sleep 1
Write-Error "2SEC ERROR"
sleep 1
Write-Output "3SEC"
sleep 1
Write-Output "END"
Create a test.py file containing:
from command_runner import command_runner
cmd = r"C:\WINDOWS\system32\WindowsPowerShell\v1.0\powershell.exe C:\GIT\command_runner\command_runner\test.ps1"
exit_code, output = command_runner(cmd, shell=True, live_output=True)
print("SCRIPT FINISHED. OUTPUT WAS:")
print(output)
Output (where the first lines before SCRIPT FINISHED appear each second):
BEGIN
1SEC
C:\GIT\command_runner\command_runner\test.ps1 : 2SEC ERROR
Au caractère Ligne:1 : 1
+ C:\GIT\command_runner\command_runner\test.ps1
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : NotSpecified: (:) [Write-Error], WriteErrorException
+ FullyQualifiedErrorId : Microsoft.PowerShell.Commands.WriteErrorException,test.ps1
3SEC
END
SCRIPT FINISHED. OUTPUT WAS:
BEGIN
1SEC
C:\GIT\command_runner\command_runner\test.ps1 : 2SEC ERROR
Au caractère Ligne:1 : 1
+ C:\GIT\command_runner\command_runner\test.ps1
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : NotSpecified: (:) [Write-Error], WriteErrorException
+ FullyQualifiedErrorId : Microsoft.PowerShell.Commands.WriteErrorException,test.ps1
3SEC
END
Could you come up with a test case that produces stdout and some stderr output so I can write some tests please ?
Perhaps your env is Linux, which should work just like Windows. A test script would be useful.
I did the same test under Linux; it works as expected too.
File test.sh containing:
echoerr() { echo "$@" 1>&2; }
echo "BEGIN"
sleep 1
echo "1SEC"
sleep 1
echoerr "2SEC ERR"
sleep 1
echo "3SEC"
sleep 1
echo "END"
File test.py containing:
from command_runner import command_runner
cmd="/usr/bin/bash test.sh"
exit_code, output = command_runner(cmd, live_output=True)
print("SCRIPT FINISHED, OUTPUT WAS:")
print(output)
Output (where the first lines before SCRIPT FINISHED appear each second):
BEGIN
1SEC
2SEC ERR
3SEC
END
SCRIPT FINISHED, OUTPUT WAS:
BEGIN
1SEC
2SEC ERR
3SEC
END
I need a test case for what you're trying to achieve.
Both my tests were done with v1.3.1.
Hi. Ok I will try out your module and give it a go.
My use case is a Windows app communicating with some remote Linux boxes via ssh (so the Windows app will call ssh or rsync and report back output).
My plan was to parse the live output so I can update a GUI progress bar, etc, for feedback on the long running command.
Looks like the live_output=True option just echoes output to stdout and stderr.
I don't think my Python app can capture that for partial processing.
Is there a callback option that can be passed to command_runner?
That way my app can process partial data in real time.
That's what live_output=True is supposed to do: print output while executing.
What you're looking for is to get output back to your program before execution ends.
There are multiple ways to achieve this:
- Specify a file for stdout and stderr and read from them, e.g.:
  command_runner(mycmd, stdout=r"C:\somepath\stdout.log")
  While it's the easiest way, you'll have to deal with reading the file as it is modified.
- Use a Queue(), e.g.:

  import queue
  output_queue = queue.Queue()
  command_runner(mycmd, stdout=output_queue, stderr=output_queue)
  # Read from queue
  while True:
      try:
          line = output_queue.get(timeout=.1)
      except queue.Empty:
          pass
      else:
          if line is None:
              break
          # your code using the "line" variable
- Use a callback function, e.g.:

  def my_func(string):
      pass  # your code here
  command_runner(mycmd, stdout=my_func, stderr=my_func)

  This is the easy way, but you might get garbage (like half of an output line) depending on what subprocess returns.
Solutions 1 and 2 require you to thread command_runner, since it would otherwise block.
Solution 3 doesn't require threading, but you'll have to handle a buffer to reconstruct partial strings.
Also, solution 3 may make the timeout argument unreliable if your callback function blocks, unless you thread it too.
Anyway, since you're building a GUI, I guess you already use threads in order to avoid blocking UIs.
All three solutions need to be implemented; they're fairly easy to code, and not even mutually exclusive.
Which one would you like to check out?
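The Queue idea above can be sketched independently of command_runner, with plain subprocess and a reader thread (the command below is just an illustration; command_runner would do the stream-to-queue forwarding for you):

```python
import queue
import subprocess
import sys
import threading

def reader(stream, q):
    """Forward each line of the stream to the queue; None signals EOF."""
    for line in stream:
        q.put(line.rstrip("\n"))
    q.put(None)

# Any long-running command works the same way; this one just prints two lines.
proc = subprocess.Popen(
    [sys.executable, "-c", "print('line 1'); print('line 2')"],
    stdout=subprocess.PIPE,
    text=True,
)

output_queue = queue.Queue()
threading.Thread(target=reader, args=(proc.stdout, output_queue), daemon=True).start()

lines = []
while True:
    try:
        line = output_queue.get(timeout=0.1)
    except queue.Empty:
        continue
    if line is None:  # sentinel: the stream has closed
        break
    lines.append(line)  # live processing would happen here

proc.wait()
print(lines)
```

The timeout on `get()` is what keeps the consumer loop responsive: it wakes up periodically even when the process is silent, which is where a GUI could repaint or check for cancellation.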
As a sidenote, I've coded a couple of threaded GUIs for Windows. What GUI lib are you using?
Awesome! Lots of options.
If subprocess.Popen() is called in text mode (i.e. text=True or universal_newlines=True), would stdout and stderr be line buffered?
I'm using wxPython (the latest 4.1.2a snapshot, which is the latest working one) as my GUI library.
I haven't addressed the threading/blocking issue yet. It's a new app and I'm just experimenting with the low-level technologies for communications using ssh and rsync. I think I'm good with those now, but am now looking at the best way to run them (e.g. I was using subprocess.run() directly, then moved to subprocess.Popen() to get more control over real-time output, and then I found command_runner).
However, I do need to sort out the blocking issue, as I intend to run multiple commands simultaneously (e.g. rsync multiple files/directories to multiple Linux boxes). There might be multiple dialogs for output feedback or interaction.
wxPython does have an API (producer and consumer) for long running actions (using threads).
I've also used wxasync (provides asyncio support for wxPython). The asyncio routines still have to play nice and yield though.
Initially I was wondering if just running the commands as part of a new dialog would be enough (dialogs have their own event loop apparently - at least wxPython does). I haven't trialed that yet, but I suspect a blocking dialog might block other GUI event loops.
Otherwise just running the command in a thread (Python threading or wxPython producer/consumer) might be simpler.
Yes, subprocess.Popen would be line buffered, but you'll have lots of trouble when switching Python versions. I had enough subprocess trouble that it pushed me to develop an overlay.
I actually developed command_runner in order to avoid subprocess compatibility issues (some versions miss timeout, others don't decode text properly).
But initially, it was developed because the subprocess timeout argument doesn't work when launching Windows GUI apps, since the Windows stream.readline() implementation is blocking.
There are some reasons why you would want to keep control over your threads instead of letting a GUI lib decide, since you definitely want to be able to shut down rsync properly when someone closes your GUI or your progress bar.
So far, I think the best route you could go would be the Queue route.
I can implement that function fairly quickly in command_runner if you're interested.
To implement command_runner as a thread, you could do something like:
cmd_thread = threading.Thread(
    target=command_runner, args=(cmd,)
)
cmd_thread.daemon = True  # thread dies with the program
cmd_thread.start()
In order to stop the command_runner thread properly, we could add an argument that executes a function in order to know whether we still need to run, e.g.:
def my_gui_process_is_running():
    return True if whatever_condition_you_are_checking else False
Then run command_runner with the argument stop_on=my_gui_process_is_running (passing the function itself, not its result).
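The polling idea behind such a stop_on argument can be sketched without command_runner (all names below are illustrative, not command_runner internals): the worker checks a caller-supplied function between work chunks and exits as soon as it says to stop.

```python
import threading
import time

stop_flag = threading.Event()

def must_stop():
    # In a real GUI app this would check e.g. "has the main window been closed?"
    return stop_flag.is_set()

def worker(stop_on, check_interval=0.05):
    """Simulated long-running task that polls stop_on between work chunks."""
    while not stop_on():
        time.sleep(check_interval)  # stand-in for one chunk of real work

t = threading.Thread(target=worker, args=(must_stop,), daemon=True)
t.start()

time.sleep(0.2)   # ... the rest of the program does other things ...
stop_flag.set()   # e.g. the user closed the GUI
t.join(timeout=2)
print(t.is_alive())  # the worker honoured the stop condition and exited
```

The check interval bounds how long shutdown can take, which is why exposing it as an argument (as discussed below for check_interval) is useful.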
Again, I can implement this fairly quickly if you want and could help you out using it properly.
Btw, having written a lot of Windows GUI apps, you should definitely check out PySimpleGUI.
PySimpleGUI is a framework that can use Tk, QT, or Wx for both Windows and Linux.
I found it to be the easiest way to achieve a full blown Windows GUI, with progress bars, graphs, controls, etc.
Let me know if I can help you out here.
I am definitely interested in improving command_runner to become a de facto, easy-to-implement substitute for subprocess.Popen, of which it already accepts all arguments.
Thanks for all that. I'll stick with wxPython for now, and might investigate PSG when I have more time. It looks interesting, but I fear it might not have some of the more advanced widgets that wxPython has (e.g. TreeListCtrl, etc).
I thought that the Queue, File and Callback feature were already implemented. Did I misinterpret your previous response?
I'm happy to try out the queue option but the callback option also appeals to me.
Indeed, you did misinterpret; I said I could easily implement those.
That said, I have now done so in branch https://github.com/netinvent/command_runner/tree/generic-return-improvments
I haven't documented everything yet nor written all the tests, but it should work the way I described above, for both callbacks and output queues.
You can specify different callbacks/queues for stdout or stderr, or just specify one type for stdout so stderr gets redirected to stdout.
I also wrote the stop_on part which can be fairly useful for GUI.
Feel free to comment if you have trouble using it.
I'm having a look at it now.
One note regarding the changes in README.md.
Example: `command_runner(cmd, min_resolution=0.2)`
I don't like the min_resolution name. I think interval is better (or maybe check_interval if you want to be more verbose/explicit).
It would also be cool if there were options for command_runner to create and start a thread for the call (e.g. using threading or multiprocessing or even asyncio, but maybe that introduces too many dependencies?)
> I don't like the min_resolution name. I think interval is better (or maybe check_interval if you want to be more verbose/explicit).
Indeed, min_resolution was the internal name before it became an argument. check_interval sounds pretty good to me. [EDIT] Changed in branch generic-return-improvements [/EDIT]
> It would also be cool if there were options for command_runner to create and start a thread for the call (e.g. using threading or multiprocessing or even asyncio, but maybe that introduces too many dependencies?)
I don't really see what functionality you are seeking here.
Internally, there are already threads to handle live output and timeouts.
Using asyncio won't work because subprocess isn't written that way, and IMO it won't be.
Using multiprocessing isn't an option because the whole script needs to be written to allow multiprocess execution; hence I leave that up to the command_runner user.
Using threading will let you keep control over the program, but it won't allow multiple CPU core usage.
Basically, when I want to thread command_runner, I use a function decorator that threads execution in order to keep control in my program.
My threading decorator:
from threading import Thread
from concurrent.futures import Future
from functools import wraps

def call_with_future(fn, future, args, kwargs):
    """
    Threading a function with return info using Future
    from https://stackoverflow.com/a/19846691/2635443
    """
    try:
        result = fn(*args, **kwargs)
        future.set_result(result)
    except Exception as exc:
        future.set_exception(exc)

def threaded(fn):
    """
    @threaded wrapper in order to thread any function
    @wraps decorator's sole purpose is for function.__name__ to be the real function's name
    instead of 'wrapper'
    """
    @wraps(fn)
    def wrapper(*args, **kwargs):
        future = Future()
        Thread(target=call_with_future, args=(fn, future, args, kwargs)).start()
        return future
    return wrapper
Then I can launch command_runner threaded, e.g. by wrapping it in a decorated function:

@threaded
def threaded_command_runner(cmd):
    return command_runner(cmd)

thread_result = threaded_command_runner(cmd)
# MY CODE HERE CONTINUES SINCE command_runner function is threaded
while not thread_result.done():
    sleep(1)
# EXPLOIT RESULT SINCE IT'S DONE
exit_code, output = thread_result.result()
I think I now got what you're looking for.
I ended up adding my threading code to command_runner (baked it directly in so I don't get more dependencies, original lib I use: https://github.com/netinvent/ofunctions/blob/master/ofunctions/threading/__init__.py)
Updated README.md, hopefully readable.
Added unit tests for callback and queue readings.
Drank coffee.
I think what you're searching for would look like the code below:
import queue
from time import sleep
from command_runner import command_runner_threaded

output_queue = queue.Queue()
# Launch command_runner as a thread that will return a concurrent.futures result after execution
thread_result = command_runner_threaded('ping 127.0.0.1', shell=True, method='poller', stdout=output_queue)
# Now read the queue given to stdout until execution ends
read_queue = True
while read_queue:
    if thread_result.done():
        read_queue = False
    try:
        line = output_queue.get(timeout=0.1)
    except queue.Empty:
        pass
    else:
        if line is None:
            break
        # ADD YOUR LIVE CODE HERE TO DEAL WITH RSYNC OUTPUT
        # basic rsync regex example
        # try:
        #     result = re.search(r"(.*)xfer(.*)", line)
        #     print(result.group(1), result.group(2))
        # except AttributeError:
        #     pass

# Now we may get exit_code and full output since the result has become available at this point
exit_code, output = thread_result.result()
Does this fit your needs?
Of course if you want to read stdout and stderr separately, you'll have to specify another queue for stderr and read that one too.
That seems to be working, and I am using separate queues for stdout and stderr.
I don't seem to get a None object when reading stderr though, as is the case with stdout.
I am detecting the end of the read process by checking when both queues return None.
Got your case replicated. This is what happens when you code at 1am...
I've fixed the part where the read loop wasn't waiting for the stderr queue to end, which created a race condition.
Also added tests so this case is covered now.
Have a look at the tests/test_command_runner.py function test_double_queue_threaded_stop() to see what queue read implementation I use for both stdout and stderr.
I've merged the branch with the modifications above today into master.
Got a lot of trouble getting python 2.7 and pypy compatibility with the modifications I made, but everything worked out well.
Did you succeed using command_runner for your project? Is it working for you?
PS: updated the examples with easier code
Hi Deejan,
I had to park it for a little bit. I hope to get back on to it today or over the weekend. I am definitely keen to use it and think it will do the job nicely :)
I will let you know if I hit any roadblocks or have any questions or suggestions.
Hi Deejan.
I did some experiments with command_runner and command_runner_threaded, and here are my observations.
I have a read_task, which I run in a thread. I then use command_runner() (within a manually created thread) to run a command that outputs to stdout and stderr. The read_task then completes when both stdout_queue and stderr_queue return None :)
I repeat the above test using command_runner_threaded() (instead of a manually created thread) and the read_task does not complete. The stdout_queue returns None, but the stderr_queue does not.
Hope that makes sense.
I just redid all my tests, and noticed that there was a typo in the README.md example I made.
Fixed in 2081b1f
Also, I just released v1.4.0, with a lot more improvements and tests.
Please update your code to the current release and, if you copied the code from README.md, please fix the typo ;)
I just made the test again with command_runner_threaded. It works well for me; stderr_queue returns None.
Retested with v1.4.0 and got the same result.
The problem is that my flow is slightly different from your example:
- start read thread
- call command_runner_threaded() with my command
- call exit_code, output = thread_result.result()

The thread_result.result() call blocks and the read thread doesn't execute at all (not until the command terminates).
I tried putting a read_thread.join() before thread_result.result() but a similar thing happens.
It seems the read thread will not run. I think the issue is that this is happening in a GUI button click event handler and the thread only kicks off when the handler completes (so no looping within the button event handler).
I think you misunderstood the threading function.
Calling command_runner_threaded will give you back a result which can't be used until the thread has finished. In the meantime, you get live output via the stdout/stderr queues.
The call to thread_result.result() should only be done once the read queue has ended, or else it will block until the command_runner thread is finished, since it cannot compute exit_code before then.
Actually, you should not have a read thread at all, but a read queue loop, and call thread_result.result() after the read queue is finished. Since your read queue gets the stdout and stderr streams live, calling thread_result.result() only adds the exit code.
If you really need a separate read thread, you should call thread_result.result() only once read_thread.is_alive() is False, so your program will not block.
If you have a git repo, I'll happily have a look into your code.
Yes, understood. I do need a thread because it will be a long-running task (to upload files to a remote box and then execute the file to perform some desired functionality). The idea is that either a dialog box will appear to provide feedback (or a dashboard is updated) but the main GUI still needs to be responsive (e.g. to initiate more transactions with various other boxes).
One similar type of example would be Windows explorer doing a large file copy to a remote server. A dialog appears for transfer feedback, but Windows Explorer is still active and other file transfers can be initiated, etc.
No repo. I still need to update the GUI in the main GUI thread, so I'm thinking of using the GUI idle handler or timer event handler. Another option would be to use wxasync and use asyncio versions of queue.
Does command_runner support asyncio.Queue?
Actually I don't think asyncio would work in my case, as I have to run some Windows binaries (e.g. rsync) and they won't play nice with asyncio (i.e. they will block), so threads it is.
command_runner would not support asyncio because the underlying subprocess doesn't.
If you want your GUI to stay responsive, I'd use your read thread to update the GUI with the stdout/stderr output that it receives from the queue given to command_runner_threaded.
Once the read thread is done, use thread_result.result() to get the exit_code and full output for logs / success / error messages.
I have run all the test cases on Linux and Windows, from Python 2.7 to Python 3.10 and PyPy, with success, so I decided to release the version including the improvements you asked for.
Feel free to ask other improvements, but I do think that the current version will handle your scenario quite well.
Yes, I think command_runner will work well for me and I will be using it. I'll let you know how it goes.
I can't directly update the GUI from the read thread, because the GUI will (eventually) crash. The GUI must be updated from the GUI thread, so I either need another thread communication mechanism (e.g. another queue) or some other way to place data into the GUI event loop (which all seems like double handling). At the moment I am planning to put the read queue(s) code into the GUI Idle or Timer event handlers.
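That "drain the queue from the GUI thread" plan can be sketched without wxPython (the on_timer function below stands in for a wx.Timer or EVT_IDLE handler, and the output strings are made up): the handler uses get_nowait() so the GUI thread never blocks, no matter how slowly the command produces output.

```python
import queue

gui_lines = []  # stands in for widget state that only the GUI thread may touch
output_queue = queue.Queue()

def on_timer():
    """Runs periodically on the GUI thread: drain the queue without blocking."""
    while True:
        try:
            line = output_queue.get_nowait()
        except queue.Empty:
            return  # nothing more right now; the next timer tick will retry
        if line is not None:
            gui_lines.append(line)  # e.g. append to a progress log widget

# Simulate a reader thread having pushed some rsync-style output plus the sentinel:
for item in ["50% transferred", "100% transferred", None]:
    output_queue.put(item)

on_timer()
print(gui_lines)
```

The None sentinel would additionally tell the handler to stop the timer and fetch thread_result.result() for the exit code.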
Closing this issue since the enhancement is done. Feel free to reopen issue if needed.