OCR-D/core

ocrd process should not suppress and drop stdout/stderr from tasks

Closed · 4 comments

In the current implementation of ocrd.task_sequence.run_tasks, all standard output and standard error of the tasks are captured into variables. Only stderr is shown, and only in case of an exception.

IMHO this defeats the purpose of loggers and log levels as far as the (default) stream handlers are concerned. The only logging we now get is from ocrd.cli.process and ocrd.task_sequence itself. I believe core should at least show stdout and stderr, ideally without changing their order (out vs err intermixed) and without buffering. If users do not wish to see any other loggers, they can easily configure the logging system to write everything but ocrd.cli.process and ocrd.task_sequence into a text file.
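To illustrate that last point, here is a minimal sketch using only Python's standard `logging` module. It is not OCR-D's own logging configuration mechanism. The helper names `NameFilter` and `configure_logging` are made up for this example; only the two logger names come from the text above. Note that it only affects the process in which it runs, so each processor would have to apply an equivalent configuration.

```python
import logging
import sys

# Logger names taken from the issue text; everything else here is hypothetical.
RUN_CONTROL = ("ocrd.cli.process", "ocrd.task_sequence")


class NameFilter(logging.Filter):
    """Pass or reject records depending on whether they come from given loggers."""

    def __init__(self, names, keep):
        super().__init__()
        self.names = tuple(names)
        self.keep = keep  # True: pass only matching records; False: pass all others

    def filter(self, record):
        matches = any(
            record.name == n or record.name.startswith(n + ".")
            for n in self.names
        )
        return matches == self.keep


def configure_logging(logfile):
    """Send run-control loggers to the console and everything else to a file."""
    file_handler = logging.FileHandler(logfile)
    file_handler.addFilter(NameFilter(RUN_CONTROL, keep=False))

    console_handler = logging.StreamHandler(sys.stderr)
    console_handler.addFilter(NameFilter(RUN_CONTROL, keep=True))

    root = logging.getLogger()
    root.setLevel(logging.INFO)
    root.addHandler(file_handler)
    root.addHandler(console_handler)
    return file_handler, console_handler
```

With this in place, messages from ocrd.task_sequence stay on the console while all other loggers end up in the file.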

I don't see a need to selectively filter out the last stderr message before an exception, so run_cli could just return the out/err pipes (without decode()) and run_tasks could connect them to sys.stdout/stderr, passing on all messages immediately.
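A rough sketch of what that could look like, under some assumptions: `run_cli_piped` and `run_task_forwarding` are hypothetical stand-ins, not the actual signatures of run_cli and run_tasks in ocrd.task_sequence. Forwarding two separate pipes needs one reader thread per pipe to avoid deadlocks, and the relative order of stdout and stderr chunks is not guaranteed this way.

```python
import subprocess
import sys
import threading


def run_cli_piped(cmd):
    """Start a task and return the process with raw, undecoded pipes (hypothetical)."""
    return subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, bufsize=0
    )


def _forward(pipe, sink):
    # Copy bytes as soon as they arrive; an empty read means end of stream.
    for chunk in iter(lambda: pipe.read(4096), b""):
        sink.write(chunk)
        sink.flush()
    pipe.close()


def run_task_forwarding(cmd, out=None, err=None):
    """Run one task, passing its output on immediately (hypothetical)."""
    out = sys.stdout.buffer if out is None else out
    err = sys.stderr.buffer if err is None else err
    proc = run_cli_piped(cmd)
    threads = [
        threading.Thread(target=_forward, args=(proc.stdout, out)),
        threading.Thread(target=_forward, args=(proc.stderr, err)),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return proc.wait()
```

The byte sinks are parameters so that the caller can redirect the output, for example into a log file opened in binary mode.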

Seeing people use ocrd process a lot (due to the website's workflow recommendations), and not getting any logging output for run control or ex-post diagnosis, I think this is kind of urgent. Plus the fix should be simple – do you need help @kba?

kba commented

> Seeing people use ocrd process a lot (due to the website's workflow recommendations), and not getting any logging output for run control or ex-post diagnosis, I think this is kind of urgent. Plus the fix should be simple – do you need help @kba?

Currently something else is really urgent, but I'll try to fix this later today.

kba commented

> ideally without changing their order (out vs err intermixed) and without buffering

The complicated part is capturing and outputting STDERR/STDOUT at the same time. That would require fiddling with file descriptors and/or asyncio. But if we don't capture (and there is really no reason to, AFAICS), we can simply leave STDERR/STDOUT alone and not return file handles at all.

> The complicated part is capturing and outputting STDERR/STDOUT at the same time. That would require fiddling with file descriptors and/or asyncio. But if we don't capture (and there is really no reason to, AFAICS), we can simply leave STDERR/STDOUT alone and not return file handles at all.

Exactly my point (see the last paragraph of my original post). Let the standard exception handlers do the rest.
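For reference, the approach agreed on here can be sketched in a few lines. This is only an illustration with a hypothetical function name (`run_task_inherited`), not the change that was actually made in core. When `subprocess.run` is called without `stdout=` or `stderr=` arguments, the child inherits the parent's streams, so its messages appear immediately and in their original order.

```python
import subprocess
import sys


def run_task_inherited(cmd):
    """Run one task without capturing its output (hypothetical helper)."""
    # No stdout=/stderr= arguments: the child writes directly to the
    # parent's streams, with no extra buffering or reordering.
    # check=True raises subprocess.CalledProcessError on a non-zero exit,
    # leaving the rest to normal exception handling.
    result = subprocess.run(cmd, check=True)
    return result.returncode
```

A caller would then only need to catch `subprocess.CalledProcessError` (or let it propagate) to stop the task sequence when a task fails.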