abg/dbsake

dbsake decompression functions broken on platforms where sys.stdin.flush() is invalid

Closed this issue · 0 comments

abg commented

On OS X (and probably other BSDs), fflush() appears to be valid only on writable streams. For readable streams, fpurge() seems to be the right call, but it doesn't appear to be exposed in Python 2, at least.
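The flush-on-a-read-only-stream behavior can be probed without redirecting stdin by opening an ordinary file read-only. This is a minimal sketch; the helper name flush_on_readonly_ok is hypothetical. It relies on the assumption that, under Python 2, file.flush() on a regular file goes through the same C fflush() call as sys.stdin.flush().

```python
import os
import tempfile


def flush_on_readonly_ok(path):
    """Return True if calling flush() on a read-only file object succeeds.

    Hypothetical probe: opens path read-only, reads one byte, then flushes.
    """
    with open(path, "rb") as f:
        f.read(1)  # put the stream into a "reading" state first
        try:
            f.flush()
        except (IOError, OSError):
            # The reported failure mode: fflush() on a read-only stream
            # raising EBADF (observed with Python 2.7 on OS X).
            return False
        return True


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    os.write(fd, b"data")
    os.close(fd)
    try:
        print(flush_on_readonly_ok(path))
    finally:
        os.remove(path)
```

Going by this report, the probe would be expected to print False under Python 2.7 on OS X and True under Python 3.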

dbsake currently seeks around on stdin to detect the input stream type: it reads the first few bytes to check the file magic (an API used by the sieve command, at least). It then needs to reset the file position to the beginning before delegating to the actual decompression command (/usr/bin/gzip, /usr/bin/bzip2, etc.). This is currently done via fileobj.seek(0) followed by fileobj.flush(). The flush() call aborts with an EBADF IOError:

$ ./dbsake  sieve -O < mysql.sql.gz
Uncaught exception! (╯°□°)╯ ︵ ┻━┻
Traceback (most recent call last):
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/Users/andrew.garner/stage/__main__.py", line 21, in <module>
    sys.exit(main())
  File "/Users/andrew.garner/stage/__main__.py", line 18, in main
    sys.exit(dbsake.cli.main())
  File "./dbsake/cli/__init__.py", line 123, in main
    dbsake(args=argv, auto_envvar_prefix='DBSAKE', obj={})
  File "./click/core.py", line 488, in __call__
    return self.main(*args, **kwargs)
  File "./click/core.py", line 474, in main
    self.invoke(ctx)
  File "./click/core.py", line 758, in invoke
    return self.invoke_subcommand(ctx, cmd, cmd_name, ctx.args[1:])
  File "./click/core.py", line 767, in invoke_subcommand
    return cmd.invoke(cmd_ctx)
  File "./click/core.py", line 659, in invoke
    ctx.invoke(self.callback, **ctx.params)
  File "./click/core.py", line 325, in invoke
    return callback(*args, **kwargs)
  File "/Users/andrew.garner/stage/dbsake/cli/cmd/sieve.py", line 139, in sieve_cli
    stats = sieve.sieve(options)
  File "./dbsake/core/mysql/sieve/__init__.py", line 70, in sieve
    with open_stream(options) as input_stream:
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "./dbsake/util/compression.py", line 80, in decompressed_fileobj
    fileobj.flush()
IOError: [Errno 9] Bad file descriptor
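The magic-byte detection step described above might look roughly like the following sketch. The table, the function name sniff_decompressor, and the exact command lines are hypothetical illustrations, not dbsake's actual dbsake/util/compression.py code. The point is only that sniffing consumes bytes, so the stream has to be rewound before the real decompressor reads it.

```python
# Hypothetical magic-byte table; not dbsake's actual implementation.
MAGIC_COMMANDS = [
    (b"\x1f\x8b", ["gzip", "-dc"]),    # gzip
    (b"BZh", ["bzip2", "-dc"]),        # bzip2
    (b"\xfd7zXZ\x00", ["xz", "-dc"]),  # xz
]
MAX_MAGIC = max(len(magic) for magic, _ in MAGIC_COMMANDS)


def sniff_decompressor(fileobj):
    """Return a decompression command for fileobj, or None if unrecognized.

    Reads a few leading bytes, then seeks back to 0 so the whole stream is
    still available to whatever consumes it next.
    """
    head = fileobj.read(MAX_MAGIC)
    # Rewinds only the file object's own position; a child process that
    # inherits the descriptor depends on the OS-level offset instead.
    fileobj.seek(0)
    for magic, command in MAGIC_COMMANDS:
        if head.startswith(magic):
            return command
    return None  # plain (uncompressed) input
```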

I imagine this affects BSD platforms other than OS X as well, but I have only tested on OS X.

A quick workaround seems to be to fall back to the underlying file descriptor and manually seek to 0 with os.lseek(). Python 3.4.1 does not seem to require this and works with the current implementation. A quick demonstration:

$ python2.7 -c 'import sys; print(sys.version); sys.stdin.flush()'
2.7.1 (r271:86832, Aug  5 2011, 03:30:24)
[GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)]
Traceback (most recent call last):
  File "<string>", line 1, in <module>
IOError: [Errno 9] Bad file descriptor

Vs. Python 3:

$ python3 -c 'import sys; print(sys.version); sys.stdin.flush()'
3.4.1 (default, May 19 2014, 13:09:54)
[GCC 4.2.1 Compatible Apple LLVM 4.2 (clang-425.0.28)]
$ echo $?
0
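The os.lseek() workaround might be sketched as follows. This assumes the file object is seekable (e.g. stdin redirected from a regular file), and that the parent stops reading from it once the descriptor is handed to the decompression subprocess. The helper name rewind_for_child is hypothetical.

```python
import os


def rewind_for_child(fileobj):
    """Rewind fileobj so a child process inheriting its fd starts at byte 0.

    Hypothetical helper: replaces the seek(0) + flush() pair with a direct
    lseek(2) on the underlying descriptor, avoiding fflush() on a read-only
    stream (which raises EBADF under Python 2.7 on OS X).
    """
    fileobj.seek(0)  # reset the file object's own position
    fd = fileobj.fileno()
    # Force the kernel-level offset that a child process would inherit.
    # After this, the parent should not keep reading through fileobj:
    # its read buffer may no longer match the descriptor's offset.
    os.lseek(fd, 0, os.SEEK_SET)
    return fd
```

The returned descriptor could then be passed as stdin to the decompressor, e.g. subprocess.Popen(["gzip", "-dc"], stdin=fd). Seeking the descriptor directly sidesteps the question of which stdio calls are legal on a read-only stream on a given platform.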