Quintus/ruby-xz

Stream (`IO`-like) mode would be great

dadooda opened this issue · 15 comments

Hi!

String-to-string decompression is great, but sometimes a stream mode is very handy, for example to support on-the-fly decompression of .tar.xz files. Here is an example for gzip-compressed and plain tar files:

```ruby
require 'zlib'
require 'rubygems/package'

stream = if fn.match(/\.gz\z/)
  Zlib::GzipReader.open(fn)
elsif fn.match(/\.tar\z/)
  File.open(fn)
else
  raise "Error: Don't know how to handle '#{fn}', aborting"
end

untar = Gem::Package::TarReader.new(stream)
untar.each do |entry|
  # ...
end
```

Do you think liblzma would allow implementing IO-like streaming in ruby-xz, too?

Hi,

> but sometimes stream mode is very handy. For example, to support .tar.xz decompression on the fly.

Actually, liblzma only supports stream (de)compression; it doesn’t know how to handle strings or file paths. If you look into ruby-xz’s RDoc documentation, you’ll find the `XZ.compress_stream` and `XZ.decompress_stream` methods, which are the basis of the ruby-xz library. Every other compression/decompression method is implemented on top of them. Is this what you’re looking for?
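To illustrate the layering described above, here is a minimal sketch that is not ruby-xz code: liblzma is not part of Ruby's standard library, so it uses `Zlib::Inflate` as a stand-in, and the method names `decompress_stream` and `decompress` are hypothetical. The stream-level function reads compressed input in chunks and yields decompressed output; the string-level function just wraps its argument in a `StringIO` and collects what is yielded.

```ruby
require "zlib"
require "stringio"

# Hypothetical stream-level primitive: reads compressed data from +io+ in
# chunks and yields each piece of decompressed output to the block.
# Zlib::Inflate stands in for liblzma in this sketch.
def decompress_stream(io, chunk_size = 4096)
  inflater = Zlib::Inflate.new
  while (chunk = io.read(chunk_size))
    out = inflater.inflate(chunk)
    yield out unless out.empty?
  end
  rest = inflater.finish # flush whatever output is still pending
  yield rest unless rest.empty?
ensure
  inflater.close if inflater && !inflater.closed?
end

# Hypothetical string-level helper built on top of the stream primitive.
def decompress(data)
  result = String.new # binary (ASCII-8BIT) buffer
  decompress_stream(StringIO.new(data)) { |chunk| result << chunk }
  result
end
```

Because the string helper only adds a `StringIO` wrapper, any fix or feature in the stream primitive automatically benefits it.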

... ah sorry, now I understand. You want an IO-like XZ object, right? Hm, I didn’t consider this, but it should be possible to build one on top of XZ.(de)compress_stream. I’ll look into that, and if I can get it done, I’m going to release a new version of the library, as there have already been contributions from others that I don’t want to hold back.

Vale,
Quintus
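An IO-like object can be built on the same chunked primitive by keeping a buffer of decompressed bytes and refilling it on demand. The sketch below again uses `Zlib::Inflate` as a stand-in for liblzma, and the class name `InflatingReader` is hypothetical. It implements only `#read`, `#eof?` and `#close`, following `IO#read`'s return values (`nil` at end of stream when a positive length is given, an empty string when no length is given).

```ruby
require "zlib"

# Hypothetical IO-like reader (not ruby-xz's XZ::StreamReader). It pulls
# compressed chunks from the wrapped IO, decompresses them incrementally and
# serves #read from an internal buffer. Zlib::Inflate stands in for liblzma.
class InflatingReader
  def initialize(io, chunk_size = 4096)
    @io = io
    @chunk_size = chunk_size
    @inflater = Zlib::Inflate.new
    @buffer = String.new # decompressed bytes not yet handed out
    @finished = false    # true once the wrapped IO is exhausted
  end

  # Mirrors IO#read: without a length, returns everything that is left ("" at
  # end of stream); with a positive length, returns up to that many bytes, or
  # nil at end of stream.
  def read(length = nil)
    if length.nil?
      fill_buffer until @finished
      data = @buffer
      @buffer = String.new
      return data
    end

    fill_buffer while @buffer.bytesize < length && !@finished
    return nil if length > 0 && @buffer.empty?

    @buffer.slice!(0, length)
  end

  def eof?
    fill_buffer while @buffer.empty? && !@finished
    @buffer.empty?
  end

  def close
    @inflater.close unless @inflater.closed?
    @io.close
  end

  private

  # Decompresses one more chunk of input, or flushes the decompressor at EOF.
  def fill_buffer
    chunk = @io.read(@chunk_size)
    if chunk
      @buffer << @inflater.inflate(chunk)
    else
      @buffer << @inflater.finish
      @finished = true
    end
  end
end
```

A complete IO replacement needs many more methods (`gets`, `each_line`, and so on); the io-like gem that comes up later in this thread derives such methods from a small set of primitives supplied by the including class.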

Marvin

> You want an IO-like XZ object, right?

Exactly. Once we have one, we can transparently feed decompressed LZMA data to any IO-compatible data reader. In the example, it's the tar reader.

Alex


I’ve implemented two classes, XZ::StreamReader and XZ::StreamWriter, in the current devel branch. Could you please check whether they are what you expect? If so, I’m going to close this issue and prepare a release sometime in the next few weeks.

Vale,
Quintus
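For the writing side, a comparable sketch (hypothetical class `DeflatingWriter`, with `Zlib::Deflate` standing in for liblzma) compresses data as it is written and flushes the compressor's remaining output when the stream is finished. Finishing explicitly matters: until then the compressor may still hold buffered data, so the output is not yet a complete compressed stream.

```ruby
require "zlib"

# Hypothetical IO-like writer: compresses data as it is written and passes the
# compressed bytes on to the wrapped IO. Zlib::Deflate stands in for liblzma.
class DeflatingWriter
  def initialize(io)
    @io = io
    @deflater = Zlib::Deflate.new
  end

  # Like IO#write, returns the number of (uncompressed) bytes accepted.
  def write(data)
    data = data.to_s
    @io.write(@deflater.deflate(data))
    data.bytesize
  end

  # Flushes the compressor's pending output and ends the compressed stream,
  # but leaves the wrapped IO open. Returns the wrapped IO.
  def finish
    @io.write(@deflater.finish)
    @deflater.close
    @io
  end

  # Finishes the compressed stream and closes the wrapped IO as well.
  def close
    finish.close
  end
end
```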

Marvin

Thanks for the notice.

Although the block form of XZ::StreamReader::open might work, I'd expect the non-block one to work, too.

```ruby
stream = if fn.match(/\.xz\z/)
  XZ::StreamReader.open(fn)
elsif fn.match(/\.gz\z/)
  Zlib::GzipReader.open(fn)
elsif fn.match(/\.tar\z/)
  File.open(fn)
else
  raise "Error: Don't know how to handle '#{fn}', aborting"
end
```

As the example above shows, I want to obtain an abstract stream object that behaves like an IO and then read data from it regardless of its type.

Attempting this with XZ::StreamReader currently results in `LocalJumpError: no block given (yield)`, since the method body contains an unconditional `yield`.

I'd suggest making XZ::StreamReader behave exactly like File and other IOs: the caller can either use the block form or get a regular object back.

I see. You could do it with XZ::StreamReader.new and XZ::StreamWriter.new, as those don’t require a block, but I’ll forward non-block calls from ::open to ::new.

Vale,
Quintus
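The `File.open`-style behaviour discussed here (return the object when no block is given; otherwise yield it, close it afterwards, and return the block's value) can be sketched as follows. `ExampleReader` is a hypothetical class, not ruby-xz's actual implementation; the point is that a `block_given?` check replaces the unconditional `yield` that caused the `LocalJumpError`.

```ruby
# Hypothetical class illustrating File.open-style semantics for ::open.
class ExampleReader
  attr_reader :path

  def initialize(path)
    @path = path
    @closed = false
  end

  def close
    @closed = true
    nil
  end

  def closed?
    @closed
  end

  # Without a block, ::open behaves exactly like ::new. With a block, the
  # object is yielded, closed afterwards (even if the block raises), and the
  # block's value is returned.
  def self.open(*args)
    obj = new(*args)
    return obj unless block_given?

    begin
      yield obj
    ensure
      obj.close
    end
  end
end
```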

I’ve changed ::new and ::open to work exactly the same way (in fact, they’re now aliases), and they even accept a filename if you don’t use the block form. Your example should now work, but I’d suggest using a case statement instead, like this:

```ruby
stream = case fn
when /\.xz\z/
  XZ::StreamReader.open(fn)
when /\.gz\z/
  Zlib::GzipReader.open(fn)
when /\.tar\z/
  File.open(fn)
else
  raise "Error: Don't know how to handle '#{fn}', aborting"
end
```

Vale,
Quintus
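Accepting either a filename or an already open IO usually comes down to a type check at the start of the constructor. The helper name `open_source` below is hypothetical: it treats a `String` or `Pathname` argument as a file name and opens it in binary mode, and passes anything else through on the assumption that it is an IO-like object responding to `#read`.

```ruby
require "pathname"

# Hypothetical helper: accept either a file name or an IO-like object.
def open_source(path_or_io)
  case path_or_io
  when String, Pathname
    File.open(path_or_io, "rb") # a file name: open it ourselves
  else
    path_or_io                  # assume an IO-like object with #read
  end
end
```

A sensible convention is that the code which opened the file also closes it, while an IO passed in by the caller is left for the caller to close.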

Marvin

I'll check with TarReader a bit later.

Regarding `case`, you are right; it looks more elegant.

Alex


Marvin

TarReader still can't read data from .tar.xz files via the stream API:

  • 1.tar.xz:

    Errno::ESPIPE: Illegal seek
    from /home/alexrb/.rvm/gems/ruby-1.9.2-p180@ruby_xz_contrib/gems/io-like-0.3.0/lib/io/like.rb:1294:in `__io_like__buffered_seek'
    from /home/alexrb/.rvm/gems/ruby-1.9.2-p180@ruby_xz_contrib/gems/io-like-0.3.0/lib/io/like.rb:971:in `seek'
    from /home/alexrb/.rvm/rubies/ruby-1.9.2-p180/lib/ruby/site_ruby/1.9.1/rubygems/package/tar_reader.rb:71:in `block in each'
    from /home/alexrb/.rvm/rubies/ruby-1.9.2-p180/lib/ruby/site_ruby/1.9.1/rubygems/package/tar_reader.rb:55:in `loop'
    from /home/alexrb/.rvm/rubies/ruby-1.9.2-p180/lib/ruby/site_ruby/1.9.1/rubygems/package/tar_reader.rb:55:in `each'
    from a.rb:60:in `each_with_name'
    from a.rb:36:in `test_tar_reader'

  • 1.tar.gz (works):

    Found '1/VERSION'
    Found '1/test/test_io_like_specs.rb'
    Found '1/test/test-data/lorem_ipsum.txt'
    Found '1/test/test-data/lorem_ipsum.txt.xz'
    Found '1/test/test_stream_reader.rb'
    Found '1/test/test_xz.rb'
    Found '1/test/test_stream_writer.rb'
    Found '1/ruby-xz.gemspec'
    Found '1/README.rdoc'
    Found '1/Rakefile.rb'
    Found '1/lib/xz.rb'
    Found '1/lib/xz/stream.rb'
    Found '1/lib/xz/stream_reader.rb'
    Found '1/lib/xz/lib_lzma.rb'
    Found '1/lib/xz/stream_writer.rb'
    

Proto files used are attached.

Alex


This one caused me some headache... There’s a subtle piece of code in RubyGems’ TarReader class that causes this error:

```ruby
# rubygems/package/tar_reader.rb, line 69
begin
  # avoid reading...
  @io.seek pending, IO::SEEK_CUR
  pending = 0
rescue Errno::EINVAL, NameError
  while pending > 0 do
    bytes_read = @io.read([pending, 4096].min).size
    raise UnexpectedEOF if @io.eof?
    pending -= bytes_read
  end
end
```

What this implicitly does is expect non-seekable IOs to raise NoMethodError (a subclass of NameError) instead of a proper Errno:: error such as Errno::ESPIPE (the default error raised by io-like for non-seekable objects); otherwise it crashes with what you saw. It may not be an actual bug in RubyGems, but it’s at least bad design. Anyway, I’ve undef-ined the #seek method provided by the io-like gem, and at least in my short tests with Gem::Package::TarReader it works now.
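The following sketch reproduces the control flow of the quoted RubyGems snippet in simplified form (the `UnexpectedEOF` check is replaced by stopping at end of input, and the names `skip_bytes`, `EspipeReader` and `UnseekableReader` are hypothetical). It shows why an object whose `#seek` raises `Errno::ESPIPE` escapes the `rescue` clause, while one whose `#seek` has been undefined falls back to reading, because the resulting `NoMethodError` is a subclass of `NameError`.

```ruby
require "stringio"

# Simplified version of the skip logic quoted above (hypothetical name).
def skip_bytes(io, pending)
  io.seek(pending, IO::SEEK_CUR)
rescue Errno::EINVAL, NameError
  # Fallback for IOs without a usable #seek: read and discard the bytes.
  while pending > 0
    data = io.read([pending, 4096].min)
    break if data.nil? # end of input
    pending -= data.bytesize
  end
end

# Hypothetical reader whose #seek behaves like io-like's default for
# non-seekable streams: it raises Errno::ESPIPE.
class EspipeReader
  def initialize(data)
    @io = StringIO.new(data)
  end

  def read(length)
    @io.read(length)
  end

  def seek(*)
    raise Errno::ESPIPE
  end
end

# Same reader with #seek undefined, so calling it raises NoMethodError.
class UnseekableReader < EspipeReader
  undef_method :seek
end
```

A consumer that wants to handle both kinds of non-seekable input could simply add `Errno::ESPIPE` to its `rescue` list.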

> Proto files used are attached.

I don’t think GitHub supports attachments. At least I can’t see anything.

Vale,
Quintus

> I don’t think GitHub supports attachments. I at least can’t see anything.

Oh yeah, seems like that.

> What this implicitly does is that it expects non-seekable IOs to
> raise NoMethodError (a subclass of NameError) instead of a
> proper Errno:: error such as Errno::ESPIPE (the default error
> raised by io-like for non-seekable objects), otherwise it crashes
> with what you saw. It may not be an actual bug in RubyGems, but it’s
> at least bad design. However, I’ve undef-ined the #seek method
> provided by the io-like gem and at least in my short tests with
> Gem::Package::TarReader it works now.

Yes, it's strange they are not catching Errno::ESPIPE.

Alex


If it works for you, can I close this issue now?

Vale,
Quintus

Give me 10 minutes.


Seems to work, all fine.


> Give me 10 minutes.
> Seems to work, all fine.

I think that was just around 3 minutes ;-)

OK, thanks for reporting and testing!

Vale,
Quintus