Zlib::GzipReader#readpartial until eof? causes EOFError
martinemde opened this issue · 4 comments
I'm using the data.tar.gz from ruby-progressbar-1.13.0.gem, which seems to trigger this error. If I take exactly the same data.tar.gz, gunzip, and re-gzip it, then this bug doesn't happen. However, the gem unpacks fine, it's just readpartial
that breaks.
$ gem fetch ruby-progressbar -v 1.13.0
$ tar zxf ruby-progressbar-1.13.0.gem data.tar.gz
irb> io = File.open('data.tar.gz')
=> #<File:data.tar.gz>
irb> io.size
=> 10250
irb> Zlib::GzipReader.wrap(io) { |gzio| gzio.readpartial(16_384) until gzio.eof? } # rubygems does this, but with #read
Traceback (most recent call last):
7: from /usr/bin/irb:23:in `<main>'
6: from /usr/bin/irb:23:in `load'
5: from /Library/Ruby/Gems/2.6.0/gems/irb-1.0.0/exe/irb:11:in `<top (required)>'
4: from (irb):9
3: from (irb):9:in `wrap'
2: from (irb):9:in `block in irb_binding'
1: from (irb):9:in `readpartial'
EOFError (end of file reached)
I have done some digging to verify I didn't cause this. It seems to require that you use this data.tar.gz that has a conflict with how zlib uses readpartial. Other data.tar.gz files work, and this same data.tar file re-gzipped works.
I think the EOF is raised here because it reaches the end of the file during gzfile_read_more(gz, outbuf);
and then checks if it's at the end before finalizing the unzip. https://github.com/ruby/zlib/blob/master/ext/zlib/zlib.c#L2933
This does seem like it could be fixed. Using gzio.read(16_384)
instead of gzio.readpartial
works.
Edit2: The error is real but my understanding of it and why it happened was wrong. Leaving this open with changed title until the fix is merged.
This bug blocks rubygems from using readpartial to read gems.
I'm running into a similar issue on jruby. It seems that readpartial on a gzipreader wrapped io does not always read as many bytes as you ask it to. This can cause the io.pos to be shifted, starting at the wrong position to line up with the tar header. This happens only if I use readpartial to move through the gzio, but not if I use gzio.read
.
I think that readpartial
isnt guaranteed to read exactly len
bytes.
From the IO docs: Reads up to maxlen bytes from the stream
also:
When blocked, the method waits for either more data or EOF on the stream:
If more data is read, the method returns the data.
If EOF is reached, the method raises [EOFError](dfile:///Users/segiddins/Library/Application%20Support/Dash/DocSets/Ruby_3/Ruby.docset/Contents/Resources/Documents/EOFError.html).
When not blocked, the method responds immediately:
Returns data from the buffer if there is any.
Otherwise returns data from the stream if there is any.
Otherwise raises [EOFError](dfile:///Users/segiddins/Library/Application%20Support/Dash/DocSets/Ruby_3/Ruby.docset/Contents/Resources/Documents/EOFError.html) if the stream has reached EOF.
So it seems like an EOFError
should only be raise if no data was read, which I think is the opposite of the implementation today.
I think I have a fix though, and cutting down on the number of calls needed in a readpartial until eof?
loop also seems like a good thing?
I spent more time looking into this.
I ran gunzip -c test/zlib/fixtures/ruby-progressbar-1.13.0.gem.data.tar.gz | gzip -9 - >test/zlib/fixtures/ruby-progressbar-1.13.0.rezip.gem.data.tar.gz
, then diffed the hexdumps of the two files:
❯ diff -U5 --label orig <(hexdump -C test/zlib/fixtures/ruby-progressbar-1.13.0.gem.data.tar.gz) --label new <(hexdump -C test/zlib/fixtures/ruby-progressbar-1.13.0.rezip.gem.data.tar.gz)
--- orig
+++ new
@@ -1,7 +1,7 @@
-00000000 1f 8b 08 00 c7 e8 02 64 02 03 ec 3d 7d 7f da 38 |.......d...=}..8|
+00000000 1f 8b 08 00 47 b5 83 65 02 03 ed 3d 7d 7f da 38 |....G..e...=}..8|
00000010 d2 fb 37 9f 42 4d 9e 2e 90 25 06 f2 d6 3d b6 34 |..7.BM...%...=.4|
00000020 4b 12 da 72 9b 84 fc 80 6c af 4f 36 3f d7 80 49 |K..r....l.O6?..I|
00000030 7c 35 36 67 9b a4 d9 a6 cf 67 7f 66 f4 62 4b 7e ||56g.....g.f.bK~|
00000040 83 74 d3 f6 ae 07 db 4d 82 ad 19 8d 46 d2 68 66 |.t.....M....F.hf|
00000050 34 1a 1d 77 0e db a7 fd b6 16 7c 08 7e f8 52 9f |4..w......|.~.R.|
@@ -636,8 +636,8 @@
000027a0 4c 00 36 75 c7 a5 ed bd 1a cf af 99 72 09 5a ac |L.6u........r.Z.|
000027b0 ec 5e 8d fb 22 97 b8 d7 6c e9 d5 75 65 48 7d 4f |.^.."...l..ueH}O|
000027c0 eb ff 8d e9 61 bc ec 63 69 00 0b f7 7f f7 b6 e2 |....a..ci.......|
000027d0 eb ff 4e 7d 95 ff e5 2b fa ff e4 c5 9d 90 df db |..N}...+........|
000027e0 bd 3e 0b ce 2a d6 b5 fa b6 56 0b b7 59 56 53 7d |.>..*....V..YVS}|
-000027f0 f5 59 7d 56 9f d5 e7 7b f9 fc 3f 00 00 00 ff ff |.Y}V...{..?.....|
-00002800 03 00 19 b6 3b 4c 00 ea 00 00 |....;L....|
-0000280a
+000027f0 f5 59 7d 56 9f d5 e7 7b f9 fc 3f 19 b6 3b 4c 00 |.Y}V...{..?..;L.|
+00002800 ea 00 00 |...|
+00002803
What's interesting here is, other than one byte at the start of the deflate stream, the only diff is at the end.
So then I ran https://github.com/madler/infgen, and diffed that:
❯ diff -U5 --label orig <(~/Development/github.com/madler/infgen/a.out test/zlib/fixtures/ruby-progressbar-1.13.0.gem.data.tar.gz) --label new <(~/Development/github.com/madler/infgen/a.out test/zlib/fixtures/ruby-progressbar-1.13.0.rezip.gem.data.tar.gz)
--- orig
+++ new
@@ -1,9 +1,10 @@
! infgen 3.2 output
!
gzip
!
+last
dynamic
litlen 0 10
litlen 10 7
litlen 32 5
litlen 33 9
@@ -4275,16 +4276,9 @@
match 258 1
match 258 1
match 258 1
match 258 1
match 200 1
-end
-!
-stored
-end
-!
-last
-fixed
end
!
crc
length
and this shows us what's happening here.
The original file has 3 different deflate blocks in the stream: the main block, followed by an empty block, followed by a final empty block.
Because our Zlib
implementation keeps reading until the output buffer has something in it, it will read through those final 2 empty blocks, try to read again, and then EOFError because finally the stream is (actually) at its end.
I don't know what the fix here should be, other than changing gzfile_read_more
to break unconditionally (even when ZSTREAM_BUF_FILLED(&gz->z) == 0
), and removing the ZSTREAM_BUF_FILLED
check from readpartial as well.
Minimized test case:
#!/usr/bin/env ruby -w
require 'stringio'
require 'zlib'
contents = "a" * 2030
zd = Zlib::Deflate.new(Zlib::NO_COMPRESSION)
header = "\x1f\x8b\x08\x00\xc7\xe8\x02\x64\x02\x03".b
s = "".b
s << zd.deflate(contents)
s << zd.flush(Zlib::SYNC_FLUSH)
s << zd.finish
zd.close
s = header << s[2..-5] << [Zlib.crc32(contents)].pack('l<*') << [contents.size].pack('l<*')
read = "".b
Zlib::GzipReader.wrap(StringIO.new(s)) do |gzio|
until gzio.eof?
part = gzio.readpartial(1024) # RAISES
read << part
end
end
pp read == contents