socketry/async-http

Possible bug: Downloaded file differs with that of Firefox

marek22k opened this issue · 16 comments

Hello,

I downloaded the same file with Firefox and Async, but the one from Async is 0.1MB bigger and invalid (is a bz2 file). What is the reason for this?
Files:
files.zip
Code:

require 'logger'
require 'async'
require 'async/barrier'
require 'async/http/internet'
require_relative 'helper/config'

config = Configuration::Configuration.new.validate!
logger = config.create_logger

mrts = [config.table4_file, config.table6_file]

Async do
  internet = Async::HTTP::Internet.new
  barrier = Async::Barrier.new
  
  mrts.each do |mrt|
    barrier.async do
      begin
        logger.info "Download #{mrt}."
        response = internet.get("#{config.collector}/#{mrt}")

        case response.status
        when 200
          logger.info "Download of #{mrt} complete."
          response.save(mrt)
          logger.info "#{mrt} successful saved."
        else
          raise "Failed to download #{mrt}."
        end
      rescue RuntimeError => e
        logger.error(e.message)
      end
    end
  end
ensure
  internet.close
end

Can you give me the two URLs?

https://mrt.collector.dn42/master4_latest.mrt.bz2 But I think without access to the dn42/NeoNetwork/CRXN/Freifunk/ChaosVPN it might be difficult to download anything there.
Screenshot at 2023-06-02 15-32-45

Is there a host I can SSH into for testing?

samuel@sakura ~/D/files> bunzip2 async.mrt.bz2 

bunzip2: async.mrt.bz2: trailing garbage after EOF ignored
samuel@sakura ~/D/files> bunzip2 firefox.mrt.bz2 
samuel@sakura ~/D/files> ls -lah
total 255472
drwx------@  4 samuel  staff   128B  2 Jun 23:02 ./
drwxr-xr-x@ 36 samuel  staff   1.1K  2 Jun 22:59 ../
-rw-------   1 samuel  staff    62M  2 Jun 13:19 async.mrt
-rw-r--r--   1 samuel  staff    62M  2 Jun 13:19 firefox.mrt
samuel@sakura ~/D/files> shasum async.mrt 
0cb78bff3b4239cb8745407c4bc16f596540f654  async.mrt
samuel@sakura ~/D/files> shasum firefox.mrt 
0cb78bff3b4239cb8745407c4bc16f596540f654  firefox.mrt

The statement bunzip2: async.mrt.bz2: trailing garbage after EOF ignored is worrying.. let me check how that could be happening.

If you want me to run a modified version of the code, feel free to tell me.

As far as I know there is no SSH host or similar - the only thing I could offer is to provide you with a WireGuard tunnel (PM https://mk16.de/about_me/)

I am going to try and repro locally.

Firstly, can you run it with CONSOLE_LEVEL=debug ... and give me the output.

The statement bunzip2: async.mrt.bz2: trailing garbage after EOF ignored is worrying.. let me check how that could be happening.

Mhh, that might explain why another implementation, can read the file anyway. To extract the MRT file I use bzip2-ffi - maybe bzip2-ffi can't handle it.

Firstly, can you run it with CONSOLE_LEVEL=debug ... and give me the output.

I, [2023-06-02T16:09:50.703326 #9879]  INFO -- : Download master4_latest.mrt.bz2.
  0.0s    debug: Async::Pool::Controller [oid=0x30c] [ec=0x320] [pid=9879] [2023-06-02 16:09:50 +0200]
               | No available resources, allocating new one...
I, [2023-06-02T16:09:50.703650 #9879]  INFO -- : Download master6_latest.mrt.bz2.
  0.0s    debug: Async::HTTP::Client [oid=0x334] [ec=0x320] [pid=9879] [2023-06-02 16:09:50 +0200]
               | Making connection to #<Async::HTTP::Endpoint https://mrt.collector.dn42/master4_latest.mrt.bz2 {}>
 0.06s    debug: Async::IO::Socket [ec=0x320] [pid=9879] [2023-06-02 16:09:50 +0200]
               | Connecting to #<Addrinfo: 172.20.129.7:443 TCP (mrt.collector.dn42)>
 0.14s    debug: Async::HTTP::Protocol::HTTPS [ec=0x320] [pid=9879] [2023-06-02 16:09:50 +0200]
               | Negotiating protocol "h2"...
 0.18s    debug: Async::Pool::Controller [oid=0x30c] [ec=0x35c] [pid=9879] [2023-06-02 16:09:50 +0200]
               | Wait for resource -> #<Async::HTTP::Protocol::HTTP2::Client 0 active streams>
 0.18s    debug: Async::Pool::Controller [oid=0x30c] [ec=0x320] [pid=9879] [2023-06-02 16:09:50 +0200]
               | Wait for resource -> #<Async::HTTP::Protocol::HTTP2::Client 1 active streams>
I, [2023-06-02T16:09:50.975347 #9879]  INFO -- : Download of master6_latest.mrt.bz2 complete.
I, [2023-06-02T16:09:51.031937 #9879]  INFO -- : Download of master4_latest.mrt.bz2 complete.
 1.55s    debug: Async::Pool::Controller [oid=0x30c] [ec=0x320] [pid=9879] [2023-06-02 16:09:52 +0200]
               | Reuse #<Async::HTTP::Protocol::HTTP2::Client 1 active streams>
I, [2023-06-02T16:09:52.253842 #9879]  INFO -- : master4_latest.mrt.bz2 successful saved.
 2.48s    debug: Async::Pool::Controller [oid=0x30c] [ec=0x35c] [pid=9879] [2023-06-02 16:09:53 +0200]
               | Reuse #<Async::HTTP::Protocol::HTTP2::Client 0 active streams>
I, [2023-06-02T16:09:53.187483 #9879]  INFO -- : master6_latest.mrt.bz2 successful saved.
 2.48s    debug: Async::Reactor [oid=0x370] [ec=0x384] [pid=9879] [2023-06-02 16:09:53 +0200]
               | Exiting run-loop because finished.

It also works with the native Ruby HTTP client. (though I would prefer to use async as I would then have it consistent). https://codeberg.org/mark22k/BgpLookup/src/commit/02bddd5e8ffe5920e9573a434ef8d03791b3a827/get_mrts.rb

Native Ruby client only supports HTTP/1 which is probably the big difference.

          logger.info "Download of #{mrt} complete."

          Console.logger.info(response, body: response.body.inspect, length: response.body.length, headers: response.headers.to_h)

Can you add that above log line and let me know the output.

I might have figured it out...

Can you update protocol-http and check if the issue still exists?

As far as I can tell, everything is working fine again now. Thanks a lot!

Thanks for your bug report.