Add Range header to GET request when used as a fallback in perform_head_request

Question


kylefleming opened this issue 6 years ago · 1 comment

I was wondering if it would make sense to add a Range header to the GET request in HTTP.perform_head_request.

Background

The GET request is used as a fallback when the HEAD request returns a failure code, because some hosts respond inconsistently to HEAD and GET (for example, returning 200 OK for GET but 403 Forbidden for HEAD).

trunk.cocoapods.org uses HTTP.perform_head_request indirectly: SpecificationWrapper.validate_http calls HTTP.validate_url, which in turn calls HTTP.perform_head_request. SpecificationWrapper.validate_http also wraps this check in a 5-second timeout.
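To make the failure mode concrete, here is a minimal Ruby sketch of a timeout wrapper of the kind described above. It is not the trunk.cocoapods.org code: validate_with_timeout is a hypothetical name, and the block passed to it stands in for whatever HTTP check runs inside the wrapper.

```ruby
require 'timeout'

# Hypothetical stand-in for the 5-second wrapper around the URL check
# (not the actual trunk.cocoapods.org implementation).
# Runs the given check and treats a timeout as "URL not valid".
def validate_with_timeout(seconds = 5)
  Timeout.timeout(seconds) { yield }
rescue Timeout::Error
  false
end

# A fast check passes its result through unchanged.
puts validate_with_timeout { true }                # prints "true"

# A slow check (simulated here with sleep) is reported as invalid,
# even though it would eventually have succeeded.
puts validate_with_timeout(0.1) { sleep 1; true }  # prints "false"
```

Because the timeout bounds total elapsed time, any check that downloads a large body can fail this way, whatever status code the server would eventually have returned.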

Problem description

If the HEAD request fails and the requested file is large enough that the fallback GET (which downloads the entire file) takes longer than 5 seconds, the http source is not considered valid, even though under normal circumstances it would be.

Proposed solution

One solution would be to add a Range header to the GET request so that the server returns only a small portion of the requested file, which keeps the request well within the timeout that trunk.cocoapods.org uses.
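With Ruby's standard library, adding the header is a one-line change. The sketch below uses Net::HTTP purely for illustration; the real HTTP.perform_head_request may use a different HTTP client, and build_ranged_get is a hypothetical helper name.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: builds a GET request asking only for bytes
# first..last of the resource (both inclusive). The defaults produce
# "Range: bytes=0-0", i.e. just the first byte.
def build_ranged_get(url, first: 0, last: 0)
  request = Net::HTTP::Get.new(URI(url))
  request['Range'] = "bytes=#{first}-#{last}"
  request
end

url = 'https://github.com/kylefleming/opencv/releases/download/3.3.0/opencv-3.3.0-ios-osx-framework.zip'
request = build_ranged_get(url)
puts request['Range']  # prints "bytes=0-0"
puts request.path      # prints "/kylefleming/opencv/releases/download/3.3.0/opencv-3.3.0-ios-osx-framework.zip"
```

A server that honours the header replies 206 Partial Content with a 1-byte body; one that ignores it replies 200 OK with the full body, exactly as it does today.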

Possible issues

Some servers don't support Range requests, but in this case GET is only used to test that the file exists: we need neither the whole file nor the ability to resume a download (which, as far as I can tell, is normally the main reason for using a Range request). If Range is not supported, the server will most likely either ignore the header and send the response it would otherwise have sent, or refuse the request. The edge case is a server that returns a failure code because of the Range header even though it would otherwise have returned a success code. To handle that, a third fallback without the Range header might be needed. I'm not sure how common this would be.
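The resulting three-step chain (HEAD, then a ranged GET, then a plain GET) could look roughly like this. It is a hypothetical outline, not the current perform_head_request code: url_valid? and success? are illustrative names, and the request is injected as a fetch callable that returns an HTTP status code, so the control flow can be checked without network access.

```ruby
# Hypothetical sketch of the proposed fallback chain.
# `fetch` is any callable taking (http_method, headers) and returning an
# Integer status code; real code would perform the HTTP request there.

# Any 2xx counts: 206 means the Range header was honoured,
# 200 means the server ignored it and sent the full body.
def success?(status)
  (200..299).cover?(status)
end

def url_valid?(fetch)
  # 1. HEAD: cheapest check, but some hosts reject it (e.g. 403).
  return true if success?(fetch.call(:head, {}))
  # 2. GET for the first byte only, so large files stay fast.
  return true if success?(fetch.call(:get, { 'Range' => 'bytes=0-0' }))
  # 3. Plain GET, in case the server fails the request because of Range.
  success?(fetch.call(:get, {}))
end

# Example: a host that rejects HEAD but honours Range, like the S3 host below.
calls = []
s3_like = lambda do |method, headers|
  calls << [method, headers]
  method == :head ? 403 : 206
end
puts url_valid?(s3_like)  # prints "true"
puts calls.length         # prints "2" (the plain GET is never sent)
```

A server that ignores Range still sends the whole body in step 2, so such hosts are no faster than today, but no slower either. One concrete case for step 3 is a zero-length file: bytes=0-0 is then unsatisfiable, and a compliant server answers 416 Range Not Satisfiable.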

Example code

Here is an example of using the header (with the file mentioned in this bug report):

HEAD request fails:

$ curl --location --head https://github.com/kylefleming/opencv/releases/download/3.3.0/opencv-3.3.0-ios-osx-framework.zip
...
HTTP/1.1 403 Forbidden
x-amz-request-id: C90A15FD69DC2813
x-amz-id-2: 6DORgM622YIDftqj4ttJIx5qnuCU0Rrj3fAKqSLKU40jKYC8hFq/ejVPF/HUjl+O
Content-Type: application/xml
Transfer-Encoding: chunked
Date: Fri, 12 Jan 2018 03:05:13 GMT
Server: AmazonS3

GET request takes more than 5 seconds (22 seconds on my machine):

$ curl --location https://github.com/kylefleming/opencv/releases/download/3.3.0/opencv-3.3.0-ios-osx-framework.zip > opencv-3.3.0-ios-osx-framework.zip
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   624    0   624    0     0    624      0 --:--:-- --:--:-- --:--:--  2526
100  231M  100  231M    0     0  10.5M      0  0:00:22  0:00:22 --:--:-- 9418k

The proposed additional step, a GET request with a Range header (this example requests only the first byte of the content):

$ curl --location --header "Range: bytes=0-0" https://github.com/kylefleming/opencv/releases/download/3.3.0/opencv-3.3.0-ios-osx-framework.zip > opencv-3.3.0-ios-osx-framework.zip
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   624    0   624    0     0    624      0 --:--:-- --:--:-- --:--:--  2678
100     1  100     1    0     0      1      0  0:00:01 --:--:--  0:00:01  1000

Summary:

The ultimate goal is a GET fallback in HTTP.perform_head_request that copes better with large files because it doesn't request the entire file.

Answer 1 · 2018-01-12T17:12:05.000Z

Sounds reasonable to me!