Add Range header to GET request when used as a fallback in perform_head_request
kylefleming opened this issue · 1 comments
I was wondering if it would make sense to add a Range header to the GET request in HTTP.perform_head_request.
Background
The GET request is being used as a fallback if the HEAD request returns a failure code, since some hosts have inconsistencies between HEAD and GET (for example, returning 200 OK for GET and 403 Forbidden for HEAD).
trunk.cocoapods.org uses HTTP.perform_head_request when it calls out to HTTP.validate_url in the check for SpecificationWrapper.validate_http. SpecificationWrapper.validate_http also wraps the call in a 5-second timeout.
Problem description
If the HEAD request fails and the requested file is large enough that the fallback GET takes longer than 5 seconds, the file's URL is not considered valid, even though under normal circumstances it would be.
Proposed solution
One solution to this would be to add a Range header to the GET request so that the server only returns a small portion of the requested file, thus not tripping the timeout wrapper that trunk.cocoapods.org is using.
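To make the proposal concrete, here is a minimal sketch of a ranged GET using Ruby's standard Net::HTTP. It is not the existing CocoaPods implementation: the helper names build_ranged_get and perform_ranged_get, the redirect limit, and the 5-second timeouts are all assumptions chosen for illustration.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper (not part of CocoaPods): builds a GET request that
# asks the server for only the first byte of the resource.
def build_ranged_get(uri)
  request = Net::HTTP::Get.new(uri)
  request['Range'] = 'bytes=0-0'
  request
end

# Hypothetical helper: sends the ranged GET, follows up to `redirects_left`
# redirects, and returns the final Net::HTTPResponse.
def perform_ranged_get(url, redirects_left: 5, timeout: 5)
  uri = URI.parse(url)
  response = Net::HTTP.start(uri.host, uri.port,
                             use_ssl: uri.scheme == 'https',
                             open_timeout: timeout,
                             read_timeout: timeout) do |http|
    http.request(build_ranged_get(uri))
  end

  # Re-send the same ranged request to the redirect target, if there is one.
  location = response['location']
  if response.is_a?(Net::HTTPRedirection) && location && redirects_left > 0
    next_url = URI.join(url, location).to_s
    return perform_ranged_get(next_url,
                              redirects_left: redirects_left - 1,
                              timeout: timeout)
  end

  response
end

# Example (requires network access):
#   response = perform_ranged_get('https://example.com/file.zip')
#   response.code  # "206" if the server honored the Range header
```

A server that honors the header should answer 206 Partial Content with a one-byte body. A server that ignores it will typically answer 200 with the full body, in which case this sketch still downloads the whole file and the timeout concern remains.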
Possible issues
Some servers won't support Range requests, but GET is used here only to test for the file's existence, so we don't need the whole file and we don't need the ability to resume downloading (my impression is that this would normally be the main reason for using a Range request). If Range is not supported, the server will likely either ignore the header and return the response it otherwise would have sent, or refuse to proceed. The edge case here is a server that returns a failure code because of the Range header even though it would otherwise have returned a success code. In that case there might need to be a third fallback that omits the Range header. I'm not sure how common this would be.
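The three-step fallback described above could be sketched roughly as follows. Everything here is hypothetical: url_reachable?, success_status?, FakeResponse, and the requester callable are illustrative names, and treating any status below 400 as success is an assumption rather than the rule CocoaPods necessarily applies.

```ruby
# Minimal stand-in for a response object; only the status code matters here.
FakeResponse = Struct.new(:status_code)

RANGE_HEADERS = { 'Range' => 'bytes=0-0' }.freeze

# Assumption: any status below 400 counts as "the URL exists".
def success_status?(status_code)
  status_code < 400
end

# `requester` is any callable taking (method, url, headers) and returning
# an object that responds to #status_code. Attempts run in order and stop
# at the first success:
#   1. HEAD
#   2. GET with a Range header (cheap for large files)
#   3. GET without Range (for servers that reject ranged requests)
def url_reachable?(url, requester)
  attempts = [
    [:head, {}],
    [:get, RANGE_HEADERS],
    [:get, {}],
  ]
  attempts.any? do |method, headers|
    success_status?(requester.call(method, url, headers).status_code)
  end
end
```

Passing the requester in as a callable keeps the ordering logic separate from the HTTP client, so the fallback order can be exercised with stubbed responses such as 403 for HEAD and 416 for the ranged GET.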
Example code
An example of using the header would be the following (I'm using the files mentioned in this bug report for the example):
HEAD request fails:
$ curl --location --head https://github.com/kylefleming/opencv/releases/download/3.3.0/opencv-3.3.0-ios-osx-framework.zip
...
HTTP/1.1 403 Forbidden
x-amz-request-id: C90A15FD69DC2813
x-amz-id-2: 6DORgM622YIDftqj4ttJIx5qnuCU0Rrj3fAKqSLKU40jKYC8hFq/ejVPF/HUjl+O
Content-Type: application/xml
Transfer-Encoding: chunked
Date: Fri, 12 Jan 2018 03:05:13 GMT
Server: AmazonS3
GET request takes more than 5 seconds (22 seconds on my machine):
$ curl --location https://github.com/kylefleming/opencv/releases/download/3.3.0/opencv-3.3.0-ios-osx-framework.zip > opencv-3.3.0-ios-osx-framework.zip
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   624    0   624    0     0    624      0 --:--:-- --:--:-- --:--:--  2526
100  231M  100  231M    0     0  10.5M      0  0:00:22  0:00:22 --:--:-- 9418k
The proposed added step of using a GET request with a Range header (this example requests the first byte of the content):
$ curl --location --header "Range: bytes=0-0" https://github.com/kylefleming/opencv/releases/download/3.3.0/opencv-3.3.0-ios-osx-framework.zip > opencv-3.3.0-ios-osx-framework.zip
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   624    0   624    0     0    624      0 --:--:-- --:--:-- --:--:--  2678
100     1  100     1    0     0      1      0  0:00:01 --:--:--  0:00:01  1000
Summary:
The ultimate goal here would be to allow for a GET fallback that more easily supports larger files when calling HTTP.perform_head_request by not requesting the entire file.
Sounds reasonable to me!