pypi/conveyor

Allow cross origin range requests

hoodmane opened this issue · 11 comments

This is a continuation of #5 (also from pyodide). For some reason the CORS headers are not included when making a range request. If I request:

let resp = await fetch("https://files.pythonhosted.org/packages/51/5f/802a04274843f634469ef299fcd273de4438386deb7b8681dd059f0ee3b7/pip-19.1.tar.gz", {
   mode: 'cors',
   headers: {
      // Range values use "=", not ":" (e.g. "bytes=0-1000")
      'range': "bytes=0-1000",
   }
});

this raises an error:
Response to preflight request doesn't pass access control check: No 'Access-Control-Allow-Origin' header is present on the requested resource.

Using curl, I see no CORS headers when I make a range request:

>>> curl -IXGET https://files.pythonhosted.org/packages/51/5f/802a04274843f634469ef299fcd273de4438386deb7b8681dd059f0ee3b7/pip-19.1.tar.gz -r 0-1000
etag: "22e3726252b492ce24312c2b43d0127f"
content-type: binary/octet-stream
...
accept-ranges: bytes
age: 690655
content-range: bytes 0-1000/1334822
date: Wed, 02 Jun 2021 04:17:20 GMT
...
content-length: 1001

It's odd though because the CORS headers are present if I drop the range.

I am interested in this because range requests seem to be part of the way that pip handles dependency resolution. In particular, see the docstring here:
https://github.com/pypa/pip/blob/57be6a77c57ab5d512371b5c48d508a7620c3217/src/pip/_internal/network/lazy_wheel.py#L23
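To illustrate why ranges matter here: a wheel is a zip archive, and a zip's central directory sits at the end of the file, so a client can read the tail with a range request instead of downloading the whole file. The sketch below is not pip's code. `parseContentRange` and `tailRangeHeader` are hypothetical helpers, the 10240-byte chunk size is an arbitrary assumption, and the Content-Range value is the one from the curl output above (for a tarball, used only for the arithmetic).

```javascript
// Hypothetical helpers for illustration; this is not how pip implements LazyWheel.

// Parse a Content-Range value such as "bytes 0-1000/1334822".
function parseContentRange(value) {
  const match = /^bytes (\d+)-(\d+)\/(\d+)$/.exec(value);
  if (!match) {
    throw new Error("unsupported Content-Range: " + value);
  }
  return {
    start: Number(match[1]),
    end: Number(match[2]),
    total: Number(match[3]),
  };
}

// Build a Range header value covering the last `chunkSize` bytes of a file.
function tailRangeHeader(total, chunkSize) {
  const start = Math.max(0, total - chunkSize);
  return `bytes=${start}-${total - 1}`;
}

// The first response tells us the total size; the next request asks for the tail.
const { total } = parseContentRange("bytes 0-1000/1334822");
console.log(tailRangeHeader(total, 10240));
```

A browser-side client would pass the returned string as the `Range` header of a second `fetch`, which is the cross-origin request this issue is about.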

Thanks!

Actually, I'm still having trouble though. Now when I curl -IXGET -r 0-1000, I see:

content-type: binary/octet-stream
access-control-allow-methods: GET
access-control-allow-origin: *
content-length: 1001

so it sure looks like the server set the headers appropriately. Without the range request I see headers that look similar (obviously with a longer content-length), including access-control-allow-origin: *.

However, when I make the same range request with fetch in the browser I see:

content-type: text/html; charset=UTF-8
accept-ranges: bytes
content-length: 0

and no access-control-allow-origin: *. Without the range request I see the same headers that curl -IXGET reports... I wonder if I am doing something wrong here.

I guess I should experiment with running my own server and see if I can get it to work.

di commented

Can you show us how you're making this request from the browser so we can reproduce?

I am going to about:blank and then pasting this into the browser console:

resp = await fetch(
    "https://files.pythonhosted.org/packages/51/5f/802a04274843f634469ef299fcd273de4438386deb7b8681dd059f0ee3b7/pip-19.1.tar.gz",
    {
        mode: 'cors',
        headers: {
            // Range values use "=", not ":" (e.g. "bytes=0-1000")
            'range': "bytes=0-1000",
        }
    }
);

Here is what I see in Chrome:
(screenshot)

And in Firefox:
(screenshot)

Thanks so much for your help @di!

Here is a range request that works for me:

url = "https://upload.wikimedia.org/wikipedia/commons/b/be/Hidden_Tribe_-_Didgeridoo_1_Live.ogg";
headers = new Headers();
range = 1024;
headers.append("Range", "bytes=0-" + range);
request = new Request(url, {headers:headers});
await fetch(request);

For this url curl -IXGET reports some slightly different looking headers:

access-control-allow-origin: *
access-control-expose-headers: Age, Date, Content-Length, Content-Range, X-Content-Duration, X-Cache
timing-allow-origin: *

and no access-control-allow-methods: GET. Maybe some subset of these other headers is needed?
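One note on access-control-expose-headers: as far as I understand the CORS rules, it only controls which response headers a cross-origin script may read, not whether the request is allowed. The sketch below is a simplified model of that filtering. `scriptVisibleHeaders` is a hypothetical helper, the list of header names passed in is made up for the example, and the expose value is the one Wikimedia returns above.

```javascript
// Response headers that cross-origin scripts can always read (CORS-safelisted).
const SAFELISTED_RESPONSE_HEADERS = [
  "cache-control",
  "content-language",
  "content-length",
  "content-type",
  "expires",
  "last-modified",
  "pragma",
];

// Hypothetical helper: which of `headerNames` would be readable from script,
// given the value of Access-Control-Expose-Headers (simplified, no credentials).
function scriptVisibleHeaders(headerNames, exposeHeadersValue) {
  const exposed = (exposeHeadersValue || "")
    .split(",")
    .map((name) => name.trim().toLowerCase())
    .filter(Boolean);
  return headerNames
    .map((name) => name.toLowerCase())
    .filter(
      (name) =>
        SAFELISTED_RESPONSE_HEADERS.includes(name) ||
        exposed.includes("*") ||
        exposed.includes(name)
    );
}

// Example header names (assumed for illustration).
const names = ["Content-Type", "Content-Length", "Content-Range", "ETag"];

// Without an expose list, Content-Range is hidden from script.
console.log(scriptVisibleHeaders(names, ""));

// With the expose list Wikimedia sends, Content-Range becomes readable.
console.log(
  scriptVisibleHeaders(
    names,
    "Age, Date, Content-Length, Content-Range, X-Content-Duration, X-Cache"
  )
);
```

So a missing expose list would hide Content-Range from the script, but it would not by itself explain a failed preflight.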

Maybe the problem is x-permitted-cross-domain-policies: none?

di commented

I think it's likely because we're not including these headers in the response to the CORS preflight (OPTIONS) request.

PyPI:

$ curl -H "Origin: http://example.com" -H "Access-Control-Request-Method: POST" -H "Access-Control-Request-Headers: X-Requested-With" -X OPTIONS --verbose https://files.pythonhosted.org/packages/51/5f/802a04274843f634469ef299fcd273de4438386deb7b8681dd059f0ee3b7/pip-19.1.tar.gz
*   Trying 151.101.209.63...
* TCP_NODELAY set
* Connected to files.pythonhosted.org (151.101.209.63) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/ssl/cert.pem
  CApath: none
* TLSv1.2 (OUT), TLS handshake, Client hello (1):
* TLSv1.2 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
* TLSv1.2 (IN), TLS handshake, Server finished (14):
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
* TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (OUT), TLS handshake, Finished (20):
* TLSv1.2 (IN), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (IN), TLS handshake, Finished (20):
* SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256
* ALPN, server accepted to use h2
* Server certificate:
*  subject: CN=*.pythonhosted.org
*  start date: Mar 22 19:18:08 2021 GMT
*  expire date: Apr 23 19:18:07 2022 GMT
*  subjectAltName: host "files.pythonhosted.org" matched cert's "*.pythonhosted.org"
*  issuer: C=BE; O=GlobalSign nv-sa; CN=GlobalSign Atlas R3 DV TLS CA 2020
*  SSL certificate verify ok.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x7fad7880f600)
> OPTIONS /packages/51/5f/802a04274843f634469ef299fcd273de4438386deb7b8681dd059f0ee3b7/pip-19.1.tar.gz HTTP/2
> Host: files.pythonhosted.org
> User-Agent: curl/7.64.1
> Accept: */*
> Origin: http://example.com
> Access-Control-Request-Method: POST
> Access-Control-Request-Headers: X-Requested-With
>
* Connection state changed (MAX_CONCURRENT_STREAMS == 100)!
< HTTP/2 200
< server: UploadServer
< content-type: text/html; charset=UTF-8
< accept-ranges: bytes
< cache-control: max-age=365000000, immutable, public
< date: Mon, 26 Jul 2021 13:21:40 GMT
< x-served-by: cache-sea4442-SEA, cache-ewr18139-EWR
< x-cache: MISS, MISS
< x-cache-hits: 0, 0
< x-timer: S1627305701.610647,VS0,VE112
< strict-transport-security: max-age=31536000; includeSubDomains; preload
< x-frame-options: deny
< x-xss-protection: 1; mode=block
< x-content-type-options: nosniff
< x-permitted-cross-domain-policies: none
< x-robots-header: noindex
< content-length: 0
<
* Connection #0 to host files.pythonhosted.org left intact
* Closing connection 0

Wikimedia:

$ curl -H "Origin: http://example.com" -H "Access-Control-Request-Method: POST" -H "Access-Control-Request-Headers: X-Requested-With" -X OPTIONS --verbose https://upload.wikimedia.org/wikipedia/commons/b/be/Hidden_Tribe_-_Didgeridoo_1_Live.ogg
*   Trying 208.80.154.240...
* TCP_NODELAY set
* Connected to upload.wikimedia.org (208.80.154.240) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/ssl/cert.pem
  CApath: none
* TLSv1.2 (OUT), TLS handshake, Client hello (1):
* TLSv1.2 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
* TLSv1.2 (IN), TLS handshake, Server finished (14):
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
* TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (OUT), TLS handshake, Finished (20):
* TLSv1.2 (IN), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (IN), TLS handshake, Finished (20):
* SSL connection using TLSv1.2 / ECDHE-ECDSA-AES256-GCM-SHA384
* ALPN, server accepted to use h2
* Server certificate:
*  subject: CN=*.wikipedia.org
*  start date: Jul 15 08:01:49 2021 GMT
*  expire date: Oct 13 08:01:48 2021 GMT
*  subjectAltName: host "upload.wikimedia.org" matched cert's "*.wikimedia.org"
*  issuer: C=US; O=Let's Encrypt; CN=R3
*  SSL certificate verify ok.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x7ff35e00f600)
> OPTIONS /wikipedia/commons/b/be/Hidden_Tribe_-_Didgeridoo_1_Live.ogg HTTP/2
> Host: upload.wikimedia.org
> User-Agent: curl/7.64.1
> Accept: */*
> Origin: http://example.com
> Access-Control-Request-Method: POST
> Access-Control-Request-Headers: X-Requested-With
>
* Connection state changed (MAX_CONCURRENT_STREAMS == 100)!
< HTTP/2 200
< date: Mon, 26 Jul 2021 13:20:46 GMT
< server: Varnish
< x-cache: cp1076 int
< x-cache-status: int-front
< server-timing: cache;desc="int-front", host;desc="cp1076"
< strict-transport-security: max-age=106384710; includeSubDomains; preload
< report-to: { "group": "wm_nel", "max_age": 86400, "endpoints": [{ "url": "https://intake-logging.wikimedia.org/v1/events?stream=w3c.reportingapi.network_error&schema_uri=/w3c/reportingapi/network_error/1.0.0" }] }
< nel: { "report_to": "wm_nel", "max_age": 86400, "failure_fraction": 0.05, "success_fraction": 0.0}
< permissions-policy: interest-cohort=()
< set-cookie: WMF-Last-Access=26-Jul-2021;Path=/;HttpOnly;secure;Expires=Fri, 27 Aug 2021 12:00:00 GMT
< x-client-ip: 100.34.218.159
< access-control-allow-origin: *
< access-control-allow-headers: Range,X-Wikimedia-Debug
< access-control-allow-methods: GET, HEAD, OPTIONS
< access-control-max-age: 86400
< content-length: 0
< accept-ranges: bytes
<
* Connection #0 to host upload.wikimedia.org left intact
* Closing connection 0
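
The difference between the two responses can be checked mechanically. The sketch below is a simplified approximation of the checks a browser applies to a preflight response. It is not the full Fetch algorithm and it ignores credentials. `parseList` and `preflightAllows` are hypothetical helper names, the header objects are copied from the two curl outputs above, and the request being checked is the browser's real one (a GET with a Range header) rather than the POST used in the curl commands.

```javascript
// Methods that never need to be listed in Access-Control-Allow-Methods.
const SAFELISTED_METHODS = ["GET", "HEAD", "POST"];

// Split a comma-separated header value into lowercase tokens.
function parseList(value) {
  return (value || "")
    .split(",")
    .map((item) => item.trim().toLowerCase())
    .filter(Boolean);
}

// Simplified check: would this preflight response let the real request proceed?
function preflightAllows(responseHeaders, origin, method, requestHeaders) {
  const allowOrigin = responseHeaders["access-control-allow-origin"];
  if (allowOrigin !== "*" && allowOrigin !== origin) {
    return false; // missing or mismatched Access-Control-Allow-Origin
  }
  const methods = parseList(responseHeaders["access-control-allow-methods"]);
  const methodOk =
    SAFELISTED_METHODS.includes(method.toUpperCase()) ||
    methods.includes("*") ||
    methods.includes(method.toLowerCase());
  if (!methodOk) {
    return false;
  }
  const allowedHeaders = parseList(responseHeaders["access-control-allow-headers"]);
  return requestHeaders.every(
    (name) => allowedHeaders.includes("*") || allowedHeaders.includes(name.toLowerCase())
  );
}

// Relevant headers from the PyPI preflight response above (no CORS headers at all).
const pypiPreflight = {
  "content-type": "text/html; charset=UTF-8",
  "accept-ranges": "bytes",
  "content-length": "0",
};

// Relevant headers from the Wikimedia preflight response above.
const wikimediaPreflight = {
  "access-control-allow-origin": "*",
  "access-control-allow-headers": "Range,X-Wikimedia-Debug",
  "access-control-allow-methods": "GET, HEAD, OPTIONS",
  "access-control-max-age": "86400",
};

console.log(preflightAllows(pypiPreflight, "http://example.com", "GET", ["range"]));
console.log(preflightAllows(wikimediaPreflight, "http://example.com", "GET", ["range"]));
```

Under these simplified rules the PyPI response fails at the first check, which matches the "No 'Access-Control-Allow-Origin' header is present" error, while the Wikimedia response passes because it lists Range in access-control-allow-headers.
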

di commented

Updates in pypi/infra#71 should address this

I agree, it seems to work. Now we can copy LazyWheel into Pyodide. =) Thanks again!