openml/OpenML

CORS policy

Closed this issue · 21 comments

saehm commented

Hello,

I would like to use OpenML in a JavaScript library that should work both in the browser and in Node.js.
So far I have had no problems fetching data using the REST API in Node.js. When I try to fetch data in the browser, the browser blocks the request because the response does not contain an "Access-Control-Allow-Origin" header.

Is there a problem on my side? Do I have to add something to the request headers?
Or is it intentional on your side to prevent fetching data from a script running in a browser?

I am using node-fetch in the Node.js environment, and the default JavaScript fetch in the browser environment.

saehm commented

According to https://enable-cors.org/server_apache.html, it seems to be just a little change in the right .htaccess file.
The response to an API call should contain an "Access-Control-Allow-Origin" header to be accepted by a browser, which it does not at the moment.

For example:
curl -v https://api.openml.org/api/v1/json/data/187
returns this response:

< HTTP/1.1 200 OK
< Date: Thu, 18 Aug 2022 07:56:19 GMT
< Server: Apache/2.4.29 (Ubuntu)
< Set-Cookie: ci_session=ko27fr7pguc027468ffe5b3ihtld68lj; expires=Thu, 18-Aug-2022 09:56:19 GMT; Max-Age=7200; path=/; HttpOnly
< Expires: Thu, 19 Nov 1981 08:52:00 GMT
< Cache-Control: no-store, no-cache, must-revalidate
< Pragma: no-cache
< Content-Length: 4234
< Access-Control-Allow-Headers: Origin, X-Requested-With, Content-Type, Accept
< Content-Type: application/json; charset=utf-8
< 
{"data_set_description":{"id":"187","name":"wine","version":"1","description":"**Author**:   \n**Source**: Unknown -   \n**Please cite**:   \n\n1. Title of Database: Wine recognit...
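The same check can be sketched from Node.js, where fetch does not enforce CORS and the response headers can be read directly. This is only a minimal sketch: it assumes Node 18+ (global `fetch`; node-fetch has the same call shape), and the helper name `getAllowOrigin` and its `fetchImpl` parameter are made up for illustration.

```javascript
// Hypothetical helper: resolves to the Access-Control-Allow-Origin value of
// the response, or null when the header is absent.
// `fetchImpl` is injectable so the helper can be tried without network access.
async function getAllowOrigin(url, fetchImpl = fetch) {
  const response = await fetchImpl(url);
  return response.headers.get("access-control-allow-origin");
}

// Example usage (needs network access):
// getAllowOrigin("https://api.openml.org/api/v1/json/data/187").then(console.log);
```

A `null` result here corresponds to the situation a browser rejects.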

Can you try again, please?

saehm commented

It works now :)
Thank you!

@joaquinvanschoren not sure if that is easily doable, but would you consider adding an Access-Control-Allow-Origin: * header to the https://openml.org/api/* URLs as well?

Right now it does not have the header e.g. this URL:

curl -v https://openml.org/api/v1/json/data/list/data_name/titanic/limit/2/data_version/1 2>&1 | grep Access-Control-Allow-Origin     

While the api.openml.org equivalent URL has the Access-Control-Allow-Origin header:

curl -v https://api.openml.org/v1/json/data/list/data_name/titanic/limit/2/data_version/1 2>&1 | grep Access-Control-Allow-Origin 
< Access-Control-Allow-Origin: *

Context: turns out scikit-learn uses https://openml.org/api URLs (not sure if https://api.openml.org URLs are preferred, let me know if this is the case ...) and trying to use sklearn.datasets.fetch_openml inside Pyodide fails with a CORS-related error. It would be great if https://openml.org/api URLs could have the header. This would allow running scikit-learn gallery examples inside JupyterLite, see scikit-learn/scikit-learn#25887 for more details.

Yes, https://api.openml.org is preferred (and it will be faster), but I'll try to add the headers.
We have a proxy set up that redirects openml.org/api to api.openml.org but for some reason it strips the headers and I'm not sure why yet.

Ok, it should work now. Please let me know :)

Thanks a lot! It seems to fix the issue with most URLs, although I am still seeing a CORS issue with data URLs, e.g. https://openml.org/data/v1/download/16826755. Not sure why, since there seems to be an Access-Control-Allow-Origin: * in the headers ...

On the other hand, the equivalent api.openml.org URL does not have the CORS issue: https://api.openml.org/data/v1/download/16826755.

For the longer term, I'll try to get scikit-learn to use the api.openml.org URLs.

To reproduce, go to https://scikit-learn.org (most websites would do for the reproducer) and open your browser console.

The api.openml.org/data request succeeds:

function reqListener() {
  console.log(this.responseText);
}

req = new XMLHttpRequest();
req.addEventListener("load", reqListener);
req.open("GET", "https://api.openml.org/data/v1/download/16826755");
req.send();

The openml.org/data one does not:

function reqListener() {
  console.log(this.responseText);
}

req = new XMLHttpRequest();
req.addEventListener("load", reqListener);
req.open("GET", "https://openml.org/data/v1/download/16826755");
req.send();

It looks like somehow there are multiple CORS headers.

The error looks like this on Firefox:

Cross-Origin Request Blocked: The Same Origin Policy disallows reading the remote resource at 
https://www.openml.org/data/v1/download/16826755. (Reason: Multiple CORS header
 ‘Access-Control-Allow-Origin’ not allowed).

Cross-Origin Request Blocked: The Same Origin Policy disallows reading the remote resource at 
https://www.openml.org/data/v1/download/16826755. (Reason: CORS request did not succeed).
 Status code: (null).

And on Chromium:

Access to XMLHttpRequest at 'https://www.openml.org/data/v1/download/16826755' (redirected from 
'https://openml.org/data/v1/download/16826755') from origin 'https://scikit-learn.org' has been blocked 
by CORS policy: The 'Access-Control-Allow-Origin' header contains multiple values '*, *', but only one 
is allowed.
VM36:8     GET https://www.openml.org/data/v1/download/16826755 net::ERR_FAILED 307 (Temporary Redirect)
(anony

I looked a bit more at the failing snippet via the Chromium developer tools, and it does seem like openml.org/data has a single Access-Control-Allow-Origin header
image

while the www.openml.org/data has two:
image

So maybe the redirection from openml.org/data to www.openml.org adds an unnecessary header. Random guess (complete newbie in this kind of thing), maybe somewhere there is a Header append (or maybe add) instead of Header set?
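To illustrate the append-versus-set guess, here is a small analogy using the Fetch API's `Headers` class (a global in browsers and in Node 18+). It is only an analogy: the actual OpenML server and proxy configuration is not shown in this thread, and the "backend" and "proxy" labels in the comments are assumptions.

```javascript
// Analogy only: two layers (say, a backend and a proxy) each add the header.

// Case 1: the second layer appends, so the header ends up with two values.
const appended = new Headers();
appended.append("Access-Control-Allow-Origin", "*"); // assumed backend
appended.append("Access-Control-Allow-Origin", "*"); // assumed proxy, appending

// Case 2: the second layer sets, overwriting whatever was there.
const replaced = new Headers();
replaced.set("Access-Control-Allow-Origin", "*"); // assumed backend
replaced.set("Access-Control-Allow-Origin", "*"); // assumed proxy, overwriting

console.log(appended.get("Access-Control-Allow-Origin")); // "*, *"
console.log(replaced.get("Access-Control-Allow-Origin")); // "*"
```

The combined value in the first case matches the `'*, *'` that Chromium reports in the error above.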

Actually the issue does not seem related to redirection, only to https://www.openml.org/data as can be seen from the snippet:

function reqListener() {
  console.log(this.responseText);
}

req = new XMLHttpRequest();
req.addEventListener("load", reqListener);
req.open("GET", "https://www.openml.org/data/v1/download/16826755");
req.send();

XMLHttpRequest seems pickier and complains about the same Access-Control-Allow-Origin header appearing twice, while typing the URL in a browser or downloading via wget is more forgiving ...
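As far as I understand, the browser-side rule is strict: for a cross-origin request the header value must be exactly `*` or exactly the requesting origin, so a combined value of `*, *` matches neither. Loading the URL directly in a tab or using wget does not involve this check at all. A simplified sketch of the comparison follows; the function name is made up, and credentialed requests (where `*` is not accepted) are ignored.

```javascript
// Simplified sketch of the browser's Access-Control-Allow-Origin check.
// Assumptions: non-credentialed request; `headerValue` is the combined header
// value as the browser sees it, or null when the header is missing.
function allowOriginPermits(headerValue, requestOrigin) {
  if (headerValue === null) return false;
  return headerValue === "*" || headerValue === requestOrigin;
}

console.log(allowOriginPermits("*", "https://scikit-learn.org"));    // true
console.log(allowOriginPermits("*, *", "https://scikit-learn.org")); // false
console.log(allowOriginPermits(null, "https://scikit-learn.org"));   // false
```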

Thanks!
I updated the configuration. Is it better now?

Hmmm still not quite, here is the status:

  • https://www.openml.org/data/v1/download/16826755 works: it has a single Access-Control-Allow-Origin: * header
  • but https://openml.org/data/v1/download/16826755 does not work: it has no Access-Control-Allow-Origin: * header. Unfortunately, this is the one we are using in scikit-learn.

How are you testing? I do see the header in Chromium

Screenshot 2023-04-18 at 11 55 47

I go to scikit-learn.org, open a browser console and type the following snippet and then Enter:

function reqListener() {
  console.log(this.responseText);
}

req = new XMLHttpRequest();
req.addEventListener("load", reqListener);
req.open("GET", "https://openml.org/data/v1/download/16826755");
req.send();

This is what I get:
image

Ok, I figured it out I think. Is it working on your end now?

Now I get a failure for both https://openml.org/data/v1/download/16826755 and https://www.openml.org/data/v1/download/16826755 because they each have two Access-Control-Allow-Origin: * headers:

Access to XMLHttpRequest at 'https://openml.org/data/v1/download/16826755' from origin 'https://scikit-learn.org'
has been blocked by CORS policy: The 'Access-Control-Allow-Origin' header contains multiple values '*, *',
but only one is allowed.

Of course :)
How about now?

Works fine now, thanks a lot for this!

Phew :) Thanks for the quick feedback. Feel free to close the issue.

> Feel free to close the issue.

I would if I could, but I am not the one who opened it 😉

Ah! No worries, I'll close it then. Good luck with the scikit-learn gallery examples!