simonw/datasette-lite

Character `%2F` is automatically converted to `/` in URL param

severo opened this issue · 1 comments

When passing a parquet URL that contains `%2F`, the sequence seems to be decoded to `/`, which turns the original URL into a different one.

See, for example, the file: https://huggingface.co/datasets/squad/resolve/refs%2Fconvert%2Fparquet/plain_text/squad-train.parquet.
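A minimal sketch of the suspected mechanism, using Python's standard `urllib.parse` rather than the actual datasette-lite code (which is not shown in this issue). The variable names are illustrative. Percent-decoding the example URL yields exactly the URL that appears in the 404 error below:

```python
from urllib.parse import unquote

# The URL as passed in the ?parquet= parameter (taken from this issue).
original = (
    "https://huggingface.co/datasets/squad/resolve/"
    "refs%2Fconvert%2Fparquet/plain_text/squad-train.parquet"
)

# Percent-decoding turns each %2F into a literal slash, so the single path
# segment "refs%2Fconvert%2Fparquet" becomes three separate segments.
decoded = unquote(original)

print(decoded)
print(decoded == original)
```

Because `refs%2Fconvert%2Fparquet` is a single path segment on the server side, the decoded form points at a different resource, which would explain the 404.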

The app gives the following error:

Error

Traceback (most recent call last):
  File "/lib/python311.zip/_pyodide/_base.py", line 540, in eval_code_async
    await CodeRunner(
  File "/lib/python311.zip/_pyodide/_base.py", line 365, in run_async
    await coroutine
  File "<exec>", line 110, in <module>
  File "/lib/python311.zip/pyodide/http.py", line 201, in bytes
    self._raise_if_failed()
  File "/lib/python311.zip/pyodide/http.py", line 125, in _raise_if_failed
    raise OSError(
OSError: Request for https://huggingface.co/datasets/squad/resolve/refs/convert/parquet/plain_text/squad-train.parquet failed with status 404: Not Found

in these two cases:

cc @julien-c

severo commented

Note for anyone who finds this issue while trying to load a Hugging Face dataset with lite.datasette.io: we now provide a simpler API to access the parquet files, for example:

https://lite.datasette.io/?parquet=https://huggingface.co/api/datasets/glue/parquet/ax/test/0.parquet

It does not contain %2F this time 😄
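As a rough check, assuming the problem is an extra percent-decoding step somewhere in the app (not confirmed in this issue), a URL is safe if decoding leaves it unchanged. The helper `survives_decoding` is hypothetical and only illustrates that idea with Python's standard library:

```python
from urllib.parse import unquote


def survives_decoding(url: str) -> bool:
    """Return True if percent-decoding the URL leaves it unchanged."""
    return unquote(url) == url


# URL from the new API (taken from the comment above): no escapes to decode.
new_url = "https://huggingface.co/api/datasets/glue/parquet/ax/test/0.parquet"

# URL from the original report: contains %2F, so decoding alters it.
old_url = (
    "https://huggingface.co/datasets/squad/resolve/"
    "refs%2Fconvert%2Fparquet/plain_text/squad-train.parquet"
)

print(survives_decoding(new_url))
print(survives_decoding(old_url))
```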