fatiando/pooch

Registry files cannot handle file names containing spaces

dokempf opened this issue · 3 comments

Description of the problem:

The way Pooch currently parses registry.txt files does not allow file names to contain spaces. This could be fixed by making the parser support escaping, either by enclosing the name in quotes or by escaping spaces with backslashes.

Full code that generated the error

With registry.txt in the current working directory:

import pooch
P = pooch.create(
    path=pooch.os_cache("test"),
    base_url="ignored"
)
P.load_registry("registry.txt")

Full error message

OSError: Invalid entry in Pooch registry file './registry.txt': expected 2 or 3 elements in line 1 but got 7. Offending entry: '"How To - Minimal Workflow.pdf" sha256:7d4da5fe4d0438437834fe846608b93959dcb80ded941eb85663b2d905c54000 https://heidata.uni-heidelberg.de/api/access/datafile/:persistentId?persistentId=doi:10.11588/data/TJNQZG/DWSXML'
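The "got 7" in the error can be reproduced with a small illustration. It assumes the parser splits each registry line on whitespace. The element count in the error is consistent with that, but this is a sketch and not Pooch's actual parsing code:

```python
# The offending registry line from the error message above.
line = (
    '"How To - Minimal Workflow.pdf" '
    "sha256:7d4da5fe4d0438437834fe846608b93959dcb80ded941eb85663b2d905c54000 "
    "https://heidata.uni-heidelberg.de/api/access/datafile/:persistentId"
    "?persistentId=doi:10.11588/data/TJNQZG/DWSXML"
)

# A plain whitespace split ignores the quotes, so the file name
# "How To - Minimal Workflow.pdf" is broken into five separate pieces.
elements = line.split()
print(len(elements))  # 7
print(elements[:5])   # ['"How', 'To', '-', 'Minimal', 'Workflow.pdf"']
```

The five name fragments plus the hash and the URL make up the 7 elements, instead of the expected 2 or 3.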

@dokempf thanks for reporting this as well! Indeed, our parsing is very simplistic and doesn't cover this. In hindsight we should have gone with a standard format like JSON for the registry file, which would have covered all of this. With the current format, you're right that the only way is to improve our parsing to handle quotes and escaping.

Would you or anyone else like to implement this?

I will do this or ask a student assistant to do so. I will probably use Python's shlex module.
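A shlex-based parser could look roughly like the sketch below. `parse_registry_line` and `parse_registry` are hypothetical helpers, not Pooch functions. It also assumes that blank lines and lines starting with `#` should be skipped. `shlex.split` in POSIX mode groups quoted words and treats a backslash as an escape, so both quoted names and backslash-escaped spaces are handled:

```python
import shlex


def parse_registry_line(line):
    """Hypothetical helper: split one registry line into
    (file_name, file_hash, url_or_None) using shell-like quoting rules."""
    # posix=True lets quotes group words and backslashes escape spaces;
    # comments=False keeps any '#' characters (e.g. URL fragments) intact.
    elements = shlex.split(line, comments=False, posix=True)
    if len(elements) not in (2, 3):
        raise ValueError(
            f"expected 2 or 3 elements but got {len(elements)}: {line!r}"
        )
    file_name, file_hash = elements[0], elements[1]
    url = elements[2] if len(elements) == 3 else None
    return file_name, file_hash, url


def parse_registry(text):
    """Hypothetical helper: return ({file_name: hash}, {file_name: url})."""
    registry, urls = {}, {}
    for line in text.splitlines():
        # Assumption: blank lines and '#' comment lines are skipped.
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        file_name, file_hash, url = parse_registry_line(line)
        registry[file_name] = file_hash
        if url is not None:
            urls[file_name] = url
    return registry, urls


# Placeholder hashes and URL, for illustration only.
text = (
    "# comment line\n"
    '"How To - Minimal Workflow.pdf" sha256:abc123 https://example.org/file\n'
    "data\\ file.csv md5:def456\n"
)
registry, urls = parse_registry(text)
print(registry)  # {'How To - Minimal Workflow.pdf': 'sha256:abc123', 'data file.csv': 'md5:def456'}
print(urls)      # {'How To - Minimal Workflow.pdf': 'https://example.org/file'}
```

One trade-off of this approach is that any file name containing a literal quote or backslash now has to be quoted or escaped itself. Hashes and typical URLs are unaffected because they contain neither.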

Awesome! And I didn't know about shlex 🤩