Support for remote table schemas?
CSV on the Web officially specifies that a table schema can be a separate metadata file located remotely on the Web and referenced via a URL. See the definition from the spec:
tableSchema
: An object property that provides a single schema description as described in section 5.5 Schemas, used as the default for all the tables in the group. This may be provided as an embedded object within the JSON metadata or as a URL reference to a separate JSON object that is a schema description.
Currently, this library doesn't offer this feature.
I can see two possible solutions for now:
- Implement referencing and loading of remote schemas directly within this library.
- Make a screening step and inject any remotely located table schemas into the JSON before feeding it to this library.
Naturally, I would prefer option 1. I'm aware of the caveat that validation then cannot happen offline. One could imagine caching, though. What do you think about this? Can you imagine this feature being added in principle?
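For comparison, the screening step of option 2 could look roughly like the sketch below. It is only an illustration under assumptions: `inline_remote_schemas` and `fetch_json` are hypothetical helper names, not part of this library, and relative schema references are assumed to resolve against the URL of the metadata file.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen


def fetch_json(url):
    """Download and parse a JSON document (requires network access)."""
    with urlopen(url) as response:
        return json.load(response)


def inline_remote_schemas(metadata, base_url="", fetch=fetch_json):
    """Return a copy of `metadata` with URL-valued tableSchema properties inlined.

    Hypothetical helper: handles a tableSchema on the top-level description
    and on each entry of its "tables" list.
    """
    def resolve(description):
        schema = description.get("tableSchema")
        if isinstance(schema, str):
            # A string value is a URL reference to a separate schema document.
            url = urljoin(base_url, schema)
            description = dict(description, tableSchema=fetch(url))
        return description

    result = resolve(dict(metadata))
    if "tables" in result:
        result["tables"] = [resolve(dict(table)) for table in result["tables"]]
    return result
```

The result is plain JSON-compatible data with every schema embedded, so it can be handed to the library as if the schemas had been inline from the start. Passing a different `fetch` callable makes it possible to read from a local cache instead of the network.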
It looks like option 1 could be implemented rather easily, by adding Schema.fromvalue, along the lines of
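    import requests

    def fromvalue(self, v):
        # A string value is treated as a URL pointing to a remote schema description.
        if isinstance(v, str):
            v = requests.get(v).json()
        return super().fromvalue(v)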
Inlining the remote content could then be done post-hoc, i.e. by
- reading the metadata - thereby retrieving the remote content
- then writing the metadata
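Note that with this simple solution, round-tripping wouldn't be possible: once the metadata has been read, the URL reference is replaced by the fetched schema, so writing it back out would not reproduce the original file.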
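To make the caching idea and the round-tripping caveat concrete, here is a self-contained sketch. It does not use this library's classes: `Description`, `Schema`, `asdict` and `fetch_schema_text` are hypothetical stand-ins chosen for illustration, and `fromvalue` is assumed to be a classmethod.

```python
import json
from functools import lru_cache
from urllib.request import urlopen


@lru_cache(maxsize=None)
def fetch_schema_text(url):
    """Download a schema document; repeated calls for the same URL hit the cache."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


class Description:
    """Hypothetical stand-in for a metadata description base class."""

    def __init__(self, **properties):
        self.properties = properties

    @classmethod
    def fromvalue(cls, v):
        return cls(**v)

    def asdict(self):
        return dict(self.properties)


class Schema(Description):
    # Kept as a class attribute so that it can be swapped out, e.g. in tests.
    fetch = staticmethod(fetch_schema_text)

    @classmethod
    def fromvalue(cls, v):
        if isinstance(v, str):
            # URL reference: replace it with the parsed remote document.
            # The raw text is cached, so each call parses a fresh object.
            v = json.loads(cls.fetch(v))
        return super().fromvalue(v)
```

With this sketch, `Schema.fromvalue(url).asdict()` returns the embedded schema object rather than the URL string, which is exactly why the original reference cannot be recovered when the metadata is written out again. The in-memory cache only lasts for the lifetime of the process; offline validation would need a persistent cache instead.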
@stschiff, just merged the PR implementing this. Do you need a release on PyPI with this functionality or can you work from HEAD for the time being?
Thanks, that's great! I can work from HEAD for now. And thanks for inviting me to contribute. I'll work through PRs and issues of course, should I have something.