[PR Proposal] Exposing the normalized slice of Quads in a NormalisationAlgorithm
joeltg opened this issue · 1 comments
In addition to the JSON-LD features, we have been using this library's exported `Quad` struct as an internal RDF representation (as I imagine others have as well).
One of the things we want to do is 1) normalize an RDF dataset and then 2) access a `[]*Quad` representation of that normalized dataset, without serializing it to an `application/n-quads` string or converting it back to JSON-LD. In particular, we want a `[]*Quad`, not an `*RDFDataset`, since the slice of quads preserves the canonical ordering of the quads that `*RDFDataset` does not.
Currently there are a few API choices that prevent this kind of usage:
- The `.quads` field of the `NormalisationAlgorithm` struct that holds the slice of quads in the dataset is unexported, and there is no exported `Quads() []*Quad` accessor method.
- When the dataset is sorted after normalization, only the `normalized []string` slice gets sorted, not the `.quads []*Quad` slice (not sorting it is correct for the current API, since the quads slice is not exported or used afterwards).
- The `normalized []string` slice is always concatenated into a giant `application/n-quads` string, which is then either returned or re-parsed into an `*RDFDataset`, depending on `opts.Format`. This step would be unnecessary if users only want the `[]*Quad` slice, and it would be a shame to re-parse the concatenated string with an n-quads parser just to get it back.
If this is something you'd be open to, I'd be happy to put together a small pull request that refactors the normalization API slightly to support this kind of usage. If not, I totally understand!
The changes I imagine would be:
- Add a `Quads() []*Quad` method to the `NormalisationAlgorithm` struct that just returns its `.quads` field.
- Sort the `normalized []string` and `quads []*Quad` slices simultaneously. To do this, I imagine moving the `normalized` variable to a field of the `NormalisationAlgorithm` struct (and maybe renaming it `.terms` instead?), and implementing the sort interface on `NormalisationAlgorithm` directly, which would let us `sort.Sort(na)` with essentially identical performance to `sort.Strings(normalized)`:
```go
func (na *NormalisationAlgorithm) Len() int           { return len(na.normalized) }
func (na *NormalisationAlgorithm) Less(i, j int) bool { return na.normalized[i] < na.normalized[j] }
func (na *NormalisationAlgorithm) Swap(i, j int) {
	na.normalized[i], na.normalized[j] = na.normalized[j], na.normalized[i]
	na.quads[i], na.quads[j] = na.quads[j], na.quads[i]
}
```
- Separate step 8 of `na.Main(dataset *RDFDataset, opts *JsonLdOptions) (interface{}, error)` from the rest of the normalization algorithm. To maintain backward compatibility, this would mean renaming the rest of the method (steps 1 through 7.2) to something like `na.Normalize(dataset *RDFDataset)`, so that `Main` would look like:
```go
func (na *NormalisationAlgorithm) Main(dataset *RDFDataset, opts *JsonLdOptions) (interface{}, error) {
	// Steps 1 through 7.2, and sorting, happen here
	na.Normalize(dataset)

	// 8) Return the normalized dataset.
	// handle output format
	if opts.Format != "" {
		if opts.Format == "application/n-quads" || opts.Format == "application/nquads" {
			rval := ""
			for _, n := range na.normalized {
				rval += n
			}
			return rval, nil
		}
		return nil, NewJsonLdError(UnknownFormat, opts.Format)
	}
	// (the no-format case, which re-parses the dataset as before, is unchanged and elided here)
}
```
All together, the new normalization API would be perfectly backwards compatible, and would support a new usage pattern like:
```go
na := ld.NewNormalisationAlgorithm("URDNA2015")
na.Normalize(dataset.(*ld.RDFDataset))
for i, quad := range na.Quads() {
	// ...
}
```
Sorry for the lengthy issue; let me know if this is something you're open to and I'll open a PR! And of course feel free to suggest a different way of approaching it if there's anything I'm not seeing.
Hi @joeltg,
I understand what's missing and agree something needs to be done about it. Your proposal makes sense in principle, but I need to remind myself how the RDF code works and think about Step 2.
On a side note, the current interface of the whole library is, in my opinion, quite poorly designed. This is due to it being a direct port of the underlying algorithms which, at the time the first version was written, was the most straightforward way to follow the (sometimes very complicated) logic. My plan is, once we have full support for the JSON-LD 1.1 spec, to create a new major version of the library and do a proper overhaul of the interface. With breaking changes. So, from your perspective: once I review your proposal, I'd be glad to accept a PR. Or we can wait for the next version without having to worry about backward compatibility. I would appreciate it if you provided feedback on the new interface once it's ready.
Of course, we can do both: make a change now and then review the new interface. I just wanted to share the plan so that your time and effort don't go to waste.
Stan