Consistent Select/Query interface with xmlquery
Maldris opened this issue · 7 comments
In the sibling xmlquery package, the Node struct has the public methods SelectElement(name string) *Node and SelectElements(name string) []*Node. jsonquery, by comparison, only has SelectElement.
Given that both packages export the package-level functions Find and FindOne, both methods can obviously be implemented for JSON, as those functions already are.
I propose a change where jsonquery's SelectElement implementation mimics xmlquery's and internally calls FindOne, and similarly adding a SelectElements method that internally calls Find.
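For concreteness, a minimal sketch of the proposed methods, assuming jsonquery's current Find and FindOne signatures (this mirrors how xmlquery implements its versions):

// SelectElement finds the first node matching the XPath expression name, relative to n.
func (n *Node) SelectElement(name string) *Node {
	return FindOne(n, name)
}

// SelectElements finds all nodes matching the XPath expression name, relative to n.
func (n *Node) SelectElements(name string) []*Node {
	return Find(n, name)
}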
To this end, both node types could be identified and used through an interface such as
type Selector interface {
	InnerText() string
	SelectElement(name string) Selector
	SelectElements(name string) []Selector
}
allowing basic manipulation of both file types to differ only at load time.
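As an illustration of that goal, a hypothetical helper written once against the interface (collectNames and the //name query are examples of mine, not part of either package):

// collectNames works the same whether doc wraps an XML or a JSON document;
// only the loading step differs between the two formats.
func collectNames(doc Selector) []string {
	var names []string
	for _, n := range doc.SelectElements("//name") {
		names = append(names, n.InnerText())
	}
	return names
}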
The alternative would be to use Query and QueryAll so that errors can be returned instead of simply returning a nil result or panicking, or to implement both sets as methods in each package and let users choose.
Is this proposal sound and within the intended scope of this project?
If so I will make the changes and a pull request.
My main concern with this change is that it alters the behaviour of jsonquery's SelectElement, which currently only looks at immediate children with a matching field name, rather than running a full query that can search the whole child tree as xmlquery does.
As a result, this would be a breaking change. However, it would give a consistent interface and more predictable behaviour compared to the other query and select functions, which all run a full query.
I forget why there is no SelectElements method. Maybe it's because JSON doesn't support duplicate keys.
The below is an invalid JSON structure:
{
	"aa": "sss",
	"aa": "sss"
}
In JSON, SelectElements doesn't make sense, because we can't have the same key among the child elements. Let me know if I'm wrong. Thanks.
TLDR:
While a query like //aa for your example JSON doesn't make sense, queries like //array/elements, //name or //cars//name on the example below do, and they are already implemented at the package level. I'd like to add them as methods so that there is a common interface (and I am currently testing this on a fork).
The main issue is that doing so changes SelectElement from only looking at siblings and immediate children to searching the whole subtree.
My main use-case consideration was twofold:
A work project involves a large configuration-driven file processor. Historically it worked only with XML files, but it is now adding JSON files. Having an abstract interface for both would mean the logic stays the same with only the loaders differing, making the whole application simpler than putting type switches everywhere.
To achieve this, the main cases were queries like getting the elements of an array, or checking for the presence of a keyword in sibling object properties.
Such as, for the test JSON:
{
	"name": "John",
	"age": 30,
	"cars": [
		{ "name": "Ford", "models": [ "Fiesta", "Focus", "Mustang" ] },
		{ "name": "BMW", "models": [ "320", "X3", "X5" ] },
		{ "name": "Fiat", "models": [ "500", "Panda" ] }
	]
}
running the query //cars//name to find all the car names, or //cars/element to get each array element, so that the system can then range over those lists and run configured operations on each.
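A rough sketch of that usage with the existing package-level functions (assuming jsonquery's current Parse and Find signatures; the expected output comment is illustrative):

package main

import (
	"fmt"
	"strings"

	"github.com/antchfx/jsonquery"
)

func main() {
	const testJSON = `{"name":"John","age":30,"cars":[
		{"name":"Ford","models":["Fiesta","Focus","Mustang"]},
		{"name":"BMW","models":["320","X3","X5"]},
		{"name":"Fiat","models":["500","Panda"]}]}`

	doc, err := jsonquery.Parse(strings.NewReader(testJSON))
	if err != nil {
		panic(err)
	}
	// Find every "name" node underneath "cars"; expected: Ford, BMW, Fiat.
	for _, car := range jsonquery.Find(doc, "//cars//name") {
		fmt.Println(car.InnerText())
	}
}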
Given that Find/FindOne and Query/QueryAll are already implemented at the package level, and that for xmlquery the methods just call them (passing the subject node as top), I'm unsure why the same pattern wasn't taken for this package.
Yes, certain classes of query such as the example you gave aren't valid, but other ones like the above example are.
I've forked both packages and implemented a common interface. I'm currently looking at improving the test coverage in jsonquery; then I'll do a test in a real system to make sure it works as intended with real documents rather than simple unit tests.
*Edited some typos
You are right, I forgot the Array[] type in JSON. SelectElements can work for the array type.
I'll be doing some testing tomorrow with the forked packages, so I'll report back if it's working as intended; then we can decide how to handle the breaking change to SelectElement and whether to merge.
OK, mixed-bag results. I created the interface
type Selector interface {
	// data retrieval
	InnerText() string
	// structure query
	SelectElement(query string) Selector
	SelectElements(query string) []Selector
	Query(query string) (Selector, error)
	QueryAll(query string) ([]Selector, error)
	QuerySelector(query *xpath.Expr) Selector
	QuerySelectorAll(query *xpath.Expr) []Selector

	// ===================== //
	// currently not working //
	// ===================== //

	// signature differs between packages:
	// OutputXML() string
	// vs
	// OutputXML(self bool) string

	// only in jsonquery:
	// ChildNodes() []*Node
	// (could be added to xmlquery easily)

	// only in xmlquery:
	// SelectAttr(name string) string
	// (no analog in jsonquery)
}
As you can tell from the comments, there were a few issues. The differing signatures and functions could be overcome without too much trouble, but the main issue is this:
I'd forgotten that even if *Node satisfies the Selector interface, []*Node will not match []Selector, because Go does not make the O(n) slice conversion implicit (see a Stack Overflow answer that also links to a few examples in the comments).
As a result, I'd need to change the return types of each function to return the interface, which is not ideal as it hides the methods not in the interface, as well as the struct's fields.
At this point, proxying the interface with a wrapper type is probably the best choice for my application; as a naïve example:
func ConvertToSelector(doc *jsonquery.Node) common.Selector {
	return selectorImpl{node: doc}
}

type selectorImpl struct {
	node *jsonquery.Node
}

func (n selectorImpl) InnerText() string {
	return n.node.InnerText()
}

func (n selectorImpl) SelectElements(query string) []common.Selector {
	res := n.node.SelectElements(query)
	result := make([]common.Selector, len(res))
	for i := range res {
		result[i] = selectorImpl{node: res[i]}
	}
	return result
}

// ...
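For illustration, a hypothetical caller then only sees common.Selector (imports omitted as in the snippet above; the //cars//name query is just the earlier example):

// printCarNames works purely through the common.Selector wrapper, so the same
// logic could equally be handed an xmlquery-backed Selector instead.
func printCarNames(doc *jsonquery.Node) {
	sel := ConvertToSelector(doc)
	for _, name := range sel.SelectElements("//cars//name") {
		fmt.Println(name.InnerText())
	}
}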
The new method versions of the selector functions still help with this, but it does detract somewhat from the motivation.
My current thought is to at least merge the forks and add the methods, as they still have utility, can help other users, and improve test coverage.
But moving towards having a consistent interface as part of the package would be a larger, longer-term change, if it is still desirable.
Any thoughts?
Otherwise I'll create the pull requests for the last pushed commits, which only added the methods and additional test coverage, and the discussion can move to those specific changes in the pull requests themselves.
Thanks for your suggestion, but this change would break the existing xmlquery and jsonquery packages. xmlquery and jsonquery are independent packages, and both depend on the xpath package.