Support for nested query syntax within query string query DSL
tuespetre opened this issue · 38 comments
I understand that issue #9611 was closed regarding this:
Nested fields need to be queries with nested queries/filters, because multiple documents can match and you need to be able to specify how these multiple scores should be reduced to a single score.
-- @clintongormley
Proposal
I propose that, when a field name within a query string query is parsed and it does not match a field mapping, an attempt should be made to match the field name to a nested object mapper. If the attempt is successful, the query text for that field name should then be parsed as a query string query using the same settings as the root-level query string query. The query resulting from that parsing will in turn be used to create a ToParentBlockJoinQuery (a nested query) that uses the same default scoring mode that would be applied when manually submitting a nested query ("avg").
The syntax
The acceptable syntax for a nested query within a query string query is similar to this:
nestedPath:"<query string query>"
This means that any constructs you would use in a query string query are valid:
children:"children.first:peggy"
children:"children.first:\"peggy\""
children:"children.first:(peggy ruby)"
children:"children.first:peggy AND children.last:sue"
children:"children.first:pegyg~ +children.last:su?"
Note that the nested query MUST be surrounded with quotes. I wanted it to be parentheses instead, but unfortunately the Lucene QueryParser class does not recognize the field names the way I wanted it to (children:(children.first:peggy) would come out as a TermQuery on children.first, and the children field name would be discarded).
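To illustrate the intended translation, here is a minimal Python sketch of how a single nestedPath:"..." clause could map to a nested query. It is not the code from the pull request: the function name to_nested_query, the regular expression, and the nested_paths argument (standing in for the lookup against nested object mappers) are assumptions made for illustration only.

```python
import re

# Matches  path:"inner query"  where the inner text may contain escaped quotes (\").
NESTED_CLAUSE = re.compile(r'(\w+(?:\.\w+)*):"((?:[^"\\]|\\.)*)"')


def to_nested_query(clause, nested_paths, score_mode="avg"):
    """Translate one clause into query DSL (hypothetical helper).

    nested_paths stands in for the mapping lookup described in the proposal:
    only field names found there are treated as nested objects.
    """
    match = NESTED_CLAUSE.fullmatch(clause.strip())
    if match is None or match.group(1) not in nested_paths:
        # Not a nested clause: leave it to the ordinary query string parser.
        return {"query_string": {"query": clause}}

    path, inner = match.group(1), match.group(2)
    # Undo one level of backslash escaping so \"peggy\" becomes "peggy".
    inner = re.sub(r"\\(.)", r"\1", inner)
    return {
        "nested": {
            "path": path,
            "score_mode": score_mode,
            "query": {"query_string": {"query": inner}},
        }
    }
```

The inner text is handed to another query_string query unchanged, which mirrors the idea of reusing the root-level settings for the nested part.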
Other considerations
- Support for specifying scoring modes within the query string query settings based on nested object paths is a possibility.
- Support for inner hits may also be a possibility, in a similar fashion to scoring modes.
Support for nested queries in query strings at all would be an enhancement, but these options could provide additional enhancements. Example of how they may look:
{
  "query_string" : {
    "query" : "children:\"children.first:peggy\"",
    "nested": [
      {
        "path": "children",
        "score_mode": "max",
        "inner_hits": {
          <inner_hits_options>
        }
      }
    ]
  }
}
Pull request
For the basic functionality, I have already made the necessary modifications (three changed files, one changed test file to add a test with several assertions) on the 'master' branch of my local clone of the repository. I would like to submit a pull request; please advise as to how you would like that to be done (if I need to rebase onto another branch, etc.)
I've since worked around this in other ways (a simple regex to parse out nested field expressions on my end and submit them properly to ES). It was fun to mess around with this, but I fully support axing it now. It would just be more complexity to maintain; perhaps the query string documentation could hint at some kind of better solution for developers who may look for this functionality.
thanks @tuespetre
Hi @tuespetre ,
I'm very interested in the workaround you used. Did you manage to make it work with kibana ?
Thanks !
I wrote the following drop-in helper class (written in C#, but should be easily portable to other languages): https://gist.github.com/tuespetre/f6951bb665c79abbb7c8
You basically use the class to create new URIs by performing some function against the existing query string (remove this filter, replace that filter, add this filter, etc.) When you specifically need to allow users to perform a 'proper' nested query, you can just use the helper to extract the filters on the nested properties out and build up a separate query string, which you would then submit as a nested query string query in your request to Elasticsearch.
I'm using it to offer both 'customer service representative friendly' interfaces (where the query string built up by the 'friendly' controls is stored in a hidden input) and 'technical user friendly' interfaces (where the query string is spit out into a visible text box that you can also type in, a-la GitHub Issues.)
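For readers who do not use C#, the extraction step can be sketched in Python roughly as follows. This is not a port of the gist: split_nested, build_query, and the TERM pattern are hypothetical names, and the sketch assumes the user's query is a flat list of whitespace-separated field:value terms (no parentheses or explicit boolean operators) and that the nested paths are known in advance.

```python
import re

# One field:value term; the value is either a quoted phrase or a run of non-spaces.
TERM = re.compile(r'(?P<field>[\w.]+):(?P<value>"(?:[^"\\]|\\.)*"|\S+)')


def split_nested(query, nested_paths):
    """Separate terms on nested fields from the rest of the query string."""
    nested = {}

    def pull(match):
        field = match.group("field")
        for path in nested_paths:
            if field.startswith(path + "."):
                nested.setdefault(path, []).append(match.group(0))
                return ""  # remove the term from the root query
        return match.group(0)  # keep non-nested terms untouched

    root = " ".join(TERM.sub(pull, query).split())
    return root, nested


def build_query(query, nested_paths):
    """Combine the root query string with one nested query per nested path."""
    root, nested = split_nested(query, nested_paths)
    must = []
    if root:
        must.append({"query_string": {"query": root}})
    for path, terms in nested.items():
        must.append({
            "nested": {
                "path": path,
                "query": {"query_string": {"query": " AND ".join(terms)}},
            }
        })
    return {"bool": {"must": must}}
```

All terms for one path go into a single nested query, so they have to match the same nested object, which is the behavior a plain query string on flattened fields cannot express.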
I actually quite like this proposal. Is it something that would be considered by the elasticsearch team or is this something that's not likely to ever be a feature? I'd love if the query string syntax allowed for nested query combinations.
I wanted it to be parentheses instead
Agreed. I think this syntax would be much better served by parentheses instead of quotations.
Hi,
Is the syntax proposed here for query_string supported in ES? I am using version 2.2 and am having a hard time getting it to work.
Hello, I also think this should be supported. query_string remains a nice helper, and being able to use nested objects with it would be great.
@alexgarel and everyone:
I think it would be more beneficial to keep something this niche and complex out of the core elasticsearch, and offer your own query DSL 'layer' that can be translated into a 'proper' ES query on the backend. By brushing up on regular expressions (or even parsing!) a little bit you can put together some pretty cool UX affordances specific to your application.
...keep something this niche and complex out of the core elasticsearch...
...By brushing up on regular expressions (or even parsing!) a little bit...
So, is it niche and complex or is it as simple as adding a few items to the elasticsearch grammar?
Personally, I agree that you can add a custom syntax on top (with regexes or otherwise), but I also would like this discussion to remain open, because I think having a conversation about making the query string syntax more robust isn't necessarily a bad thing. Having every elasticsearch application implement yet another hack on top of the query string syntax to accomplish this isn't necessarily a great use of global man-hours.
I'm mostly interested to better understand if the elasticsearch team is interested in a Pull Request for this feature. So far, we don't have an answer to that question.
@tuespetre
Ok I understand, it's the way we have chosen but not fully implemented yet. If someone needs it, we have a (GPL) lucene query parser in python
@rmm5t I had submitted a PR (#11339) but as @clintongormley points out it's just a fragile thing to have in the core application, and as I found out when working the PR initially, it can't really be done with a pleasant syntax -- it comes out feeling very verbose and awkward, especially being unable to hijack the parenthesis for it. With a small handful of regular expressions I was able to implement a much nicer syntax specific to the particular needs of our application without feeling like I had to 'settle' for something subpar.
I had submitted a PR (#11339) but as @clintongormley points out it's just a fragile thing to have in the core application, and as I found out when working the PR initially, it can't really be done with a pleasant syntax
@tuespetre Interesting point. That PR was tagged for discussion (which, respectfully, never really happened amongst the elasticsearch team, aside from @clintongormley's willingness to comment and chime in). Then it was closed, solely because you closed this particular issue after building a workaround, not because a discussion really happened.
I agree with your first assessment that the double-quoted syntax isn't ideal. I understand there are problems with the clearer parentheses syntax, but I suspect those can probably be overcome.
If the core query string syntax and implementation are "fragile," maybe that's something that should be addressed and potentially refactored as well. To be clear, I'm not trying to make light of this; I'm sure a refactor would be a tricky endeavor.
Proposal
Overall, I'd really just like to see an ability to narrow a query string search to one particular embedded object. I'd like to see a syntax that looked like this:
children:(gender:male AND age:>=18 AND age:<=25)
Otherwise, there's no way to use the query string syntax and (in this particular US-centric example) find parents who have children who should be signed up for the US Selective Service System.
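For comparison, here is a rough Python sketch of the query DSL that has to be written by hand today to express the same search. The helper name nested_all is hypothetical, and the full field names children.gender and children.age assume that children is mapped as a nested object with those two fields.

```python
import json


def nested_all(path, clauses, score_mode="avg"):
    """Require every clause to match within the same nested object (hypothetical helper)."""
    return {
        "nested": {
            "path": path,
            "score_mode": score_mode,
            "query": {"bool": {"must": clauses}},
        }
    }


# Hand-written equivalent of: children:(gender:male AND age:>=18 AND age:<=25)
query = nested_all("children", [
    {"term": {"children.gender": "male"}},
    {"range": {"children.age": {"gte": 18, "lte": 25}}},
])

print(json.dumps({"query": query}, indent=2))
```

Because both conditions sit inside one nested query, a parent matches only when a single child satisfies them together, rather than one child being male and another being in the age range.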
can we resurrect this issue please?
Yeah, I think we need to think more about whether to expose this. Opening for more discussion
+1
If I could comment on my experience as a user:
It took me an hour or so to figure out that this didn't exist. I'd like to build a dashboard with a search bar, where the syntax is defined by Elasticsearch/Lucene's query string syntax. Having this would make that project substantially easier.
As an engineer: this seems like a great candidate for something that could grow, mature, and harden outside of the core. If a service/library can be built using Lucene's parser and submit JSON-style nested Elasticsearch queries on the backend, we could figure out the details with a non-core prototype.
children:(gender:male AND age:>=18 AND age:<=25)
I like that.
My initial idea, inspired by jq: children[].gender:male. About 2 seconds of thought went into that, so potentially full of holes :)
@buchanae sorry to repeat myself, but you can look at our (GPL) Lucene query parser in Python. It's still far from perfect, but it may help.
I'd also like to +1 this and share my experience. I'm a long-time Elasticsearch user and I've recently hit the "field mapping explosion" limitation. Our system allows users to define their own objects with any number of custom fields, which leads to a mapping explosion. Currently, from what I read in the forum, the only way to solve this is to use nested key/value objects inside an array field:
nested: [{k: FIELD1, v: TERM1}, ...]
This led me to this issue. I'm trying to seamlessly combine normal queries and queries to nested objects in a single query string query. I think this feature would make it easier for people to solve the problem of "too many custom fields".
EDIT: I've implemented this as a Lucene query string syntax extension, by detecting and rewriting queries which contain special nested fields. Link to code
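As a rough illustration of the key/value pattern (not the linked code), one custom-field condition could be turned into a nested query along these lines. The helper name kv_nested_query is hypothetical, and the sketch assumes the array field is called nested as in the example above, that k is an exact-match (keyword-style) field, and that v is searched with a match query.

```python
def kv_nested_query(field, term, path="nested"):
    """Build a nested query for one custom field stored as a {k, v} pair.

    Both conditions are placed in the same nested query so that the key and
    the value must come from the same {k, v} object.
    """
    return {
        "nested": {
            "path": path,
            "query": {
                "bool": {
                    "must": [
                        {"term": {f"{path}.k": field}},   # which custom field
                        {"match": {f"{path}.v": term}},   # what it should contain
                    ]
                }
            },
        }
    }


# Example: the user typed something like FIELD1:TERM1 in the search box.
query = kv_nested_query("FIELD1", "TERM1")
```

A rewriting layer would detect terms on custom fields in the user's query string and replace each of them with a query of this shape.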
/cc @elastic/es-search-aggs
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1 would like this added
+1
I have this issue too: I'm trying to run a nested query from Logstash using the elasticsearch filter, which only supports query strings, not the regular query DSL.
I can accomplish this in KQL like this:
myNestedObject:{ nestedProperty: "The value I'm looking for" }
+1 it will be very useful
+1
+1