Count of results returned by query() is not equal to the number of queries passed
We can pass multiple queries to `InfluxDB::Client#query` by concatenating them, separated by a `;`. If we do that, we expect the number of results returned to equal the number of queries passed. This is currently not the case.
This also affects the result of a single query that uses a `GROUP BY` clause on a tag: in that case, the number of results returned equals the number of distinct tag values returned by the `GROUP BY` operation.

This breaks expectations and prevents useful batching of multiple queries, especially when intermediate queries may return no data.
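To make the mismatch concrete, here is a small illustration (the measurement names are made up, not from a real database). Splitting a concatenated query string on `;` shows how many statements the server will execute, which is the result count a caller naturally expects back:

```ruby
# Illustrative only: two hypothetical statements concatenated with ";".
queries = "SELECT * FROM cpu_load; SELECT * FROM mem_free"

# Counting the statements the server will execute...
statements = queries.split(";").map(&:strip).reject(&:empty?)
statements.size # => 2

# ...so callers expect client.query(queries) to return two results,
# one per statement, even when a statement yields no data.
```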
Thanks for reporting and looking into this. I will have a look as well when I'm back home (in about 3-4 hours).
I'm currently integrating your PR into the `pr/218` branch. I have moved the batch-related methods and specs into separate files and fixed the RuboCop issues.

Before this lands in `master`, I'll also add a smoke test that actually performs multiple queries against a running InfluxDB server (this might take a moment; please refrain from forking off the `pr/218` branch, as I will force-push while fighting with Travis ;-))
After using it a bit, I find the distinction between `#query` and `#batched_query` not quite usable... I'm in favour of giving `#query` the ability to do the right thing™ when we pass an array of queries.

As you've also discovered, this causes an ambiguity with `GROUP BY` clauses. But that could be mitigated if we add the `statement_id` response parameter to the return value, like so:
```ruby
client.query(["select * from foo", "select * from foo group by bar"])
#=> [ {"statement_id" => 0, "name" => "foo", "tags" => nil,              "values" => [...] },
#    {"statement_id" => 1, "name" => "foo", "tags" => { "bar" => "a" }, "values" => [...] },
#    {"statement_id" => 1, "name" => "foo", "tags" => { "bar" => "b" }, "values" => [...] } ]
```
Would that be a workable solution for you?
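With `statement_id` present in each row, callers could reassemble per-query result sets themselves. A sketch using `Enumerable#group_by` on sample data shaped like the example above (the data itself is fabricated for illustration):

```ruby
# Fabricated sample rows mirroring the proposed return shape.
rows = [
  { "statement_id" => 0, "name" => "foo", "tags" => nil,              "values" => [] },
  { "statement_id" => 1, "name" => "foo", "tags" => { "bar" => "a" }, "values" => [] },
  { "statement_id" => 1, "name" => "foo", "tags" => { "bar" => "b" }, "values" => [] },
]

# One entry per statement; the GROUP BY series stay together:
per_query = rows.group_by { |r| r["statement_id"] }.values
per_query.size # => 2
```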
I agree that `#batched_query` may not be the best way. We went with creating `#batched_query` so that we didn't break the existing behaviour of `#query`.

How about changing `#query` to always return an array of arrays (series) when an array of queries is supplied? Then it's trivial to check whether the inner array is `.empty?` when no data is returned. `GROUP BY`s are also nicely contained in the inner array, since it isn't flattened. It also removes the need for `#query` to return the `statement_id`, which until now wasn't exposed to the caller.
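Under this array-in/array-out proposal, the return shape would look roughly like the following (a sketch with hand-written sample data, not actual client output):

```ruby
# Hypothetical return value for two queries, where the first matched nothing:
results = [
  [], # query 1: no data
  [   # query 2: GROUP BY produced two series, kept in one inner array
    { "name" => "foo", "tags" => { "bar" => "a" }, "values" => [] },
    { "name" => "foo", "tags" => { "bar" => "b" }, "values" => [] },
  ],
]

results.size      # => 2    (one entry per query)
results[0].empty? # => true (trivial "no data" check)
results[1].size   # => 2    (series are not flattened across queries)
```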
How do you propose to support query parameter substitution when multiple queries are passed? `#batched_query` does not take `params:` in the above implementation.
I like the "Array-in Array-out" approach, it extends the existing API nicely.
For parameter substitution I'd say you'll need to supply an array of parameters (where "parameters" designates an array of positional arguments or a hash with named arguments). Let the behaviour of passing a plain array/hash be undefined for now. A naïve implementation could then be:
```diff
- query = builder.build(query, params)
+ query = [*query].map.with_index {|q, i|
+   builder.build q.chomp(";"), params[i]
+ }
```
where `builder.build` raises an exception when `params[i]` does not support `#[]` for indexing.
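A runnable sketch of that idea follows. The `%{name}` placeholder syntax and the `build_batch` helper name are assumptions for illustration, not the library's actual query builder:

```ruby
# Hedged sketch: build each statement with its own params.
# Placeholder syntax (%{name}) and method name are illustrative.
def build_batch(queries, params_list)
  Array(queries).map.with_index do |q, i|
    params = params_list[i] || {}
    raise ArgumentError, "params[#{i}] must support #[]" unless params.respond_to?(:[])

    # Substitute %{name} placeholders from the statement's own params.
    q.chomp(";").gsub(/%\{(\w+)\}/) { params.fetch(Regexp.last_match(1).to_sym).to_s }
  end
end

build_batch(
  ["select * from foo where v > %{min};", "select * from bar"],
  [{ min: 5 }, nil]
)
# => ["select * from foo where v > 5", "select * from bar"]
```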
That being said (and having slept over this issue), I wonder if extending `#query` is the right approach, as it encapsulates too many responsibilities in one method (this isn't Rails, after all ;-)). Batching should be separate from (simple) querying.

Therefore, I suggest the following API:
```ruby
batch = client.batch do |b|
  b.add(query1, params: params1) #=> 0 (statement id)
  b.add(query2)                  #=> 1
end

# adding queries outside the block
batch.add(query3) #=> 2

# fetch series and return results
batch.execute #=> [results1, results2, results3]

# or with a block
batch.execute do |statement_id, name, tags, values|
  # ...
end
```
where `batch.add` inherits its semantics from `client.query`, with one notable exception: it doesn't perform the HTTP request, but returns a statement ID. Calling `client.fetch_series` and processing the result then happens later, in `batch.execute` (if `batch.execute` is stateless, it should be possible to call it multiple times).

This still has some kinks (`:denormalize` and `:chunk_size` should not be accepted by `batch.add`, but rather by `client.batch` or `batch.execute`), but I believe it's doable.
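A minimal sketch of what such a batch object could look like. The class name, `to_query`, and the ID-prediction detail are assumptions; the real implementation would also dispatch one HTTP request through the client:

```ruby
# Hedged sketch of the proposed batch object; everything here is
# illustrative, not the actual client internals.
class QueryBatch
  def initialize
    @statements = []
    yield self if block_given?
  end

  # Queue a statement; the return value is the predicted statement ID,
  # i.e. the statement's position in the combined query.
  def add(query, params: nil)
    @statements << [query.chomp(";"), params]
    @statements.size - 1
  end

  # Combine all statements into the single query string that would be
  # sent to the server in one request.
  def to_query
    @statements.map(&:first).join(";")
  end
end

batch = QueryBatch.new do |b|
  b.add("select * from foo")  # => 0
  b.add("select * from bar;") # => 1
end
batch.add("select * from baz") # => 2
batch.to_query
# => "select * from foo;select * from bar;select * from baz"
```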
> it encapsulates too many responsibilities in one method

Exactly, even RuboCop was complaining that the file had too many lines 😀.
The `batch` object, which separates construction and execution, looks like a good way to have `#query`-like semantics while still accepting multiple queries and returning multiple results.

Do we have access to the `statement_id` before we send it to InfluxDB? Or are we just assuming that the order in which a query was added will be the order in which InfluxDB returns `statement_id`s?
We can safely predict the statement ID:

> StatementID is just the statement's position in the query

(see the `query.Result` type; the ID corresponds to the control variable `i` in a simple loop).
Great, looks like we have a decent contract now.
I've merged a preliminary version into the master branch. Feel free to play around with it for a few days. Documentation is still missing, though (but will follow before I push 0.6.0 to rubygems).

One slight problem: batching now requires InfluxDB >= 1.2.0, as the statement ID was not available before that version.

(Note to self: add this fact to the readme)
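Since batching depends on the server exposing the statement ID, a version guard could look like the following sketch. How the client actually obtains the server version (and the `batching_supported?` name) are assumptions:

```ruby
# Hedged sketch: gate batching on the reported server version.
# Gem::Version (from the stdlib) handles semantic comparison for us.
MIN_BATCH_VERSION = Gem::Version.new("1.2.0")

def batching_supported?(server_version)
  Gem::Version.new(server_version) >= MIN_BATCH_VERSION
end

batching_supported?("1.1.5") # => false
batching_supported?("1.2.0") # => true
```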