Incorrect number of documents returned from ordered and filtered collection query
sandreae opened this issue ยท 5 comments
Spotted this behaviour when working on the app and now was able to reproduce the issue in a test:
aquadoggo/aquadoggo/src/graphql/queries/collection.rs
Lines 1332 to 1398 in e19acd4
The returned number of documents does not match the total_count
. My anecdotal experience is that documents are missing which should be returned. If one adjusts the page size to include the entire collection then all documents are indeed returned, so it seems to be particular to the case where the collection is paginated over.
We have tests against this particular case (pagination + filter), must be something with exactly this query and shape of data ๐ค
Yeh, I believe when only querying one field of a document this behaviour doesn't occur. Also if there aren't more than 2 pages to be returned. I don't think it's a very niche error though, I noticed it in two different queries "in the wild" now.
I may have found the bug causing this issue. I have at least found a bug in the pagination logic, which causes duplicate documents to be returned from paginated collection requests. I think there may be cases where it also could cause fewer to be returned, haven't been able to observe that case in tests yet though.
If you look at these results from a pagination request for pages of 3 items (printed values are <DOCUMENT_ID> <CURSOR> <VALUE>
):
/// PAGE 1 RESPONSE
"00209b88a128c327bd1178b70d33b6a69189a489c0ff28796e8f0afb43720d07b0dc" "2wivUjpaNHZxwFjh9oic6je1vexfrNzGK3VLmBFBduDX6WEEvmeEisVJoTEzQu8PiM1h3DdhBcnu74xnaMVL5n5d" "Thrash, me crush me, beat me till I fall"
"00204888e94743374740730693f39c778b95d03ce4f340f77dbd374587170ba12536" "22ahi6PbPDws8pRHDjYgyWRVcLV1G68vyjBFMc4wXtFxyVzL6Pj24QvQs1PJgm5BaKNLHS3WqGYPBks2QcNvrVqG" "This heaven gives me migraine"
"0020e9a0884fe15cf137d0bbfb14d590c81ec9e23566b0f0c53da53b937c856213a3" "21DaUA29YhfQZQsm8SR2RTDbquqKoXcJhcXJDdJb3di1KgPJaZipo5sSrQedsEH7LZcQzre7LJaoBsRjdEntswnx" "This heaven gives me migraine"
END CURSOR: "21DaUA29YhfQZQsm8SR2RTDbquqKoXcJhcXJDdJb3di1KgPJaZipo5sSrQedsEH7LZcQzre7LJaoBsRjdEntswnx"
/// PAGE 2 RESPONSE
"00204888e94743374740730693f39c778b95d03ce4f340f77dbd374587170ba12536" "22ahi6PbPDws8pRHDjYgyWRVcLV1G68vyjBFMc4wXtFxyVzL6Pj24QvQs1PJgm5BaKNLHS3WqGYPBks2QcNvrVqG" "This heaven gives me migraine"
"0020b7951d9e3364494ae2f112c9fd411151ae31d29524f5ffcedbcabeb1ecdf81eb" "21D3aVb21xLajurqh5wHuax7AD6T6D5HaTZ4kQkwV3Wtuoywr5BE7aqDfXMbvYQr4vFPAN3XtNauP3efq2jReQYh" "The problem of leisure"
"0020ffea1841d8a75ec453d82c9e3f7347699f6494016f55b0dbe9897c8f12672598" "zGQ5rzG4LDfCtL4ptEMpPLJt9wY9MRmxqipoGFCaMqQ5uErqqJJo7AVTgFeFkJveQR7kXCKteR4srzxAyrPdszf" "The problem of leisure"
END CURSOR: "zGQ5rzG4LDfCtL4ptEMpPLJt9wY9MRmxqipoGFCaMqQ5uErqqJJo7AVTgFeFkJveQR7kXCKteR4srzxAyrPdszf"
You can see that the document with cursor 22ah..
is returned twice, once in the first response, and once in the second. This happens when its value is the same as the document with cursor 21Da..
(this is the end cursor) and its cursor is >
the end cursor, meaning it will be selected as the first document in the second query.
Looking at the code, we seem to be trying to account for this case here:
aquadoggo/aquadoggo/src/db/stores/query.rs
Lines 914 to 926 in 4809fab
But actually, the cursor ordering won't make any difference as the document view id is different so this ordering will take precedent. If we don't order by document view id then this issue goes away, but that's not the behaviour we want (it causes errors elsewhere).
Haven't come up with a solution yet, just documenting findings ๐
Haven't looked closely at this issue yet but my gut feeling tells me that there's something wrong with sorting. We kinda only want to sort by cursor, everything else (even document id and view id) should be optional and only done when selected by the user.
Yeh, that's the conclusion I came to as well. Think the only issue there is that we expect rows to be returned ordered by document id for when they are parsed into documents. So if we only order by cursor here, then we need to account for that when parsing.