firebase/firebase-android-sdk

Firestore cache query slow when cache filled with deleted documents

Closed this issue · 8 comments

[REQUIRED] Step 2: Describe your environment

  • Android Studio version: Android Studio Narwhal Feature Drop | 2025.1.2
  • Firebase Component: Firestore
  • Component version: BOM 34.1.0

[REQUIRED] Step 3: Describe the problem

Steps to reproduce:

There seems to be a performance issue when reading from the cache for a collection that currently has few documents but historically has had many documents created and deleted. E.g. if a user has a "notes" collection that currently contains only 5 notes, but thousands of notes have been created and deleted in the past, then cache-backed queries against the "notes" collection are much slower. I believe this is because the old documents still exist in the cache long after they have been deleted, and the garbage collector will not remove them until the cache size limit is reached.

Although we could set a stricter cache size to make the garbage collector kick in more often, I believe this is an underlying issue with the persistence layer of the Firestore SDK. E.g. reducing the cache size from the default 100MB to 1MB would reduce the chance of deleted documents building up in the cache, but it also restricts how large each document can be. E.g. a user might want some really long notes where each document is big, say ~200KB, so reducing the cache to 1MB means they can only have 5 notes offline.
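For reference, the cache-size workaround described above can be configured via the current `PersistentCacheSettings` API. This is a sketch of the workaround, not a fix; the 10 MB figure is an arbitrary example:

```java
import com.google.firebase.firestore.FirebaseFirestore;
import com.google.firebase.firestore.FirebaseFirestoreSettings;
import com.google.firebase.firestore.PersistentCacheSettings;

class CacheConfig {
    // Workaround sketch: shrink the persistent cache so the LRU garbage
    // collector kicks in sooner. As noted above, this also limits how much
    // live data can be kept offline, so it trades one problem for another.
    static void configureSmallerCache() {
        FirebaseFirestore firestore = FirebaseFirestore.getInstance();
        firestore.setFirestoreSettings(
            new FirebaseFirestoreSettings.Builder()
                .setLocalCacheSettings(
                    PersistentCacheSettings.newBuilder()
                        .setSizeBytes(10L * 1024 * 1024) // e.g. 10 MB instead of the 100 MB default
                        .build())
                .build());
    }
}
```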

I believe the garbage collector should be stricter about cleaning up old documents that are no longer relevant to the user's queries, or they should be excluded by the SQLite query so they do not need to be processed. These documents end up being filtered out before results are returned, but they still consume a lot of resources while the query is executing. E.g. if there are 10,000 deleted documents and the query returns no results, each document still gets processed, completely saturating the background queue for a few hundred milliseconds and decoding documents that will never be returned (which slows down other concurrent queries).
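To illustrate the cost, here is a simplified, self-contained model of the problem (hypothetical classes, not the SDK's actual code): every cache entry is decoded during the scan, and tombstones are only filtered out afterwards, so 10,000 deleted documents cost roughly as much to scan as 10,000 live ones.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simplified model of a remote document cache that stores both
// live documents and NO_DOCUMENT tombstones left behind by deletes.
public class TombstoneScanDemo {
    enum Kind { DOCUMENT, NO_DOCUMENT }

    record Entry(String key, Kind kind) {}

    static int decodedCount = 0;

    // Stands in for the expensive decoding step in the real SDK.
    static Entry decode(Entry raw) {
        decodedCount++;
        return raw;
    }

    static List<Entry> runQuery(List<Entry> cache) {
        List<Entry> results = new ArrayList<>();
        for (Entry raw : cache) {
            Entry doc = decode(raw);            // every entry is decoded...
            if (doc.kind() == Kind.DOCUMENT) {  // ...tombstones filtered only after
                results.add(doc);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        List<Entry> cache = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            cache.add(new Entry("notes/" + i, Kind.NO_DOCUMENT)); // deleted notes
        }
        List<Entry> results = runQuery(cache);
        // prints "0 results, 10000 decoded"
        System.out.println(results.size() + " results, " + decodedCount + " decoded");
    }
}
```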

I've attached a sample project to demo the issue. It queries 2 collections and records the average time each query takes. For the active collection it will create 10,000 documents and then delete them, prefilling the cache with lots of documents that aren't returned but are still processed (you must set RUN_SETUP=true to create these documents). The active collection query reliably takes > 10x longer, despite both queries returning no documents.

Sample.zip

Hi @samruston. Thank you for the detailed report. I'm about to take a look at your sample project. I'm curious, have you noticed a similar performance degradation on Web or iOS? If you haven't or don't know that's fine, but it could help direct the investigation if you happen to have this information.

> If you haven't or don't know that's fine,

@dconeybe Unfortunately I haven't tried to reproduce on web or iOS, sorry!

Note to self: Instead of deleting the document from the local cache, the SDK replaces the document in the local cache with a "tombstone" (a.k.a. a NO_DOCUMENT) to avoid inconsistencies in case we ever receive a version of the document from the server that is older than the deletion commit time (sometimes the server sends us documents out of order and the clients must account for that).

// We store the deleted document at the commit version of the delete. Any document version
// that the server sends us before the delete was applied is discarded
document.convertToNoDocument(mutationResult.getVersion()).setHasCommittedMutations();
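The ordering guarantee the comment describes can be sketched with a simplified model (hypothetical classes and integer versions, not the SDK's real API): any server event carrying a version at or below the tombstone's commit version must be discarded, otherwise a late-arriving pre-delete snapshot would "resurrect" the deleted document.

```java
// Simplified sketch of why the tombstone is stamped with the delete's
// commit version (hypothetical model, not the SDK's actual classes).
public class TombstoneOrderingDemo {

    // A cache entry: either a live document or a NO_DOCUMENT tombstone,
    // both stamped with the version (commit time) they were written at.
    record CacheEntry(String key, long version, boolean isTombstone) {}

    // Apply an incoming server event to the cached entry. Events can arrive
    // out of order, so anything at or below the cached version is discarded.
    static CacheEntry applyServerEvent(CacheEntry cached, long eventVersion, boolean eventIsDelete) {
        if (eventVersion <= cached.version()) {
            return cached; // stale event: keep the tombstone (or newer document)
        }
        return new CacheEntry(cached.key(), eventVersion, eventIsDelete);
    }

    public static void main(String[] args) {
        // Delete committed at version 10 -> tombstone at version 10.
        CacheEntry tombstone = new CacheEntry("notes/abc", 10, true);

        // A pre-delete snapshot (version 7) arrives late: it is dropped.
        System.out.println(applyServerEvent(tombstone, 7, false).isTombstone());  // true

        // A genuinely newer write (version 12) re-creates the document.
        System.out.println(applyServerEvent(tombstone, 12, false).isTombstone()); // false
    }
}
```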

Note to self: I've now confirmed that there is no mechanism to "clean up" stale NO_DOCUMENT entries from the remote document cache. Even garbage collection won't start evicting them until the max cache size is exceeded and they are considered "least recently used".
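The LRU behavior described above can be sketched as follows (a simplified model, not the SDK's actual garbage collector): eviction only starts once the total cache size exceeds the configured limit, so a large population of small tombstones can sit well under the default 100 MB limit indefinitely.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of size-threshold LRU collection: nothing is evicted,
// tombstones included, until total cache size exceeds the configured limit.
public class LruGcDemo {

    // Access-ordered map: iteration order is least-recently-used first.
    private final LinkedHashMap<String, Integer> entries = new LinkedHashMap<>(16, 0.75f, true);
    private final long maxSizeBytes;
    private long totalBytes = 0;

    LruGcDemo(long maxSizeBytes) { this.maxSizeBytes = maxSizeBytes; }

    void put(String key, int sizeBytes) {
        entries.put(key, sizeBytes);
        totalBytes += sizeBytes;
        maybeCollect();
    }

    private void maybeCollect() {
        if (totalBytes <= maxSizeBytes) return; // under the limit: no eviction at all
        Iterator<Map.Entry<String, Integer>> it = entries.entrySet().iterator();
        while (totalBytes > maxSizeBytes && it.hasNext()) {
            Map.Entry<String, Integer> lru = it.next();
            totalBytes -= lru.getValue();
            it.remove(); // evict least-recently-used entries first
        }
    }

    int entryCount() { return entries.size(); }

    public static void main(String[] args) {
        LruGcDemo cache = new LruGcDemo(100_000_000); // 100 MB-style limit
        for (int i = 0; i < 10_000; i++) {
            cache.put("notes/" + i, 500); // 10,000 small tombstones = ~5 MB
        }
        // Far below the limit, so all 10,000 tombstones survive.
        System.out.println(cache.entryCount()); // prints 10000
    }
}
```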

I'll see if I or someone on the team can come up with any ideas to improve the situation here.

Update: Good news! I have a potential fix for this performance issue: #7301.

In my testing using your sample app, the "inactive" query took 2 milliseconds, the "active" query before the fix took 67 milliseconds, and the "active" query with the fix took 6 milliseconds. The 6 milliseconds isn't quite as good as 2 milliseconds, but it's significantly better than 67 milliseconds, and I'm happy with the results of this relatively simple fix.

Note that when doing performance testing, make sure to build your app in "release" mode with ProGuard enabled so that all optimizations are applied. Performance numbers from "release" builds are the ones we typically aim to improve.

I'll discuss this fix with the team and let you know how that goes.

(FYI: My small modifications to the sample app provided by the OP are in https://github.com/dconeybe/AndroidIssue7295)

Update: The fix #7301 has been merged. It will be included in the next firebase-android-sdk release, which is planned for about 2 weeks from now (mid-late September 2025).

Note that this performance issue also affects the web SDK (firebase-js-sdk). It presumably also affects firebase-ios-sdk, although I have not explicitly confirmed this. Due to differences between SQLite, IndexedDB, and LevelDB, the fix cannot be a direct port; I'll need to think about how to adapt it for those other two SDKs.

Update: The fix for this issue was just released on Sept 18, 2025 in the Android Cloud Firestore SDK version 26.0.1 (BoM version 34.3.0).

https://firebase.google.com/support/release-notes/android#2025-09-18