Importing Documents From Data Exports
adam-hurwitz opened this issue · 21 comments
Is there a way to import specific Firestore Database Documents under an exported Collection from Firebase Storage?
Observed
After creating a JavaScript AppEngine Cron Job according to the Schedule Export documentation, there does not seem to be a way to import specific Documents, only Collections.
Expected
In short: the ability to export a specified Document representing a user.
The app's Firestore database is currently structured with a users Collection at the highest level, and then a Document representing each user containing metadata, saved content, etc. A daily Cron Job has been scheduled per the documentation above to back up all of the Firestore data, including the user data, in case users' data is corrupted, whether inadvertently during development or maliciously in an attack.
However, from the Import specific collections documentation it appears only a Collection can be imported.
Existing Data Structure
- users collection
  - user_one document
    - actions collection
      ...
    - categories collection
      ...
    - user data fields
  - user_two document
    ...
Potential Solution if Importing a Document Is Impossible
Refactor the Firestore Database structure to add an additional layer of Collections/Documents so that each user is represented by a unique Collection. The downside of the new data structure is an unnecessary additional users document at the 2nd level.
New Data Structure
- users collection
  - users document
    - user_one collection
      - actions document
        ...
      - categories document
        ...
      - user data document w/ fields
        ...
    - user_two collection
      ...
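The refactored hierarchy can be sketched in code. Below is a minimal Node.js helper (hypothetical; `userDocPath` is not part of any Firebase API, and the path layout assumes the structure above) that builds the document paths implied by the new hierarchy:

```javascript
// Hypothetical helper for the refactored hierarchy:
// users (collection) > users (document) > <uid> (collection) > <doc>
function userDocPath(uid, docId) {
  return `users/users/${uid}/${docId}`;
}

// With the firebase-admin SDK this path could then be used as, e.g.:
//   admin.firestore().doc(userDocPath('user_one', 'actions')).get();
console.log(userDocPath('user_one', 'actions')); // users/users/user_one/actions
```

The extra `users` document in the middle is the "unnecessary additional" level mentioned above: Firestore paths must alternate collection/document, so a per-user collection at depth 3 forces a placeholder document at depth 2.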
@AdamSHurwitz thanks for the feedback! Right now there is no way to import anything smaller than a collection when using the managed import/export operations.
Can you help understand your use case for these granular imports? Are you just doing testing and want to control your inputs? We have considered a simplified version of the import/export feature for small testing data sets.
@samtstern The use case is for production, as a precaution in case a user's data is corrupted by a bug, by accident during development, or via an outside attack.
If for example a user has lost information in their account stored on Firestore Database I could simply import the last snapshot of their user account from Firebase Cloud Storage using the import feature.
Since there is no way to import anything smaller than a collection, is the best approach to refactor my Firestore Database so that each user is in its own collection? That way I can import a user by collection.
For instance, with the structure below I could import all of the data for user_one by collection:
- all_users collection
  - all_users document
    - user_one collection
      - actions document
        ...
      - categories document
        ...
      - user data document w/ fields
        ...
    - user_two collection
      ...
    - user_three collection
      ...
@AdamSHurwitz got it, that makes total sense! I will file the feature request about this, and we'll put it on the list of improvements for import/export (we have a long list there...)
Tracking internally as bug 123524487
Follow-up
The proposed solution above of exporting the entire Firestore Database and importing a specific user collection does not work. The original Firestore Database was refactored to store each user in its own collection rather than in a document so that the collection could be imported later on. After re-examining the Export and Import Data documentation, it appears this is the reason:
You cannot import specific collections from an export of all documents.
The error received when attempting to import a specific user collection from the backup of the entire Firestore Database on Firebase Cloud Storage:
ERROR: (gcloud.beta.firestore.import) INVALID_ARGUMENT: The requested kinds/namespaces are not available
When importing all collections the operation succeeds as expected. This is useful if all of the app data is lost and needs to be recovered.
gcloud beta firestore import gs://coinverse-staging-backups/2019-02-01-16-37-02/
Collection Specific Import Attempts
Specifying just the collection id:
gcloud beta firestore import --collection-ids='Mv48HqezhaVhsO7UOW5lyWdslZw2' gs://coinverse-staging-backups/2019-02-01-16-37-02/
Specifying the collection id with the document it is stored under:
gcloud beta firestore import --collection-ids='users/Mv48HqezhaVhsO7UOW5lyWdslZw2' gs://coinverse-staging-backups/2019-02-01-16-37-02/
Specifying the full Firestore Database path of the collection:
gcloud beta firestore import --collection-ids='users/users/Mv48HqezhaVhsO7UOW5lyWdslZw2' gs://coinverse-staging-backups/2019-02-01-16-37-02/
Can Exports Be Done Programmatically With Node.js?
The Export and Import Data documentation show exporting via command line. Is there a way to do so using Node.js?
Rather than exporting everything with the broad Schedule Export Cron Job, specific collections could be exported in a Cloud Function using How to Schedule (Cron) Jobs with Cloud Functions for Firebase.
This ensures a specific user collection can be retrieved via an import.
You cannot import specific collections from an export of all documents.
This is actually news to me and seems like something we should improve. I'll ask around
I discovered a temporary workaround this morning to restore the data for a specific user even when importing the entire Db. The scenario tested imports the data for a user as long as that user was deleted prior to importing the entire Db:
1. Save actions for User 1.
2. Save actions for User 2.
3. Take a snapshot of the entire Db with the Export + Cron Job method above.
4. Delete User 2 from the Db.
5. Save more actions for User 1.
6. Import the entire snapshot from Firebase Cloud Storage.
   a. User 1 maintains the new data recorded in Step 5, which means the data was not overwritten by importing the entire snapshot.
   b. User 2 has the same data as before Step 4 when it was deleted, thus restoring the data, which would be useful if data is somehow corrupted for this user.
Update
As I'm preparing for the first public release of my app, I re-ran this experiment with the steps outlined above and, on closer examination, found that User 1's Document data was reset to the imported snapshot. However, User 1 retained the Collections saved after the time of the snapshot.
To summarize, when importing an entire Firestore Database from Firebase Cloud Storage the Document data will be overwritten whereas new Collection data from after the time of the snapshot will persist.
@AdamSHurwitz just to clarify you're saying that if you have data like this
User doc:
/users/123
User subdoc:
/users/123/things/456
That importing the users collection does not affect the user's things. That makes sense to me, as the linking between collections and subcollections is not a "hard" link and all reads and writes are collection-scoped.
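The "soft link" between a document and its subcollections can be illustrated with a toy in-memory model (plain JavaScript, not the Firestore API): because each document lives at an independent path, deleting or replacing a parent document leaves its subcollection documents in place.

```javascript
// Toy model: documents are stored under independent path keys, so removing
// /users/123 does not touch /users/123/things/456.
const store = new Map();
store.set('users/123', { name: 'Ada' });
store.set('users/123/things/456', { color: 'blue' });

store.delete('users/123'); // delete the parent document only

console.log(store.has('users/123'));            // false
console.log(store.has('users/123/things/456')); // true: subcollection doc survives
```

This mirrors the behavior described above: a collection-scoped import can overwrite the user documents while leaving subcollection data from after the snapshot untouched.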
Is this the behavior you wanted / expected or were you hoping for something else?
Originally I had each user in a document (/users/123 and /users/456) but refactored each user into its own collection (/usersCollection/allUsersDoc/user123 and /usersCollection/allUsersDoc/user456) so that a user's data can be imported from a Firebase Cloud Storage snapshot. According to the Export and Import Data documentation, only specific Collections can be imported from a snapshot, not Documents.
Then, after realizing that specific Collections can only be imported from snapshots of specific Collections, not of the entire Database, I decided to attempt importing the entire database as a workaround to restore the data for only user123, testing the scenario where their data is lost. (Schedule Export)
Outcome
When the entire Database snapshot is imported with the intent to restore only user123 it appears both the Document and Collection import as intended (expected).
However, the data for user456, which does not need to be restored, is unintentionally affected by the Database-wide import (not expected). For the user456 Collection, the contained Document data is overwritten with the snapshot data. This is not desired, as only the data for user123 needs to be imported. However, the subcollections of user456 remain intact (expected).
Conclusions
- Users must be stored under Collections in order to be able to import specific users in the future.
- There is currently not a simple approach of exporting specific Collections programmatically.
- As a workaround for #2 the entire database can be exported programmatically on a Cron Job schedule, and imported if needed to restore a user's data with the consequence of other user's Document data being reverted to the import as well.
@AdamSHurwitz some more questions:
2. There is currently not a simple approach of exporting specific Collections programmatically.
Isn't that what this flag does?
https://firebase.google.com/docs/firestore/manage-data/export-import#export_specific_collections
3. As a workaround for #2 the entire database can be exported programmatically on a Cron Job schedule, and imported if needed to restore a user's data with the consequence of other user's Document data being reverted to the import as well.
Why would you not expect importing the whole DB to overwrite all documents?
-
That flag meets the desired functionality. However, it is executed as a manual terminal command. I need to implement that command programmatically (i.e., within a JS Cron Job or Java/Kotlin Task) so that backups can run on a schedule, programmatically retrieving the user Collections and making the above request.
-
You are correct. I misread the line below, interpreting it as meaning only new document IDs would be imported and existing IDs would persist, but as it states, existing doc IDs will be overwritten.
Imports do not assign new document IDs. Imports use the IDs captured at the time of the export. As a document is being imported, its ID is reserved to prevent ID collisions. If a document with the same ID already exists, the import overwrites the existing document.
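A toy sketch of the quoted semantics (plain JavaScript, not the Firestore API): an import overwrites documents whose IDs exist in the export, and leaves documents absent from the export untouched rather than deleting them.

```javascript
// Toy model of managed-import semantics: imports keep the IDs captured at
// export time and overwrite any existing document with the same ID.
const db = new Map([
  ['users/1', { v: 'new' }],  // edited since the export was taken
  ['users/2', { v: 'live' }], // created after the export; not in the snapshot
]);
const exportSnapshot = new Map([['users/1', { v: 'old' }]]);

// Import: every document in the snapshot is written back under its old ID.
for (const [id, doc] of exportSnapshot) db.set(id, doc);

console.log(db.get('users/1').v); // 'old'  (reverted to the snapshot)
console.log(db.get('users/2').v); // 'live' (absent from export, so untouched)
```

This matches the experiment above: User 1's document data is reset to the snapshot, while data created after the export persists.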
@AdamSHurwitz for (2) this guide shows how to do things more programmatically:
https://firebase.google.com/docs/firestore/solutions/schedule-export
You can see that the export can be triggered using the REST API, no CLI needed. Since it's a long-running operation you can just fire and forget. It takes collection IDs as a parameter, although I have no idea how many collections you could specify in a single request.
@samtstern, great idea! I will make the REST API request from a Cloud Function w/ a Cron Job. I'm happy to make a public friendly sample as well on Medium to explain to others as I think this would be valuable.
Confirming a few implementation details

1. If the request is made within the Firebase Cloud Function environment, would the headers containing the accessToken still be required, or is it already authenticated?

   const auth = await google.auth.getClient({
     scopes: ['https://www.googleapis.com/auth/datastore']
   });
   const accessTokenResponse = await auth.getAccessToken();
   const accessToken = accessTokenResponse.token;

2. The url will only need to pass the Firebase projectId:

   https://firestore.googleapis.com/v1beta1/projects/[myProjectId]/databases/(default):exportDocuments

3. The body request takes 2 parameters, outputUriPrefix and collectionIds, which would look something like this:

   outputUriPrefix: /cloud-firestore-export?outputUriPrefix=gs://[myBucketName]/2019-02-08-13-14-85
   collectionIds: user123,user456

From there the POST request may be placed using Axios! I will play it safe and group requests by collections of 10 or 20.
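The request described in points 2 and 3 can be sketched as follows. This is an assumption-laden sketch: `buildExportRequest` is a hypothetical helper (not a Firebase API), and `myProjectId` / `myBucketName` are the placeholders from above. The accessToken would come from the google.auth snippet in point 1.

```javascript
// Hypothetical helper: assemble the exportDocuments REST request discussed
// above from a project ID, destination bucket path, and collection IDs.
function buildExportRequest(projectId, bucket, timestamp, collectionIds) {
  return {
    url: `https://firestore.googleapis.com/v1beta1/projects/${projectId}` +
         `/databases/(default):exportDocuments`,
    body: {
      outputUriPrefix: `gs://${bucket}/${timestamp}`,
      collectionIds: collectionIds, // e.g. a batch of 10-20 user collections
    },
  };
}

const req = buildExportRequest(
  'myProjectId', 'myBucketName', '2019-02-08-13-14-85', ['user123', 'user456']);

// The POST itself (requires axios and a valid accessToken; not run here):
//   axios.post(req.url, req.body,
//     { headers: { Authorization: `Bearer ${accessToken}` } });
```

Batching the collection IDs, as suggested above, just means calling `buildExportRequest` once per group of user collections.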
- If the request is made within the Firebase Cloud Function environment, would the headers containing the accessToken still be required, or is it already authenticated?
You'll have to add the token yourself, but it should be very easy to get within the CF3 environment because you're already running as a service account.
2 and 3
You want to POST to this url:
https://firestore.googleapis.com/v1beta1/projects/YOUR_PROJECT_ID/databases/(default):exportDocuments
And the body of the POST should be a JSON object with those two parameters:
{
collectionIds: ...
outputUriPrefix: ...
}
Honestly you can copy-paste 90% of the code from here:
https://github.com/firebase/snippets-node/blob/master/firestore/solution-scheduled-backups/app.js
Perfect! I've created a ticket for myself to implement this after the app release.
@samtstern & @AdamSHurwitz thank you for your discussion above and apologies for hijacking this thread!
I have a 24-hour backup running thanks to the Scheduled Export documentation; however, I wondered if you could shed a little more light on how you ended up handling the granular export/imports of the individual users below the all_users collection?
I have two questions, one for each of you:
- @samtstern, in your last comment you state:
Also I've filed a FR internally to allow more granular imports from a full-db export.
I was wondering if you meant that it is possible to import granular exports into a restored full-db export?
For example, with the final data structure above:
- all_users collection
  - all_users document
    - user_one collection
      - actions document
        ...
      - categories document
        ...
      - user data document w/ fields
        ...
    - user_two collection
      ...
    - user_three collection
      ...
If I triggered a single user export of the user_two collection, could I then import that into a restored full-db backup and only replace the user_two data?
- @AdamSHurwitz, you mentioned that you would be ...
happy to make a public friendly sample as well on Medium to explain to others
and I wondered if you had a repo available for that as I couldn't find anything in your article list, but I completely understand if you don't!
Once again, my thanks for your discussion above, it's helped me immensely!
@jonrandahl my understanding is that imports don't replace all of the information in the destination, they just restore all of the documents present in the import.
@samtstern thanks for the clarification on my question; I was able to do just that in my own testing today. I also realised I read your comment wrong and now take it to mean that we could, at some point, be able to extract user_two's data from the full-db backup, not from a separate granular export. Apologies for the confusion; your FR would be amazing!
@jonrandahl, I used the approach under the Schedule data exports documentation.
At the time of the original post, I refactored each userId's account info to be organized under its own collection instead of a document. Specific collections could only be imported if those specific collections were exported; documents, on the other hand, did not have the same export/import functionality.
Structure
Hierarchy
users_collection > users_document > userId_collection
User Account
userId_collection - contains all user data
- account_document - account info
- actions_document - user analytics
- collections_document - user's labeled content
I shared this feedback with @samtstern regarding document export/import functionality so hopefully this has been implemented or is on the roadmap.
