Automatic reruns?
AnthonyAstige opened this issue · 4 comments
Note: I'm relatively new to Mongo coming from a MySQL background, so please correct me if I'm mistaken anywhere.
The Mongo FAQ on ACID states:

> MongoDB does not support multi-document transactions
Which leads me to the conclusion that while a migration is running, other documents may be created or existing documents may be updated. This can be a problem for long-running migrations.
So, in extending this package for my own application, I'm writing a method, for lack of a better word, called "migrationate". It basically takes search parameters and an updater function.
- The search parameters locate non-migrated documents
- The updater function should accept a document from the search and migrate it in the collection (note: it must make changes that are detectable from the search parameters)
The search is then re-run and matching documents are sent to the updater function until no more are found.
There are other details and complexities I'm working through, but the above is the gist of it. I think it'll lead to good performance / not lock up Mongo, and help ensure every document is migrated.
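To make the idea concrete, here is a minimal sketch of the "migrationate" loop. This is purely illustrative (the function name and the split of `name` into `firstName`/`lastName` are made-up examples, and a plain array stands in for a Mongo collection); a real implementation would run the search parameters as a Mongo query and write back via the collection's update methods.

```javascript
// Sketch of the 'migrationate' loop: re-run the search and feed each
// matching document to the updater until no unmigrated documents remain.
function migrationate(collection, isUnmigrated, updater) {
  let doc;
  while ((doc = collection.find(isUnmigrated)) !== undefined) {
    // The updater must change the document so it no longer matches the
    // search parameters, or this loop will never terminate.
    updater(doc);
  }
}

// Hypothetical example: migrate documents from { name } to
// { firstName, lastName }.
const docs = [
  { _id: 1, name: 'Ada Lovelace' },
  { _id: 2, firstName: 'Alan', lastName: 'Turing' }, // already migrated
];

migrationate(
  docs,
  (d) => d.name !== undefined, // search parameters: matches old schema only
  (d) => {                     // updater: convert to the new schema
    const [firstName, lastName] = d.name.split(' ');
    d.firstName = firstName;
    d.lastName = lastName;
    delete d.name;             // no longer matches the search
  }
);
```

Because the search only matches unmigrated documents, the loop also picks up any old-schema documents written while it runs, which is the point of re-running the search rather than iterating a single cursor.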
Questions
- Am I thinking correctly / in a mongo way?
- Does this sound like something helpful to other people?
- Presuming it's helpful, would it be better for me to integrate into this package and make a pull request, or make a separate package?
You're thinking along the right lines there with Mongo's lack of transactions. However, your concern is tied to how/when you run your migrations. In our case, we deploy our code first, then run the migrations. The code should be robust enough to handle the old/new schemas. Likewise, the migrations should be able to handle data that is already migrated.
This way, the migrations will convert any old data they 'see', whilst data written while the migration is running will already use the latest schema. It seems unnecessary to introduce extra complexity by way of an 'updater' function.
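The "handle data that is already migrated" part usually falls out of the selector. A hypothetical sketch (field names and the `migrateUsernames` helper are invented for illustration; with a real driver this would be a `db.users.updateMany(selector, modifier)` call, with a plain array standing in for the collection here):

```javascript
// Idempotent migration sketch: the selector matches only old-schema
// documents, so re-running it, or running it over data written mid-
// migration that already uses the new schema, is a no-op.
function migrateUsernames(users) {
  users
    .filter((u) => u.username === undefined && u.name !== undefined) // old schema only
    .forEach((u) => {
      u.username = u.name; // move the field to its new name
      delete u.name;
    });
}

const users = [
  { _id: 1, name: 'ada' },      // old schema
  { _id: 2, username: 'alan' }, // already migrated
];

migrateUsernames(users); // migrates doc 1
migrateUsernames(users); // safe re-run: matches nothing
```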
Thanks for the explanation, I can understand your approach and see benefits to it even though I go a different direction.
While developing quickly in a beta application, I prefer to ignore application code 'backwards schema compatibility'; I want to minimize the code I write that's only needed for as long as a single migration takes to run, and to give myself more freedom to change the schema.
This procedure is closer to what I do to save developer time:
- Take website down
- Update application code
- Run data migrations
- Put website back up
Steps 2 & 3 are mostly in order, though I don't trust them to always be perfect. For example, a client's code may not update right away (perhaps I need to trust Meteor more). And though steps 1 & 4 mitigate my concern about users updating documents during a migration, I often skip them because of a combination of:
- Backwards incompatibility isn't that bad for this update
- 2 & 3 are quick enough relative to the number of users
In effect, I often choose to prioritize the combination of developer time & up-time over a few quirks being seen. I mitigate these issues with re-usable solutions like 'migrationate' when possible.
I don't think one solution is strictly superior to the other, but each has its use under different constraints. That said, presuming what I'm doing is helpful to others, I'm now thinking 'migrationate' belongs in its own package to keep this core migration package simple.
Fair points. Agreed it probably belongs in its own package. Good work.
Cool, thanks. Feel free to close this issue when you see fit. The main benefit now would be to see if there's community demand for this, which would motivate me to package it up.