Clarify the reasoning for Define Model Class Migrations
pirj opened this issue · 8 comments
or cause serious data corruption
Can you think of a use case when an old migration could cause a serious data corruption?
Yes, it will most probably break db:migrate
on a clean database.
- There's yet another reason not to use real models in migrations.
If you refer to a real model in an older migration (e.g. forupdate_all
), Rails will cache column information.
And subsequent migrations that would alter that table's schema won't affect the column cache.
This will lead to breakages due to an outdated (fixed for the moment when the model was first referred to in migrations) table schema. Specifically, if you add a column in a later migration andupdate_all
that new column, this will result in an error.
Also, if you runrake db:migrate db:seed
in one command, this will also affect your seeds, and it will also fail with:
ActiveModel::UnknownAttributeError: unknown attribute 'some_attr' for ModelName
There are two more things that recently hit me:
- We suggest defining the class on the top level:
class MigrationProduct < ActiveRecord::Base
self.table_name = :products
end
class ModifyDefaultStatusForProducts < ActiveRecord::Migration
while the MigrationProduct
can be defined by multiple migrations.
If it happens that self.table_name =
is different, the last one wins.
I don't have a good example, maybe something like:
class MigrationUser < ActiveRecord::Base
self.table_name = 'accounts' # not 'users'
end
The suggestion would be to define migration model classes inside migration classes:
class ModifyDefaultStatusForProducts < ActiveRecord::Migration
class MigrationProduct < ActiveRecord::Base
self.table_name = :products
end
...
- Migration models are loaded as needed in development, while loaded all at once in production with
eager_load
. (this needs to be confirmed with the Rails code, if the migratorload
s them or it loads the wholedb/migrate
and instantiates migration classes as needed.
Still, the issue was as following:
# migration 1
class AddCategoryToFoos < ActiveRecord::Migration[6.0]
def change
add_column :foos, :category, :string
end
end
# migration 2
class BackfillCategoryToFoos < ActiveRecord::Migration[6.0]
class MigrationFoo < ActiveRecord::Base
self.table_name = 'foos'
end
def change
MigrationFoo.find_each { |foo| foo.update!(category: 'primary') } # <= The real problem is here
end
end
# migration 3
class RestrainCategoryToFoosToNotNull < ActiveRecord::Migration[6.0]
def change
change_column_null :foos, :category, false # BOOM, `PG::NotNullViolation: ERROR: column "category" contains null values`
end
end
The problem is with foo.update!(category: 'primary')
.
ActiveRecord::Dirty has no awareness of the recently added category
column, and save!
decided not to do an UPDATE
.
I don't have a good explanation as to why this happened. My theory is that if those two last migrations were executed in the same run, Rails loaded both, and immediately cached model's schema for MigrationFoo
.
This was also happening on local.
Might be related to:
config.active_record.use_schema_cache_dump enables users to get schema cache information from db/schema_cache.yml (generated by rails db:schema:cache:dump), instead of having to send a query to the database to get this information. Defaults to true.
But I couldn't find db/schema_cache.yml
locally.
One suggestion would be to dynamically create model classes instead:
def change
MigrationFoo = Class.new(ActiveRecord::Base) { self.table_name = 'foos' }
MigrationFoo.find_each { |foo| foo.update!(category: 'primary') }
end
but I can't recommend it without testing it. And I have some doubts if schema cache will be updated for a new model pointing to the same table, or an existing cached definition (that e.g. lacks category
column info) will be used.
Another approach is to use:
def change
MigrationFoo.reset_column_information
MigrationFoo.find_each { |foo| foo.update!(category: 'primary') }
end
See https://stackoverflow.com/a/9728033/202914 for explanation.
Otherwise, an error is raised:
unknown attribute 'category' for MyCoolMigration::MigrationFoo.
Apparently, migrations are loaded all at once, or somehow column information is shared between MyCoolMigration::MigrationFoo
and MyPreviousMigration::MigrationFoo
.
I'm on the fence if it's worth adding to the guide, as it's for this relatively rare case when a project tries to keep migrations from early days in a state that allows running rails db:create db:migrate
rather than rails db:create db:schema:load
.
reset_column_information
is what Rails suggests using:
Sometimes you'll want to add a column in a migration and populate it immediately after. In that case, you'll need to make a call to
reset_column_information
in order to ensure that the model has the latest column data from after the new column was added.
One case when it's not possible to properly use migration model in migration when you use it to schedule an Active Job job:
class MigrationUser < ActiveRecord::Base
self.table_name = :users
end
def up
MigrationUser.find_each { |user| SyncWithThirdParty.perform_later(user) }
end
as this will end up with a failure to deserialize it, as the global id would use a weird model name.
As a workaround, global id key should be overridden in the migration model.
An example of data corruption (logical):
- You write a migration that adds a new nullable column, and you write code to make sure that certain types of records get backfilled with some type of value. For this, you use
ModelName.where(...).update_all(...)
- If you have a gem such as act_as_paranoid or any code that added a default scope to your model, migrating data this way will prevent records not matching your default scope from being updated.
In the case of act_as_paranoid
, that could not be a big issue since it will likely exclude soft-deleted records, which you may not care that much (I do, though). If you try, however, to update your newly added column to be non-nullable, that could break your production deployment because of the presence of soft-deleted records (which would still be null as they were excluded by the default scope), while it's very likely you won't notice this in development (unless you're constantly soft-deleting records locally, which most developers won't do).
One could argue this is a problem with the usage of default scopes, and I'd also agree. Doesn't change the fact that avoiding direct references to your models in migrations prevents this kind of error.
Thought it was worth sharing.
I agree with not using regular models in migrations - that's just asking for trouble, but I strongly disagree with defining model classes inside of migration - that is too cumbersome and goes against the first principle of the rails doctrine (Optimize for programmer happiness), since it's a fact that it causes headaches for engineers.
I propose just using raw SQL to change data in a migration like this. I know some people will disagree (maybe even barf a little) when reading this. However, this produces simpler and cleaner code, is much easier for developers to use, but is controversial and not widespread. It's very common in Basecamp (https://discuss.rubyonrails.org/t/patterns-for-data-only-rails-migrations/75320/10) and plays nicely with what the migrations were designed to be. I'll quote the linked reply from DHH here for ease of readability
We mix data and sql statements in migrations all the time at Basecamp. This is what migrations were designed to do! They’re intended to be moments in time. Not to be replayable forever. The stable unit is schema.rb (or structure.sql).