A change in Rails 7.1 caused an old migration of mine to raise an error. This isn’t the first time it has happened, and I’m sure it won’t be the last. As someone who has put in work to ensure that old Rails migrations keep working indefinitely, I can tell you it isn’t an easy task. But this error was different from the others I had seen: the migration wasn’t making any changes to the schema; it was migrating data.
Moving data south for the winter?
Active Record Migrations are the set of tools Rails provides to make changes to
an application’s database. A migration is written in a Ruby file using a Domain
Specific Language, and then run using the rails Command Line Interface.
Generally that means running rails db:migrate, which will execute all
migrations that have yet to run against the database.
After a migration has run, Rails also dumps the current state of the database in
a file (usually db/schema.rb) which uses the same migration DSL. Developers
can then set up new databases with the rails db:schema:load command to load
just that file instead of having to run every single migration from the
beginning.
Got it, changing the database
When Active Record migrations are used to make changes to the database schema, everything works great. Since the DSL is a relatively thin wrapper over SQL statements, migrations don’t have to worry about any application code. However, things start to get more complicated when developers try to use migrations to modify the actual data in the database, which is also known as a data migration.
Schema migrations and data migrations feel like they should go together, right? For example, say a new column is being added to a table and existing rows in that table will need that column populated. Since the column is being added in a migration, why not put the code to populate the column right next to it?
This is exactly the pattern I’m here to advocate against.
Migrations should only be used to make schema changes.
The Error
In my case, the error I ran into was this:
Undeclared attribute type for enum 'blah'. Enums must be backed by a database
column or declared with an explicit type via `attribute`.
Rails 7.1 added a new check to ensure that an Active Record enum is either
backed by a column in the database or has an explicitly specified type.
The goal of these changes was to make enum resistant to typos, since
previously misspelling the name of the enum would just silently not work as
expected.
Of course, this new check can also cause problems when a model with an enum
attribute is used in a migration. Let’s look at the following series of
migrations:
class AddPosts < ActiveRecord::Migration[7.1]
  def change
    create_table :posts do |t|
    end
  end
end

class BackfillPosts < ActiveRecord::Migration[7.1]
  def change
    Post.find_each do |post|
      # modify the post
    end
  end
end

class AddStatusToPosts < ActiveRecord::Migration[7.1]
  def change
    add_column :posts, :status, :string, default: "draft"
  end
end
In the first migration, a table is created for a Post model. In the second, a
data migration is performed on the rows of the table. And in the third
migration, a new status column is added to the table. After running the third
migration in production, a developer adds enum :status to the model to take
advantage of the new column:
class Post < ApplicationRecord
  enum :status, { draft: "draft", published: "published" }, default: :draft
end
After adding the enum, the second migration now errors because the column that
should be backing the enum isn’t added until the next migration!
A pattern emerges
Looking at this problem in isolation, it certainly could be viewed as a bug in Rails, or at least something to consider supporting. However, this is just one specific instance of a much larger problem. Using the same migrations from before, but with a slightly different model:
class Post < ApplicationRecord
  validates :status, presence: true
end
A different error is raised when running the second migration:
NoMethodError: undefined method `status' for an instance of Post
How about another series of migrations but with an empty Post class:
class AddPosts < ActiveRecord::Migration[7.1]
  def change
    create_table :posts do |t|
    end

    up_only do
      10.times { Post.create! }
    end
  end
end

class AddStatusToPosts < ActiveRecord::Migration[7.1]
  def change
    add_column :posts, :status, :string

    Post.find_each do |post|
      post.update!(status: "published")
    end
  end
end
Can you spot the error?
I’ll give you a hint: it involves the schema cache.
After running these migrations, it would be reasonable to expect that there are
now 10 Posts and all of them have a status of "published". However, that is
not what actually happens.
When the first migration adds 10 Posts to the database, Active Record will
execute a query to fetch the list of columns for the Post model and store that
list in the schema cache. Keeping the list cached in memory ensures that it
doesn’t have to make the same query again in the future. The list of columns is
then used to generate the queries to insert 10 Posts into the posts table.
When the second migration runs, the status column is added to the database,
but the schema cache is not cleared. So when the migration next tries to update
the status column for existing Posts, Active Record doesn’t recognize status
as a database column and the existing Posts are not updated.
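To make the stale-cache sequence concrete, here is a toy model in plain Ruby. It is not Active Record: ToyTable, ToyModel, and stale_cache_demo are hypothetical names invented for this illustration, and the choice to silently skip unknown attributes is an assumption made to mirror the outcome described above. Only the name reset_column_information is borrowed from a real Active Record class method.

```ruby
# Toy stand-in for a database table: a list of column names plus rows.
class ToyTable
  attr_reader :columns, :rows

  def initialize(*columns)
    @columns = columns
    @rows = []
  end

  # Plays the role of add_column in a schema migration.
  def add_column(name)
    @columns << name
    @rows.each { |row| row[name] = nil }
  end
end

# Toy stand-in for a model that memoizes the column list on first use.
class ToyModel
  def initialize(table)
    @table = table
  end

  # Reads the table's columns once, then reuses the memoized copy.
  def column_names
    @column_names ||= @table.columns.dup
  end

  # Drops the memoized list so the next call reads the table again.
  def reset_column_information
    @column_names = nil
  end

  def create
    row = column_names.to_h { |name| [name, nil] }
    @table.rows << row
    row
  end

  # Writes only attributes the model believes are columns (an assumption
  # of this toy) and returns what it actually wrote.
  def update_all(attributes)
    known = attributes.slice(*column_names)
    @table.rows.each { |row| row.merge!(known) }
    known
  end
end

def stale_cache_demo
  table = ToyTable.new(:id)
  model = ToyModel.new(table)

  3.times { model.create }    # first use memoizes [:id]
  table.add_column(:status)   # the table changes underneath the model

  stale = model.update_all(status: "published") # status is not in the cached list
  model.reset_column_information
  fresh = model.update_all(status: "published") # column list was re-read

  { stale: stale, fresh: fresh, rows: table.rows }
end

p stale_cache_demo
```

The first update_all writes nothing because the memoized column list predates the new column; the second succeeds only after the list is discarded. Resetting the cache treats this one symptom, and it does nothing for the enum and validation failures shown earlier.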
Fundamental Incompatibilities
In all three of these cases, there are two conflicting concepts that cause problems when used together:
On one side are migrations, which are meant to operate on the database at a certain point in time.
On the other side are the Active Record models, which are meant to operate only on the current state of the database.
These two ideas are fundamentally at odds, and no amount of tricks or workarounds can really address that. Naturally, there are likely many exceptional cases where people have and will continue to successfully make these two things work together. But instead of hoping and hacking, we can do something quite simple to avoid the problem altogether: do not make data changes in migrations.
What should I be doing?
With a disclaimer that I work at Shopify, my favorite library to perform data
migrations is maintenance_tasks. A Maintenance Task is really just an
Active Job with some additional features, like the ability to interrupt/resume
long-running jobs and a simple UI to kick off jobs and track progress.
While I think the additional features are great, a separate library is not strictly necessary to separate data migrations from Active Record migrations. Creating a regular old Active Job and a way to enqueue it is probably plenty for the majority of cases.
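As one possible shape for such a job, here is a minimal sketch. It assumes a Post model whose posts table already has the status column from the earlier example; the job name BackfillPostStatusJob and the batch size are hypothetical, and the first line is only a stand-in so the file loads outside a Rails app.

```ruby
# Stand-in so this file also loads as a plain script. Inside a Rails app the
# real ApplicationJob already exists and this line does nothing.
ApplicationJob = Class.new unless defined?(ApplicationJob)

# Hypothetical job: populates status for rows created before the column existed.
class BackfillPostStatusJob < ApplicationJob
  # Arbitrary batch size chosen for illustration.
  BATCH_SIZE = 1_000

  def perform
    # in_batches yields one relation per batch; update_all then issues a
    # single UPDATE for that batch and skips validations and callbacks.
    Post.where(status: nil).in_batches(of: BATCH_SIZE) do |batch|
      batch.update_all(status: "published")
    end
  end
end
```

In a Rails app the job could be enqueued with BackfillPostStatusJob.perform_later from a console or a one-off task once the schema migration has shipped. Because it runs against the deployed code and the current schema, the model and the database agree about which columns exist.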
Flying back north…
When an application is small, it can feel really tempting to reach for Active Record migrations to make sweeping data changes. It’s quick, dirty, and gets the job done at the time. But as an application evolves with age, this behavior is always a mistake and will come back to bite. It’s time to stop changing data in Active Record Migrations.