Convert a large paper_trail dataset from YAML to JSON
During an upgrade of Rails from 6 to 7, I was tasked with converting a large amount of existing papertrail data from the default but not recommended format of YAML to JSON.
The process will likely differ depending on your existing setup, but I wanted to share a useful pattern for this. There are compromises yes, but this was fast and required a reasonable maintenance window of around 10 minutes per 1M rows in our versions table. Where papertrail stores it data by default.
Part of the requirement was to make this change in a single deploy. There was a good reason for this, but it did pose some challenges.
This is how it works.
Create a single migration to;
- change column names
object
andobject_changes
toold_object
andold_object_changes
- create new columns to replace them, of type JSONB
- kick off the data conversion
- change column names
Refactor existing logic that expects the old format
Upgrade Papertrail gem, and turn on JSON serializer, now all new records are saved in the new format
Let’s look at the code.
Create the migration
class CovertVersionsObjectData < ActiveRecord::Migration[6.1]
def up
rename_column :version, :object, :old_object
rename_column :version, :object_changes, :old_object_changes
add_column :version, :object, :jsonb
add_column :version, :object_changes, :jsonb
QueueConvertVersionObjectData.perform_later
end
def down
remove_column :versions, :object, :jsonb
remove_column :versions, :object_changes, :jsonb
rename_column :versions, :old_object, :object
rename_column :versions, :old_object_changes, :object_changes
add_column :versions, :new_format, :boolean, default: false
end
end
Background Jobs
Queue the jobss
class QueueConvertVersionObjectData < ApplicationJob
queue_as :default
def perform
PaperTrail::Version
.where(new_format: false)
.and(PaperTrail::Version.where.not(old_object: nil, old_object_changes: nil)
.ids
.in_groups_of(1000, false) do |version_ids|
ConvertVersionObjectData.perform_later(version_ids)
end
end
end
Convert the data
class QueueConvertVersionObjectData < ApplicationJob
queue_as :default
def perform(version_ids)
PaperTrail::Version.where(id: version_ids).find_each do |version|
versions << { object: YAML.unsafe_load(version.old_object || 'null'),
object_changes: YAML.unsafe_load(version.old_object_changes || 'null'),
new_format: true }
end
PaperTrail::Version.upsert_all(versions)
end
end
Yes, renaming the column is not an ideal solution. However, we could afford a decent sized maintenance window and needed an single deploy.
Using an upsert allowed a drastic increase in performance, making a single request for every 1000 records. Bear in mind if you are using Postgres as your database, you will need to pass along all columns with ‘not null’ constraints.
You can also write some tests to ensure running this will not change existing data, and that the converted data is the same. Hashdiff is great for this.
I hope you find this useful.
Dec 16, 2023