Here at Rails Machine we love MariaDB Galera Cluster. It solves a lot of problems we have experienced in the past with other MySQL multi-master solutions. However, schema changes require special care with Galera Cluster. Galera Cluster provides two options for processing DDL (Data Definition Language) statements: Total Order Isolation (TOI) and Rolling Schema Upgrade (RSU). Each method has its advantages and disadvantages, and understanding when to use each is important for ensuring that schema changes go smoothly in a Galera Cluster. We will go through both methods, when to use them, and a solution we developed to help manage the complexity.
Total Order Isolation
TOI is the default DDL replication method in Galera Cluster. When the master node receives a DDL statement, it sends out a replication event before starting the DDL processing. Every node in the cluster will process the replicated DDL statement during the same “slot” in the cluster transaction stream. This ensures that every node in the cluster will process the schema change at the same time.
With this guarantee you don’t have to worry about schema backwards compatibility, but there are some drawbacks. The strict commit order makes every transaction wait until DDL processing is over, which means that altering a table will block any queries that are trying to access that table. For a table with a large number of rows, altering the table or adding an index could take several minutes, or longer, during which the table cannot be queried.
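To make the cost of strict commit ordering concrete, here is a toy model in Ruby. It is only an illustration under simplifying assumptions (one stream, events applied back to back, made-up durations), not a description of Galera internals. The `commit_times` helper is a hypothetical name.

```ruby
# Toy model: events commit strictly in stream order, so each event
# finishes only after everything ahead of it has finished.
# Each event is a [label, duration_in_seconds] pair.
def commit_times(stream)
  clock = 0
  stream.map do |label, duration|
    clock += duration
    [label, clock]
  end
end

stream = [[:insert, 1], [:alter_table, 300], [:update, 1]]
commit_times(stream).each do |label, finished_at|
  puts "#{label} finishes at #{finished_at}s"
end
```

In this model the update that arrived behind the five-minute `ALTER TABLE` finishes at 302 seconds rather than 2, which is the stall described above.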
Rolling Schema Upgrade
To allow the rest of the cluster to continue operating at full speed Galera Cluster offers the RSU method for DDL statements. During RSU the node executing a DDL statement is desynchronized from replication for the duration of the DDL processing. All incoming replication events are buffered and the node will not send replication events to the other nodes in the cluster. When DDL processing is over, the node will automatically join back into the cluster and process missed transactions from the buffer. Once the node has rejoined and caught up with the rest of the cluster, you must repeat the DDL statements on the next node in the cluster.
The RSU method will not slow down the cluster; all other transactions can complete at full speed on the remaining synced nodes. However, there are caveats that must be considered when using RSU. First, the entire session will be processed with RSU (i.e. any insert statements will not be replicated to the other nodes). Second, upgrading the schema on all nodes is a manual operation. As a result, the schema changes must be backward compatible, since queries will be processed against and replicated to both upgraded and non-upgraded nodes.
How does this affect my Rails migrations?
The default DDL method for Galera Cluster is TOI. This means every time you run cap production deploy:migrate (or some equivalent method of running database migrations in production, such as Moonshine), the DDL statements are executed simultaneously on every node in the cluster. Now, if your tables are sufficiently small, the load is low at the time, or you can afford to put your app in maintenance mode while the migrations are running, then you may never need to consider RSU. But if your migrations take several minutes (or hours) to run and you can’t afford the downtime, then you need to consider using RSU for your migrations.
Unfortunately, there weren’t any good solutions for using RSU with Rails migrations, so we wrote GaleraClusterMigrations. GaleraClusterMigrations helps to take the pain out of using RSU with Rails migrations on a MariaDB Galera Cluster.
GaleraClusterMigrations contains a module that can be included in your migration with methods to enable TOI and RSU. It also contains a Rake task that will run the migration against a single cluster node, and a Capistrano task for remotely executing the migration on each node.
The first step is to include the GaleraClusterMigrations module in your migration and use the #with_rsu method to enable RSU for DDL statements specified inside a block:
```ruby
# db/migrate/20140710000000_add_foo_to_bars.rb
class AddFooToBars < ActiveRecord::Migration
  include GaleraClusterMigrations

  def change
    with_rsu do
      add_column :bars, :foo, :integer, default: 0
    end
  end
end
```
The add_column statement above will be executed in RSU mode when the migration runs.
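If you are curious what a helper like this has to do, the sketch below shows the general idea. It is not the gem's source. It assumes the server lets you set the `wsrep_OSU_method` variable for the current session, and that TOI is the value to restore afterwards. `RsuSketch` and `RecordingConnection` are hypothetical names used only for illustration, and the connection is a stand-in that records SQL instead of talking to a database.

```ruby
# Hypothetical sketch, NOT the GaleraClusterMigrations implementation.
module RsuSketch
  module_function

  # Switches the session to RSU, runs the block, then switches back to TOI.
  # The ensure clause restores TOI even if the block raises.
  # Assumption: wsrep_OSU_method can be set at session scope on your server.
  def with_rsu(connection)
    connection.execute("SET SESSION wsrep_OSU_method = 'RSU'")
    yield
  ensure
    connection.execute("SET SESSION wsrep_OSU_method = 'TOI'")
  end
end

# Stand-in for a database connection: it only records the SQL it receives.
class RecordingConnection
  attr_reader :statements

  def initialize
    @statements = []
  end

  def execute(sql)
    @statements << sql
  end
end

connection = RecordingConnection.new
RsuSketch.with_rsu(connection) do
  connection.execute("ALTER TABLE bars ADD COLUMN foo INT DEFAULT 0")
end
puts connection.statements
```

The recorded statements show the DDL wrapped between the two `SET SESSION` calls, which is why everything issued inside the block runs in RSU mode.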
If you are using Capistrano, require galera_cluster_migrations/capistrano in your config/deploy.rb file and use the galera:migrate task to run the migration on each node:
```ruby
# config/deploy.rb
...
require 'galera_cluster_migrations/capistrano'
...
```
You should also prevent the default deploy:migrate task from running during a deploy. Moonshine users should create a “dummy” rails_migrations recipe to prevent Moonshine from migrating the database during a deploy:
```ruby
# app/manifests/base_manifest.rb
...
# "Dummy" recipe: the command is just `true`, so no migration is run.
def rails_migrations
  exec 'rake db:migrate', :command => 'true'
end
...
```
Once the migration has been deployed to your cluster, execute the migration on each node, checking that the entire cluster is in the Synced state before proceeding with the next node. Be sure to use the HOSTFILTER option to specify a single node to migrate at a time.
```
$ cap production galera:migrate HOSTFILTER=db1.example.com
# check that db1 has rejoined the cluster and is Synced
$ cap production galera:migrate HOSTFILTER=db2.example.com
# check that db2 has rejoined the cluster and is Synced
$ cap production galera:migrate HOSTFILTER=db3.example.com
# check that db3 has rejoined the cluster and is Synced
```
You have now successfully upgraded the schema on every database node using RSU! During this time the two nodes not processing the migration will be available to process queries at full speed.
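The "check that the node is Synced" step can be scripted. Galera reports a node's state in the `wsrep_local_state_comment` status variable, which reads `Synced` once the node has caught up. The sketch below is one possible shape for a rolling loop, not part of the gem. `rolling_migrate`, `migrate`, and `fetch_status` are hypothetical names. In practice `migrate` would run the cap command for one host and `fetch_status` would run `SHOW STATUS LIKE 'wsrep_local_state_comment'` on it; here both are stubs. As a simplification it checks only the node that was just migrated rather than the entire cluster.

```ruby
# True when the rows returned by
#   SHOW STATUS LIKE 'wsrep_local_state_comment'
# report Synced. Rows are assumed to be [name, value] pairs.
def synced?(status_rows)
  status_rows.to_h["wsrep_local_state_comment"] == "Synced"
end

# Migrates one node at a time and refuses to continue until that node
# reports Synced. `migrate` and `fetch_status` are caller-supplied callables.
def rolling_migrate(nodes, migrate:, fetch_status:, attempts: 30, interval: 10)
  nodes.each do |node|
    migrate.call(node)
    caught_up = attempts.times.any? do
      next true if synced?(fetch_status.call(node))
      sleep interval
      false
    end
    raise "#{node} did not return to Synced; stopping before the next node" unless caught_up
  end
end

# Demo with stubs: every node reports Joined once, then Synced.
replies = Hash.new { |hash, node| hash[node] = %w[Joined Synced] }
migrated = []
rolling_migrate(
  %w[db1.example.com db2.example.com db3.example.com],
  migrate: ->(node) { migrated << node },
  fetch_status: ->(node) { [["wsrep_local_state_comment", replies[node].shift]] },
  interval: 0
)
puts migrated
```

Raising instead of continuing is deliberate: if a node never returns to Synced, you want the rollout to stop while the remaining nodes are still healthy.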
For migrations that you want to run using TOI there is nothing special required to migrate the database. Write the database migration as you normally would and migrate the database with cap production deploy:migrate.
Understanding the nuances of TOI and RSU can be challenging. Fortunately, we have compiled a set of recommendations to help you decide which method is appropriate to use:
- Use TOI for
- Use RSU for
- Set default values for new columns when using RSU.
- When using RSU, ensure that your application can use the upgraded and non-upgraded tables until every node has been upgraded.
- Use the GaleraClusterMigrations gem to ease the use of RSU with Rails migrations.
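To see why default values matter, consider what an upgraded node has to do while the rollout is still in progress: it keeps receiving rows written against the old table definition. The toy model below assumes row-based replication with values matched by position and the new column appended at the end of the table. It is only an illustration of the idea, not how Galera applies events internally, and `Column` and `apply_row_event` are hypothetical names.

```ruby
# A column in the upgraded table: its name and its default value.
Column = Struct.new(:name, :default)

# Applies a row written against an older, shorter table definition.
# Values are matched by position; trailing columns that the event does
# not carry are filled from the column default.
def apply_row_event(columns, values)
  raise ArgumentError, "event has more columns than the table" if values.size > columns.size

  columns.each_with_index.map do |column, index|
    [column.name, index < values.size ? values[index] : column.default]
  end.to_h
end

# The bars table after the migration: foo was added with default 0.
upgraded_bars = [Column.new(:id, nil), Column.new(:name, nil), Column.new(:foo, 0)]

# A row replicated from a node that has not been upgraded yet.
row = apply_row_event(upgraded_bars, [1, "widget"])
puts row[:foo]
```

The row from the non-upgraded node carries no value for foo, so the upgraded node stores the default of 0. Without a default there is nothing sensible to put in the new column.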
The following resources provide excellent details on TOI and RSU if you want to understand the nuances of each in more depth:
- Webinar Replay & Slides: Galera Cluster Best Practices - Zero Downtime Schema Changes
- Online Schema Upgrade in MySQL Galera Cluster using TOI Method
- Online Schema Upgrade in MySQL Galera Cluster using RSU Method
- Galera Cluster Best Practices
- Schema Upgrades - Galera Cluster for MySQL Documentation
- Replication with Differing Table Definitions on Master and Slave