MariaDB Galera Cluster - Monitoring and Dealing With Failures



Written By: Kevin Lawver


September 15, 2014

We’ve talked a couple times about MariaDB Galera Cluster, and over a year later, we’re still in love with it. We’ve learned a few things about managing clusters, monitoring them, saving them when they need it, and doing upgrades.

Monitoring

If you remember our first post about MariaDB, we use haproxy to load balance the databases and make sure connections and queries only go to “up” databases. That means we do a lot of our monitoring at the load balancer level, which gives us a view of the whole system’s health instead of the health of just one node in the cluster.
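
If you want to poke at those numbers yourself, here’s a rough sketch of the kind of check you can run against haproxy’s stats socket (the socket path and the mariadb_read backend name are just examples for this sketch; use whatever your haproxy.cfg actually defines):

    # Count how many servers in the read backend haproxy currently sees as fully UP.
    # /var/run/haproxy.sock and "mariadb_read" are placeholders.
    echo "show stat" | socat /var/run/haproxy.sock stdio \
      | awk -F, '$1 == "mariadb_read" && $2 != "BACKEND" && $18 == "UP" { up++ } END { print up + 0 }'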

Here’s a list of the things we monitor in Scout and why:

  • Active Servers in the MariaDB Read Backend - If this is lower than 3 for more than 30 minutes, it means a node is down and is either taking too long to resync or didn’t restart correctly and needs some “help”.
  • Local State - This is reported by the Galera Cluster Monitor plugin and should be 4 (meaning it’s synced). If it’s less than 4 for more than 30 minutes, it means it’s not coming back up and needs to be kicked.
  • Cluster Size - Another Galera Cluster Monitor plugin item. If this is less than 3 for more than 30 minutes, it could mean an instance has gone rogue and formed its own cluster (which usually only happens because you did something wrong in your config files somewhere).
  • Flow control paused - Another one from the Galera Cluster Monitor plugin. If it’s 1.0 for more than 30 minutes, it means something bad has happened: either syncing has stopped or the process is “stuck” for some reason. It also means a kicking is in order.

Everything we use the Galera Cluster Monitor Scout plugin for, you can get at yourself with this query: show status like 'wsrep%';. It returns a bunch of Galera-related status variables, most of which are useful. There’s a lovely page on the MariaDB site that documents all of them.
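
For the items above, the underlying status variables are wsrep_local_state, wsrep_cluster_size, and wsrep_flow_control_paused, so a quick check from the command line looks something like this (the monitoring user is a placeholder):

    # Just the variables we alert on:
    mysql -u monitor -p -e "SHOW STATUS WHERE Variable_name IN
      ('wsrep_local_state', 'wsrep_cluster_size', 'wsrep_flow_control_paused');"

    # Or everything Galera reports:
    mysql -u monitor -p -e "SHOW STATUS LIKE 'wsrep%';"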

When Things Go Wrong

Bad things happen to good apps. The good news is that when things go bad with a single node in a Galera cluster, it’s not that bad. Here’s what happens when a node goes away:

  1. The node drops out of the load balancer because it’s either down or not synced.
  2. That causes all of the clients connected to it to disconnect. For Rails apps, that means a bunch of error messages, and then things go back to normal. We haven’t found a good way around this one yet.
  3. When that node comes back up, it reconnects to the cluster, which causes one of the remaining nodes to drop out, become the “Donor”, and sync its data with the newcomer. If you have a three-node cluster, that means there’s one node now taking all reads and writes. For short bursts, this usually isn’t horrible.
  4. Once the new node is ready, it and its donor catch up with the primary by replaying the transactions in their receive queue.
  5. Now all three nodes should be green again in haproxy!

How long that process takes depends on how much data there is to sync. I’ve seen it happen in a minute on a local Vagrant stage, and take as long as thirty minutes in production. Other than that initial blip as clients reconnect, it doesn’t affect availability at all. Compared to other methods (that I’m too polite to mention), that’s awesome.
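
If you want to watch a recovery in progress, a quick loop over the nodes shows who’s the Donor, who’s still joining, and who’s Synced (db1/db2/db3 and the monitoring credentials are placeholders for this sketch):

    # Print each node's Galera state: Joining, Donor/Desynced, Joined, or Synced.
    for host in db1 db2 db3; do
      echo -n "$host: "
      mysql -h "$host" -u monitor -pSECRET -N -e \
        "SHOW STATUS LIKE 'wsrep_local_state_comment';"
    done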

When Things Go Really Wrong

If you have to rebuild your cluster from scratch because of something horrible, the process is more hands on and requires some actual downtime, which is never fun.

Galera Cluster requires an initial node to bootstrap a cluster. If something disastrous happened, you should choose the node that had the least bad stuff happen to it since it’s going to be the node that all the others sync from (this is a sad sad story I’m imagining).

I’ll walk you through how we do this with the Capistrano tasks built into the moonshine_mariadb Moonshine plugin, which gets everything done a lot faster than doing it by hand (there’s a rough by-hand sketch after the list if you’re curious what the tasks actually do):

  • mariadb:setup_master:
    • Adds /etc/mysql/conf.d/master_setup.cnf, which sets wsrep_cluster_address = gcomm://, which is what it needs to be to bootstrap a cluster.
    • Restarts mysql on the bootstrap node.
  • mariadb:setup_slaves:
    • Adds /etc/mysql/conf.d/slave_setup.cnf, which sets wsrep_cluster_address = gcomm://IPADDRESS where IPADDRESS is the address of the bootstrap node.
    • Restarts mysql on the secondary nodes.
  • mariadb:finalize_cluster HOSTFILTER=db1:
    • This is HOSTFILTER’ed to the bootstrap node so you only restart mysql on a single node at a time.
    • Removes /etc/mysql/conf.d/master_setup.cnf
    • Restarts mysql
    • It should join right back up since no data should have changed.
  • mariadb:finalize_cluster HOSTFILTER=db2 (and then db3)
    • Removes /etc/mysql/conf.d/slave_setup.cnf
    • No need to restart mysql here since it’s already synced. You just want to make sure slave_setup.cnf is removed so it syncs from any available node if it restarts for some reason.
  • mariadb:status
    • This should show wsrep_local_state_comment as Synced.
    • And should also show wsrep_incoming_addresses with all of your nodes in it, assuming you’ve done the preceding tasks correctly.
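
If you’re not using Moonshine, those tasks boil down to roughly this by hand (the [mysqld] section header, the 10.0.0.1 address standing in for db1, and the service commands are assumptions for this sketch; your permanent Galera config should already list every node in wsrep_cluster_address):

    # 1. On the bootstrap node (db1 here): force it to start as a new cluster.
    printf '[mysqld]\nwsrep_cluster_address = gcomm://\n' \
      > /etc/mysql/conf.d/master_setup.cnf
    service mysql restart

    # 2. On each other node: point it at the bootstrap node (10.0.0.1 is an example).
    printf '[mysqld]\nwsrep_cluster_address = gcomm://10.0.0.1\n' \
      > /etc/mysql/conf.d/slave_setup.cnf
    service mysql restart

    # 3. Finalize: on db1, drop the override and restart (one node at a time);
    #    on db2/db3, just delete the file, no restart needed.
    rm /etc/mysql/conf.d/master_setup.cnf && service mysql restart   # db1 only
    rm /etc/mysql/conf.d/slave_setup.cnf                             # db2 and db3

    # 4. Sanity check: should show Synced and every node in wsrep_incoming_addresses.
    mysql -e "SHOW STATUS WHERE Variable_name IN
      ('wsrep_local_state_comment', 'wsrep_incoming_addresses');"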

If you have a node that just won’t rejoin, it could mean its data is in a bad enough state that it can’t figure out where to catch up from. In that case, stop mysql on it (which sometimes requires some kill -9 action), and then delete /var/lib/mysql/grastate.dat and /var/lib/mysql/galera.cache and restart mysql. It will join the cluster as a new node and sync all the data. It takes longer, but it’s safe and works.
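
In shell terms, that rescue looks roughly like this (run it on the broken node only):

    # Stop mysql, forcefully if it refuses to die, then wipe the local Galera state.
    service mysql stop || pkill -9 -f mysqld
    rm -f /var/lib/mysql/grastate.dat /var/lib/mysql/galera.cache
    # On start it rejoins as a brand new node and does a full state transfer from a donor.
    service mysql start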

Upgrades

Upgrades are easy! Because things will fail over, you just need to make sure you upgrade your cluster one node at a time, and wait until the upgraded node is synced and green in haproxy before you start on the next node.
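
The per-node routine is something like the following (the package name is a guess for this sketch and depends on your distro and MariaDB version; the important part is waiting for Synced before moving on):

    # One node at a time. Don't start on the next node until this one is Synced
    # and green in haproxy again.
    service mysql stop
    apt-get update && apt-get install -y mariadb-galera-server   # example package name
    service mysql start
    mysql -e "SHOW STATUS LIKE 'wsrep_local_state_comment';"     # wait until this says Synced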

One caveat: upgrading packages in a cluster can run into backwards-compatibility issues between Galera versions. When that happens, it’s usually best to put your site in maintenance mode, wait until everything’s synced, and then bootstrap the cluster after the upgrade.

In Conclusion

Like we said above, we love MariaDB Galera Cluster. It’s saved many hours of sleep over the past year by doing the right thing on its own when something goes wrong. Its requirements are worth the tradeoffs for most of our customers’ applications. It makes managing large database installations easier than any other options we’ve found, and that makes for a happy operations team!

And if you’d like to read more of our musings, here are our other blog posts about MariaDB Galera Cluster: