We recently released moonshine_mariadb, moved our own site and a very large customer’s site to MariaDB’s Galera Cluster, and lived to tell the tale. In fact, we came through just fine and happier on the other side: better failover, and we’re finally rid of an old piece of software that caused more trouble than it saved. I thought it would be fun to document the whole process, from deciding on MariaDB, to testing the migration and writing moonshine_mariadb, to our operational experience with the cluster so far.
In order to go over all the work that went into moonshine_mariadb and making the migration as smooth as possible, I’m going to split this up into probably three posts:
- The Problem and the Solution
- Moonshine and Vagrant - Testing the Migration Risk Free
- MariaDB and Moonshine - Gotchas and Things to Think About
I’m going to try to do one of these a week until they’re done, and the number of posts may change. Let’s get going with part one!
The Problem
The old piece of software was MMM, which even the maintainers say you should move away from. It was an ingenious way to add multi-master capability to MySQL, but it had its issues: it was extremely sensitive, difficult to get configured just so for each use case, and harder and harder to justify using. Still, it provided something we couldn’t really find a better way to do - failover and durability for MySQL. Because it used master-master replication, you always had at least one up-to-date copy of the data, and MMM took care of failing over if something happened. We’d been looking for a replacement for a while, and about three months ago it became clear that MariaDB was the way the world was moving and offered the best solution to failover and durability in MySQL-land.
The problem with MMM was that sometimes it failed over, or worse, took both database servers offline, even when there was nothing wrong. That’s not good, especially when availability is your primary goal.
The other problem was that recovering from an “event” was always painful. You had to set up replication from scratch again. Even though we have a set of capistrano tasks to do all of that, it was still nerve-wracking to do in the middle of the night under duress (I find most things are nerve-wracking to do in the middle of the night under duress, but that’s why we’re professionals - we do them anyway).
What I really wanted was a MySQL equivalent of how MongoDB does replication and recovery: you add a fresh clean node and it magically gets a copy of the data and then becomes a full-fledged member of the cluster, able to become the primary at any time.
(A hundred DBAs just rage quit this post)
Enter MariaDB and Galera Cluster.
MariaDB is a fork of MySQL, maintained by MySQL’s original creators, and offers a 100% open source, pretty much drop-in replacement. I’m not going to go into the politics of it, but if you’re interested, there’s a great interview with Monty that goes into the history.
Galera Cluster takes MariaDB (there are Galera replication packages for stock MySQL and Percona Server as well) and beefs up replication, making it much more like how MongoDB does it. I’m not going to go into the technical guts of it; I’ll leave that to the Codership folks. Needless to say, this piqued my interest.
Basically, given three nodes, you have a cluster where you can send writes or reads to any member and they’re replicated to the others. That gives you failover (a failed node just drops out of the cluster) and recovery: if a node fails, you can add it back when it’s available again, or add a brand new node, and it will get a copy of the data from one of the other nodes.
Why three nodes? Well, when a new node joins the cluster, it needs a “donor” node, which stops accepting writes while it transfers a snapshot of itself to the new node; then both catch up, come back to “Synced”, and start taking writes again. With three nodes, when one is in Donor mode you still have at least one node that’s Synced and able to take writes and reads, so your site doesn’t go down. The larger your dataset, the longer this donor step takes.
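You can watch a node move through these states right from the MySQL console. The wsrep status variables below are standard Galera ones; which of them you end up caring about depends on your setup.

```sql
-- What state is this node in? (Synced, Donor/Desynced, Joiner, etc.)
SHOW STATUS LIKE 'wsrep_local_state_comment';

-- How many nodes does the cluster currently have?
SHOW STATUS LIKE 'wsrep_cluster_size';

-- Is this node ready to accept queries?
SHOW STATUS LIKE 'wsrep_ready';
```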
Who’s the Boss
The problem this introduces is that your app needs to know which node to send writes to, which sounds like a problem for a load balancer. Thankfully, we have one handy (hint: it’s haproxy)!
Because I default to paranoid about databases, I didn’t trust spraying writes around the cluster. Instead, I wanted a write backend where one node is the primary and the other two are backups that only take writes if the primary drops out. I didn’t have the same worry about reads, so I wanted another backend for reads that balances across all three nodes.
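To give a rough idea of the shape of this, here’s a minimal HAProxy sketch with a write listener (one primary, two backups) and a read listener that balances across all three nodes. The listener names, ports, and addresses are made up for the example, and the Synced health check it references is covered next.

```
# Writes: db1 is the primary; db2 and db3 only take traffic if db1 drops out
listen galera-writes
  bind 127.0.0.1:3307
  mode tcp
  option httpchk                 # hits the Synced check described below
  server db1 10.0.0.1:3306 check port 9200
  server db2 10.0.0.2:3306 check port 9200 backup
  server db3 10.0.0.3:3306 check port 9200 backup

# Reads: balance across every Synced node
listen galera-reads
  bind 127.0.0.1:3308
  mode tcp
  balance roundrobin
  option httpchk
  server db1 10.0.0.1:3306 check port 9200
  server db2 10.0.0.2:3306 check port 9200
  server db3 10.0.0.3:3306 check port 9200
```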
HAProxy now needed a way to know whether a node was Synced so it could decide whether to drop it out of the rotation. I stumbled across a great blog post on setting up HAProxy for MySQL and tweaked it for my own evil purposes: checking Synced state, which has the nice byproduct of confirming MySQL is up and running at all.
Now, if a node reports any status other than Synced, it drops out of the rotation and no queries are sent to it.
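The usual trick (and roughly what that blog post describes) is a tiny script served over HTTP by xinetd on each database node: it asks MySQL for the node’s wsrep state and returns 200 only when it’s Synced, so HAProxy’s httpchk can use it. This is just a sketch; the user, password, and port are placeholders, not our actual setup.

```bash
#!/bin/bash
# Respond 200 if this Galera node is Synced, 503 otherwise.
# Wire this up via xinetd on the port HAProxy's "check port" directive points at.
STATE=$(mysql -u clustercheck -p'CHANGEME' -N -s \
  -e "SHOW STATUS LIKE 'wsrep_local_state_comment';" 2>/dev/null | awk '{print $2}')

if [ "$STATE" = "Synced" ]; then
  echo -en "HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\n\r\nGalera node is Synced.\r\n"
else
  echo -en "HTTP/1.1 503 Service Unavailable\r\nContent-Type: text/plain\r\n\r\nGalera node is ${STATE:-unreachable}.\r\n"
fi
```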
This means the app has to talk to the database through HAProxy, but in my tests that added less than a couple of milliseconds to the round trip, which I think is a fine trade-off for failover. And if you use something like db_charmer, you can pretty easily spread reads around the rest of the cluster by adding a slave that talks to HAProxy’s new read frontend.
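As a sketch of what that could look like with db_charmer (connection names, ports, and the model are all hypothetical, and the database.yml naming here follows db_charmer’s Rails.env-prefixed convention, so check its README for your version), you’d point a slave connection at the HAProxy read listener:

```yaml
# config/database.yml
production:
  adapter: mysql2
  host: 127.0.0.1
  port: 3307          # HAProxy write listener (primary with backups)
  database: myapp_production

production_slave01:
  adapter: mysql2
  host: 127.0.0.1
  port: 3308          # HAProxy read listener (round-robins all three nodes)
  database: myapp_production
```

Then, in the models whose reads you want to offload (the model name is just an example):

```ruby
# Reads for this model go to slave01; writes still go through the primary connection
class Post < ActiveRecord::Base
  db_magic :slave => :slave01
end
```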
So far, this setup has been working really well for both us and the customer we just migrated.
In Part Two, I’ll go over moonshine_mariadb and how we developed it by testing the migration using a local Vagrant stage (over and over and over again).