Replicating your database on 3 machines
At Hotels.ng, we use 6 servers - all are fully dedicated, baremetal servers. That means we get the pleasant task of managing everything on the servers. It saves us a lot of money, but for that, we really need to go deep into the infrastructure.
Sometime in November last year, one of our servers died. It had been running along for 4 years without a problem. Then one of the SSDs died - of course, in the middle of the night.
I woke up to many messages on my phone, lots of emails, etc.
It initially looked like an easy problem to solve - we could still log into the server. But it was not, because even though we could log in, disk writes were unreliable.
After a bit of struggle to get it back up, we realised that we needed to move the server. After a server runs for 4 years, nobody really knows the intricacies of what is happening anymore - and the team that set it up, had over the years been replaced by a completely new team.
So we went through an 18 hour ordeal to get everything working again. I’ll write about that sometime in the future. What I want to focus on now is one thing - how we ended up replicating our database for good redundancy on multiple servers - to avoid such situations.
There is quite some detail to it, but on the high level it’s quite easy:
We have a dedicated database server where all the writes go. We have two replica databases on two separate machines - one is used for read operations, and the other is used for ‘crunching’ - i.e anything that will tie up the database intensely for hours (like indexing hotels that are served by Sphinx).
Core Database Machine
The main db server is a single, powerful machine. It has two 1TB hard drives in Raid-1 config (data is mirrored). It also has two 1TB SSDs in Raid-1 also. The OS and the DB are installed on the Hard Drives, but the DB data is on the SSDs.
For monitoring and alerts, we use Prometheus, Lorona and New Relic.
All the services that do write operations connect directly to this database and write to it. We have CPU usage of about 30% on average on the machine.
Read Replica Machine
We have a read-replica machine that is setup as a replica of the main db. Anything that gets written on the first server gets replicated here with a delay of a few seconds. This server picks up some of the read load from the core DB. It also acts as a live backup of our core DB - if anything happens to the main machine, we can instantly jump to this read replica and use it.
We have a 3rd replica on another machine that is used for all number crunching operations. For example, we run reports that can lock up the DB for up to 5 minutes - or when we re-index our Sphinx search. All these operations would lock up our core DB and prevent it from working, so they are done on the workhorse. Additionally, this machine gives us a 3rd level of redundancy.
Backups happen on the replicas. That is because as your DB grows, backing up can lock your DB for quite a while. So it’s better to do it on the replica. The backups are daily, and operate like this:
A cron script creates an export of the data from the drive where the data is to a second drive where the data is not. That means on the same machine, you have two different drives that each have a copy of the data. So on the same machine, you have basically 4 different copies of your backups (each drive is mirrored, and you have a copy.) If anything happens with one set of drives, there is a copy of the data on the other drive-pair.
After this export, it uploads the backup to Amazon S3 with the Glacier option turned on. This makes it very cheap to store a full backup of the system, and gives cloud redundancy.
So in total, we have following copies of the data:
- One copy on the core machine
- Two copies on the read replica
- One copy on the workhorse
- One copy in Amazon S3
Additionally, I do a backup to an offline system every 6 months.
How to setup such a system
It takes a bit of research to set up a system like this, but once you know it, it’s pretty easy. Here are my notes on how to do it:
First of all, all machines should be setup and provisioned. In our case, we use dedicated machines that are in different data-centers (you never know if the data-center will burn down).
After that, install MySQL on all the machines. On machines with an SSD/HDD pair, you need to change the data directory saving location. Here is a handy guide: https://www.digitalocean.com/community/tutorials/how-to-move-a-mysql-data-directory-to-a-new-location-on-ubuntu-16-04.
On the Core database, you need to set it up as the ‘Master’. Here is a guide: https://www.digitalocean.com/community/tutorials/how-to-set-up-master-slave-replication-in-mysql.
On both of the other machines, you need to set them up as slave. The guide above also explains how. What is quite important is that you need to lock your database, and get the ‘Position’, and then export the data before unlocking. This way, you can import that data into the replica db, and the replication will start at the position you captured.
When exporting your data, you need to be careful not to use the all-databases option, but instead to select each db you want exported. This avoids you overwriting the system dbs.
At the end of the process, you will find that the replication will be too slow to be useful. Use this guide to speed up the replica: https://thoughts.t37.net/fixing-a-very-lagging-mysql-replication-db6eb5a6e15d.
A small note, with modern MySQL, you will likely need to turn on the options
- gtid_mode = ON
- enforce_gtid_consistency = ON
That is it. It’s pretty straightforward to setup multi-server db replicas with MySQL!