Drupal.org blog: How Drupal.org maintains geo-redundant remote backups with ease thanks to rsync.net

The following case study was written collaboratively by hestenet and nnewton, explaining how we use rsync.net to manage backups for Drupal.org. The Drupal Association used rsync.net for many years prior to any partner relationship, and is now proud to count rsync.net among our Technology Supporters.

Drupal.org has been the home of the Drupal community for many years. Online since 2001, and fed by a global community of contributors, there is a tremendous amount of open source history recorded here.

It’s critical that we safeguard that history for posterity, and of course all of our current activity so that we can maintain the momentum of the Drupal project.

Naturally, we’ve done a tremendous amount of work to make our infrastructure robust and fault tolerant from the top of the stack to the bottom. Individual servers use RAID storage, our infrastructure is built using highly-available pairs, and the Oregon State University Open Source Lab, our data center, has good data center hygiene and redundant power and cooling.

But disasters can and will happen, and this is why off-site backups are critically important.

Drupal.org uses rsync.net to manage off-site backups, and we highly recommend them as a solution. rsync.net is built on ZFS, a file system we have experience with and trust to be durable and offer cheap, immutable snapshotting. rsync.net gives you an empty filesystem to do anything you want with and works with any SSH or SFTP based tool. This standard approach allows us to easily use the service with existing tooling. We have used rsync.net for various purposes for almost ten years and have not had a single incident.

How exactly do we use rsync.net?

rsync.net is configured as our primary backup location for all of the Drupal.org infrastructure. Because we take advantage of rsync.net’s geo-redundancy feature, those backups are also stored in multiple, separate data centers. We also use rsync.net as a secondary backup layer for some select data pools that are already backed up in Amazon S3 or on the Open Source Lab’s backup servers.

How do we have it configured?

For the Drupal.org infrastructure we use BorgBackup (https://www.borgbackup.org) to manage compression, encryption, and deduplication of our backup data. We then entrust rsync.net with ZFS snapshotting of the Borg data, providing us with points in time to easily roll back to. This gives us a sliding window of encrypted backups. It also gives us protection from malicious actors or ransomware, as the rsync.net snapshots are immutable, or read-only.

The actual execution of Borg and tracking of backups is done using a bash script that is placed on each server by our Puppet tree. Puppet also places a private key for encryption and the appropriate SSH private key for rsync.net access. We wrap Borg in a script because we need to cleanly initialize new vaults when we spin up a new server, and because we need to monitor Borg execution. We have found that silent failures are difficult to detect with Borg specifically, so we have multiple points of feedback in the script. Our script functions as follows:

  1. Check if a vault on rsync.net exists for this host; if not, create it.

  2. Back up the paths passed to this script to said vault for today’s date.

  3. Check the return code of the last command; if it has failed, email our monitoring endpoint to trigger an alert.

  4. Use Borg to pull statistics from the last backup, such as count of files backed up, chunks backed up, size of backup, etc.

  5. Massage those statistics into usable metrics and send them to statsd, where they will end up in our monitoring system.
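The five steps above can be sketched roughly as follows. This is not the Drupal.org script, which is written in bash and is not reproduced in this post; it is a minimal Python illustration under several assumptions. The repository URL, the `backup.<host>.<metric>` naming scheme, the `keyfile` encryption mode, and the `run_backup` helper and its `alert` and `emit` hooks are all hypothetical. The Borg commands are the Borg 1.x forms, and the sketch treats any non-zero exit code (including warnings) as a failure.

```python
import json
import subprocess
from datetime import date

# Hypothetical repository location; the real user, host, and path are not given in the post.
REPO_TEMPLATE = "ssh://user@backup.example.net/./borg/{host}"


def archive_name(repo, day):
    """One archive per day, named <repo>::YYYY-MM-DD."""
    return f"{repo}::{day.isoformat()}"


def statsd_lines(host, stats):
    """Turn Borg archive stats into statsd gauge lines of the form name:value|g."""
    keys = ("nfiles", "original_size", "compressed_size", "deduplicated_size")
    return [f"backup.{host}.{key}:{stats[key]}|g" for key in keys if key in stats]


def run_backup(host, paths, day, run=subprocess.run, alert=print, emit=print):
    """Run one backup; `alert` and `emit` stand in for email and statsd delivery."""
    repo = REPO_TEMPLATE.format(host=host)
    archive = archive_name(repo, day)
    opts = {"capture_output": True, "text": True}

    # Step 1: create the vault if it does not exist yet. A real script would
    # also need to tell "repository missing" apart from "connection failed".
    if run(["borg", "info", repo], **opts).returncode != 0:
        run(["borg", "init", "--encryption=keyfile", repo], **opts)  # mode is an assumption

    # Step 2: back up the given paths into today's archive.
    result = run(["borg", "create", archive, *paths], **opts)

    # Step 3: Borg exits 0 on success, 1 on warning, 2 on error.
    if result.returncode != 0:
        alert(f"borg create failed on {host} with exit code {result.returncode}")
        return False

    # Step 4: pull statistics for the archive that was just written.
    info = run(["borg", "info", "--json", archive], **opts)
    stats = json.loads(info.stdout)["archives"][0]["stats"]

    # Step 5: hand the metrics to the monitoring pipeline.
    for line in statsd_lines(host, stats):
        emit(line)
    return True
```

Passing the command runner and the two hooks in as parameters keeps the flow easy to exercise without touching a real repository; in practice `emit` would send each line over UDP to a statsd daemon and `alert` would send the email.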

With this process we get an alert at the time of failure, and we can also create alerts based on the graphs we build from the statsd data. We do this to catch cases where a backup succeeded but the amount of data backed up fell dramatically. In that case the “success” may really have been a failure, just not an obvious one.
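To make the "successful but suspiciously small" case concrete, here is a minimal Python sketch of the kind of check a monitoring rule might apply. The function name and the 50% threshold are assumptions chosen for illustration; the post does not say which thresholds or alerting rules Drupal.org actually uses.

```python
def dropped_too_far(previous_bytes, current_bytes, max_drop=0.5):
    """Return True if the current backup shrank by more than max_drop
    (a fraction between 0 and 1) compared with the previous one."""
    if previous_bytes <= 0:
        # No usable baseline yet, so there is nothing to compare against.
        return False
    return current_bytes < previous_bytes * (1 - max_drop)
```

For example, `dropped_too_far(1000, 400)` flags the backup, while `dropped_too_far(1000, 900)` does not.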

[Figure: example dashboard built from statsd information]

A major reason we value rsync.net is that it presents a simple, standard SSH interface, which lets us use tooling we can customize to exactly what we need, as in the example above.

Why would we recommend this to others?

We share the Drupal community’s love of simple, elegant, and technically excellent solutions to problems. Configuration and backup management with rsync.net ticks all of those boxes. The business side is also run in a very friendly way, with frequent increases in capacity available at very reasonable rates.

It’s proven to be an effective and affordable way to use the funding we receive from the Drupal community to protect the project, and we believe you can trust it to protect your own projects as well.