Roll Your Own Cloud Backups with Arq and B2

Posted in: June 2018
Photo of clouds (photo by Glenn Fleishman)

It’s surprising Apple still doesn’t offer iCloud backups for macOS. Time Machine requires a separate external drive or partition, making it feel long in the tooth. And it doesn’t help that Apple just killed the Time Capsule (see “RIP: Apple AirPort, 1999–2018,” 27 April 2018). Despite Apple’s commitment to iCloud and the availability of up to 2 terabytes of storage, the company offers no set-and-forget backup option for the Mac. It’s a bizarre omission because Apple has every other piece in place to make it an offering.

Paid cloud services such as Backblaze (a TidBITS sponsor) can readily fill this gap, but you can also now roll your own cloud service at a reasonable price by combining Haystack Software’s Arq backup app for macOS with Backblaze’s B2 on-demand, usage-based cloud storage service. I reviewed Arq for Macworld in March 2017 and found it generally good, although it needs more refinement in its restore process; Arq added B2 support a year ago.

Backblaze B2 competes with Amazon’s Simple Storage Service (S3) and Google Cloud Storage, the two biggest comparable services in the space. All cloud storage companies regularly lower their prices, and a recent price drop from B2 now makes it a reasonable option for your own backup.

This article provides a roadmap for how you can roll your own cloud backup and not give up anything in the process. Expect more options to arise in the future.

Why Build Your Own Solution?

Cloud-based backups predate even the term “cloud” for distributed online storage. Mozy was one of the first in 2005, and Code42’s CrashPlan followed in 2007. (Code42 is in the process of exiting the personal backup business; see “CrashPlan Discontinues Consumer Backups,” 22 August 2017.) The advantage in the early days was not having to manage a server, pay for specific amounts of storage, or find software reliable enough to transfer data routinely and automatically.

The rise of on-demand, usage-based cloud storage and its precipitous price drop since Amazon S3 first appeared make it possible to consider the benefits of rolling your own cloud-backup solution. That would let you control the entire backup process, paying only for ongoing archival storage and downloading data when you need to restore files. Plus, you could manage the security of your archived data through client-side encryption, an area of increasing concern.

Arq makes all of this feasible, and I’ll explain how to set it up in the how-to section below. But first, where should you store your data?

The Best Storage Option for Your Money

Currently, B2’s pricing is cheaper than similar storage from Amazon S3. With its recent price drop, B2 now charges $0.005 per GB per month for storage and $0.01 per GB for downloads, which occur almost entirely when you’re restoring files from a backup. (It’s free to upload data.) Amazon and Google have tiers of service. Their standard “fast access” tiers cost much more than B2 for storage and retrieval, and while their deep-storage options compete more closely with B2, they can still wind up being more expensive for storage or retrieval on restores. (I went into excessive depth about these tiers in “Investigating ChronoSync 4.7 for Cloud Backup,” 22 December 2016.)

B2 support has only recently become widespread in macOS software, which means price and opportunity finally intersect for many users. If you could limit your total archive to 1 TB, you’d pay $5 per month in storage ($60 per year); at 5 TB, that’s $25 per month ($300 per year). For a single machine, most unlimited hosted backup services will be as cheap or cheaper, but for multiple computers, rolling your own could cost less or about the same, as you’re only paying for the total data stored among all your backups. Restoring data costs $1 per 100 GB, so a typical restore won’t cost much.
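
To check the arithmetic above for your own archive size, here is a minimal Python sketch. It uses only the two rates quoted in this article and assumes decimal terabytes (1 TB = 1,000 GB), which is what makes the $5-per-TB figure work out; the function names are mine, and Backblaze's current pricing may differ.

```python
# Rates quoted in this article (USD); check Backblaze's current pricing before relying on them.
STORAGE_PER_GB_MONTH = 0.005
DOWNLOAD_PER_GB = 0.01
GB_PER_TB = 1000  # assumption: decimal terabytes, matching the $5-per-TB figure


def monthly_storage_cost(total_gb):
    """Monthly storage charge for the combined size of all backups."""
    return round(total_gb * STORAGE_PER_GB_MONTH, 2)


def restore_cost(download_gb):
    """One-time download charge for restoring the given amount of data."""
    return round(download_gb * DOWNLOAD_PER_GB, 2)


if __name__ == "__main__":
    for tb in (1, 5):
        monthly = monthly_storage_cost(tb * GB_PER_TB)
        print(f"{tb} TB: ${monthly:.2f}/month, ${monthly * 12:.2f}/year")
    print(f"Restoring 100 GB: ${restore_cost(100):.2f}")
```

Because B2 bills on total data stored, pass in the combined size of every computer's backups rather than running the numbers per machine.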

If you have a lot of data to restore relative to your broadband connection, Backblaze is testing the B2 Snapshot Return Refund Program, which will charge you the standard download fees and then ship you a drive for a refundable fee ($99 for up to 128 GB; $189 for up to 4 TB) and return shipping costs.

You could also save money by using a sync or storage service that you’re already paying for and that has unused capacity:

  • Dropbox: Dropbox’s lowest-tier paid service includes 1 TB of cloud storage, and Arq can talk directly to Dropbox’s API. You can use Dropbox’s Selective Sync or Smart Sync to prevent those backups from being unnecessarily synced to a desktop computer.
  • Amazon Drive: If you’re paying for 1 TB or more on Amazon Drive, you might have hundreds of gigabytes available, and Arq can store files directly there.
  • Server: If you happen to have a real or virtual server at a data center with spare storage and data transfer capacity, Arq lets you transfer via SFTP.

With these prices in mind, let’s look at how to make this happen.

Set up Your B2 Account

Start by creating an account for Backblaze B2 and obtaining the credentials you need:

  1. Visit the B2 signup page and sign up for an account. (I highly recommend enabling two-factor authentication when prompted.)
  2. Backblaze includes 10 GB of storage for free, but fill out the Billing section if you want to store more than that immediately.
  3. Click Buckets on the left, and then click Show Account ID and Application Key, which you’ll need to plug into your archiving app (Arq, in this case).
    Screenshot of B2 account navigation interface showing B2 Cloud Storage Buckets page and a bucket
  4. In the Account ID & Application Key screen, click Create Application Key.
    Screenshot of B2’s Account ID & Application Key screen
  5. Copy both the account ID and the application key, and store them securely. Someone might be able to derive your account ID, but wouldn’t be able to access your stored data without the application key. (Encryption, as described below, also helps protect your data.)
  6. At this point, you can either choose to create a “bucket,” or you can do it in Arq.
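
If you're curious how an app uses those two values, the Python sketch below shows the general idea: B2's native API accepts the account ID and application key as an HTTP Basic credential on its b2_authorize_account call. I'm assuming the v1 endpoint path here; the function name and placeholder credentials are mine, and the code only builds the request so it runs without network access.

```python
import base64
import urllib.request

# Assumption: the v1 endpoint of B2's native API; newer API versions use a different path.
AUTH_URL = "https://api.backblazeb2.com/b2api/v1/b2_authorize_account"


def build_authorize_request(account_id, application_key):
    """Build (but do not send) a b2_authorize_account request using HTTP Basic auth."""
    credential = f"{account_id}:{application_key}".encode("utf-8")
    token = base64.b64encode(credential).decode("ascii")
    return urllib.request.Request(AUTH_URL, headers={"Authorization": f"Basic {token}"})


if __name__ == "__main__":
    # Placeholder credentials; substitute your own to try it.
    request = build_authorize_request("ACCOUNT_ID", "APPLICATION_KEY")
    # Sending is left commented out so the sketch runs offline:
    # with urllib.request.urlopen(request) as response:
    #     print(response.status)
    print(request.full_url)
```

Note that Base64 is an encoding, not encryption, which is one more reason to treat the application key like a password.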

What’s a bucket? You can think of it as a folder in a cloud-storage system. Unlike a folder on your Mac’s drive, every bucket name has to be unique across the entire cloud system! Your backup software can generate one randomly, or you can smash down on the keyboard to create one.
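
As a tidier alternative to smashing the keyboard, this short Python sketch generates a name that is very unlikely to collide with anyone else's. The naming rules in the comment (6 to 50 characters, letters, digits, and hyphens, no "b2-" prefix) are my understanding of B2's limits and should be treated as an assumption; the "arq-backup" prefix is just an example.

```python
import secrets
import string

# Assumption: B2 bucket names are 6-50 characters drawn from letters, digits, and
# hyphens, and may not start with "b2-"; confirm against Backblaze's documentation.
ALPHABET = string.ascii_lowercase + string.digits


def random_bucket_name(prefix="arq-backup", suffix_length=16):
    """Return the prefix plus a random suffix, so a name collision is very unlikely."""
    suffix = "".join(secrets.choice(ALPHABET) for _ in range(suffix_length))
    name = f"{prefix}-{suffix}"
    if not 6 <= len(name) <= 50 or name.startswith("b2-"):
        raise ValueError(f"bucket name {name!r} breaks the assumed naming rules")
    return name


if __name__ == "__main__":
    print(random_bucket_name())
```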

With a B2 account in hand, let’s configure Arq.

Configure an Arq Backup

Arq has a one-time $50 license fee—it includes perpetual updates—and offers a 30-day trial, so you can experiment with it before being locked in. Arq can back up folders or entire volumes from internal or external drives attached to the computer on which Arq runs, or from mounted network volumes, avoiding the need for an Arq license for each backed-up computer. Be aware that it has a stripped-down interface, which doesn’t look much more advanced than a screen-based terminal app, but it’s fairly powerful within those parameters.

To set up your backup, follow these steps after launching Arq:

  1. Choose Arq > Preferences.
  2. Click the plus (+) sign in the lower-left corner.
  3. Select Backblaze B2, and click Continue. (The “Which destination is best for me?” help that comes up offers good price comparisons.)
    Screenshot of Arq’s Preferences dialog.
  4. Enter the B2 account ID and application key that you copied earlier, and then click Continue.
  5. Either name a new bucket (see the details above about limitations) or use one you’ve already created. Then click Continue.

Every destination uses the same parameters for encryption (see step 3 below), schedule, budget, and scripts. You can modify all but the encryption parameters by selecting the destination in Preferences, clicking Edit, and setting the options in the Schedule, Budget, and Before and After Backup tabs.

The Schedule and Budget tabs for the B2 backup destination.
Backups (left) occur hourly by default, but you can refine these settings on a per-destination basis. Budget (right) lets you set a maximum size (B2 and others) or cost (Amazon S3 and others) beyond which archives are trimmed.

The Budget tab is the most interesting one for managing costs. You set a maximum total size for backups, and as long as it’s larger than a complete set of your files, Arq automatically thins older archived files to keep you within that amount. (It always maintains a single full set of all files, no matter the budget.) With Amazon S3 and others, you can set a maximum monthly dollar amount, which Arq calculates based on Amazon’s rates. Arq can also remove “unreferenced,” or locally deleted, items every 30 days or at a rate you set.
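
To make the size budget concrete, here is a deliberately simplified Python model of the thinning behavior described above. It is not Arq's actual algorithm: I'm assuming each snapshot has a fixed size of its own, whereas a real deduplicating backup shares data between snapshots. The function name and sample figures are hypothetical.

```python
def thin_snapshots(snapshots, budget_gb):
    """Drop the oldest snapshots until the total fits within the budget.

    snapshots: list of (label, size_gb) tuples ordered oldest to newest.
    The newest snapshot is always kept, even if it alone exceeds the budget,
    mirroring the rule that one full set of files always survives.
    """
    kept = list(snapshots)
    total = sum(size for _, size in kept)
    while len(kept) > 1 and total > budget_gb:
        _, size = kept.pop(0)  # discard the oldest remaining snapshot
        total -= size
    return kept


if __name__ == "__main__":
    history = [("January", 40), ("February", 30), ("March", 50)]
    print(thin_snapshots(history, budget_gb=90))
```

With a 90 GB budget, the 120 GB of sample history loses only its oldest snapshot; with a budget smaller than the newest snapshot, that one snapshot still remains.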

Next, you use the main Arq window to add the folders you want to back up:

  1. Under Configure Backups, select To B2, and then click Add a Folder to Backups.
  2. Select a folder. With your startup volume, I recommend picking folders like your home folder and the Applications folder individually to avoid backing up system files and logs. For external drives that don’t have system files, you can select the entire volume as a “folder.”
  3. When Arq prompts you, set a passphrase for encryption; I’ll explain more about this later. Use a password manager like 1Password or LastPass to generate and store a relatively long passphrase, like 15 to 20 characters. Do not do this by hand because the passphrase cannot be recovered if lost. Arq creates a local file that contains necessary encryption details. A warning reminds you to write down the password, which is poor advice in the modern age; use a secure password manager. The dialog has a single button labeled “I Wrote It Down,” which is unhelpful because at this stage you’ve already set the password and can’t back up to change it.
    Arq’s password-entry dialog box
    An unhelpful dialog telling you to write down your password.
    Death to sticky notes!
  4. Click the folder under To B2 and then click Edit Backup Selections to modify which folders and files Arq will monitor for changes.
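
A password manager remains the right tool for the passphrase in step 3, because it both generates and stores it. For a sense of what "generate" means there, this Python sketch produces a random passphrase in the suggested 15-to-20-character range using the standard library's cryptographic random source. The function name and the letters-and-digits alphabet are my own choices, not anything Arq requires.

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits


def generate_passphrase(length=20):
    """Return a random passphrase; refuse lengths shorter than the article suggests."""
    if length < 15:
        raise ValueError("use at least 15 characters")
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


if __name__ == "__main__":
    # Store the result in a password manager right away; it cannot be recovered if lost.
    print(generate_passphrase())
```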

As I noted earlier, restoring files is not as simple as setting up backups:

  1. In the Restore Files section, click From B2, and select your computer.
  2. Underneath, in the list of backed-up folders, click one of these items to expand it.
  3. Under the expansion, where Arq lists a snapshot for each archived operation, select a snapshot.
    Screenshot of Arq’s interface for restoring files.
    Restoring files in Arq requires a lot of drilling down to backup sets and then individual files.
  4. In the list of available files on the right (which includes the last modification date for each file), select a single item and either click Restore or drag it to a location in the Finder.

Unfortunately, you can restore only a single file or folder at a time; there’s no provision to make multiple selections at once. If there are conflicting items at the restore location, you’re prompted with Do Not Overwrite, Cancel, and Overwrite. But Do Not Overwrite doesn’t selectively replace files. Instead, it creates a nested local directory with the full set of restored files.

Arq offers options for overwriting files on restore.
Arq’s Overwrite warning is more complicated than its explanation.

Handling Encryption in Arq

Arq uses its own encryption system, relying on standard libraries. Arq’s developer, Haystack Software, documents it fully on its Web site (in a text file!), and notes that it uses an encryption approach similar to the one used by the Git file-versioning system.
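
The Git comparison is easier to picture with a toy example. Git names each stored object by a hash of its contents, so identical data is stored only once. The Python sketch below shows that idea and nothing more: it is not Arq's data format (Haystack's documentation describes the real one), it skips the header Git adds before hashing, and the class name is hypothetical.

```python
import hashlib


class ContentStore:
    """Toy in-memory store that names each object by the SHA-1 of its contents."""

    def __init__(self):
        self.objects = {}

    def put(self, data):
        """Store bytes and return the key under which they can be fetched."""
        key = hashlib.sha1(data).hexdigest()
        self.objects.setdefault(key, data)  # identical content is stored only once
        return key

    def get(self, key):
        """Fetch previously stored bytes by key."""
        return self.objects[key]


if __name__ == "__main__":
    store = ContentStore()
    first = store.put(b"quarterly report")
    second = store.put(b"quarterly report")  # an unchanged file in a later backup
    print(first == second, len(store.objects))
```

For a backup app, the payoff is that an unchanged file in a later snapshot costs no extra storage or upload.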

Arq transforms your passphrase into a number of encryption keys, which are stored in a local file that’s encrypted directly using your passphrase. While Arq is in use, it keeps the encryption keys available for itself, which is true for all backup software with client-side encryption and decryption.
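
Turning a passphrase into key material is usually done with a deliberately slow key-derivation function. The Python sketch below uses PBKDF2 from the standard library purely as an illustration; the hash, salt size, and iteration count are my assumptions, not Arq's documented parameters.

```python
import hashlib
import os


def derive_key(passphrase, salt, iterations=200_000, length=32):
    """Stretch a passphrase into a fixed-length key with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac(
        "sha256", passphrase.encode("utf-8"), salt, iterations, dklen=length
    )


if __name__ == "__main__":
    salt = os.urandom(16)  # random, not secret; it must be saved to re-derive the key
    key = derive_key("correct horse battery staple", salt)
    print(len(key), "byte key derived")
```

The same passphrase and salt always produce the same key, which is how the app can unlock its key file later. It is also why a lost passphrase can't be recovered.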

The encryption keys are never transmitted to a server in any fashion, which is the best behavior if you want the highest level of control over your archived files, and the least possibility that any unwanted party—personal, criminal, or governmental—could gain access to those files.

Backblaze’s consumer backup solution keeps your encryption key private until and unless you have to restore data, at which point it has to be transferred to the company’s servers to decrypt archives and create a downloadable Zip archive of your restored files. It’s not stored permanently, but it’s a point of weakness for someone who could gain privileged access, and one not found in SpiderOak or CrashPlan.

Google and Amazon’s cloud-based server systems also allow encryption, but they encrypt and decrypt on the server side with a user-provided key, so the key ends up out of your control even though the process is designed to be secure.

Arq’s only encryption problem is that its passphrase-entry approach isn’t integrated with anything else, so you must retain a copy in some secure fashion, such as with a password manager like 1Password or LastPass. Haystack Software should consider adding integrations.

Why Not Other Backup or Sync Apps?

You may wonder why I don’t discuss two other popular file transfer apps that support B2 and other cloud services, SFTP, and other connection methods.

  • ChronoSync by Econ Technologies ($50 perpetual license, 15-day trial) is a terrific clone, mirror, sync, and archive app that keeps getting better. Unfortunately, it doesn’t offer any client-side encryption options. ChronoSync can use Google and Amazon’s server-side encryption (see above). If Econ Technologies added client-side encryption, it would be a strong competitor to Arq.
  • Panic’s Transmit 5 also supports cloud-storage systems like B2 and synchronization, but it lacks scheduling, restoring, and archiving features necessary for a backup solution. It doesn’t offer client-side or server-side encryption.

The Future of Rolling Your Own Cloud Backups

I still wish Apple would provide an iCloud-based backup service, not to put other companies out of business, but to provide a minimum level of archiving that would be easily and affordably available. That would teach everyday users that cloud-backup solutions exist, which could grow the market for independent backup services with more to offer.

Arq and B2 aren’t the perfect combination, but they’re the best option that I’ve seen to date for a combination of control, archiving features, and price. I expect we’ll see more, between CrashPlan’s exit from the market, the growing interest in controlling one’s own encryption, and the drop in cloud-based storage pricing.