Don’t lose your data…

This is a bit of a different subject from my normal posts. I’ve seen a lot of tweets recently from people who have lost irreplaceable data because they didn’t have a backup, or because their backups weren’t working properly.

Bruce Schneier recently said on his blog:

“Remember the rule: no one ever wants backups, but everyone always wants restores.”

This is the truth – it isn’t the backup that matters, it is the restore. You need to test that you can actually restore from it. If you are serious about your data, back it up!

I had a scare last year when my laptop’s SSD failed without warning, and then I found out my backups hadn’t been working properly. Luckily my elite data recovery skills meant I could get the data back.

I took this as a chance to implement a robust, dependable backup system that I knew I could rely on.

Goals

You need to decide what you are protecting:

  • Photos – these to me are genuinely irreplaceable.
  • Projects – code, notes, datasheets, data etc. I could redo these, but it would take time and effort.
  • Emails – again, I would have no way of recreating these.

And what you aren’t:

  • Media – TV, films, music. I’m not bothered about these – I can get them again.
  • Programs and OS – I can download these again.

At this point I should say that I am not a fan of “bare metal restore” or full disk imaging. Why?

  • Individual files are not easily accessible – it is far harder to determine if things are working correctly.
  • The file formats are often proprietary and undocumented – if it isn’t working, I am going to have a hard time fixing that.
  • Bare metal restores are difficult onto different hardware – they don’t handle changes well, even a different sized partition complicates this.
  • I would hope I need to restore infrequently enough that re-installing my OS and programs is a welcome clean-out rather than inconvenience.

You need to decide what you are protecting against:

  • Disk failure – this seems to be the biggest threat to my data. One external HD and two mSATA SSDs have failed in the past two years. My view now is that no single storage device can be trusted, especially SSDs.
  • Theft – my laptop, iPad, server or backup drives could be stolen.
  • Idiocy and mistakes – I could delete something I didn’t mean to at any point in time. Or simply change something I didn’t mean to.

It would be fair to say that my solution is belt and braces, and then some.

Central storage

Instead of trusting my data to my individual mobile devices and backing those up, the primary store of data is on a central server located in our house.

This is an HP N40L server (often available for £100 with a cashback offer), running 2x3TB drives in a RAID1 configuration. RAID1 is otherwise known as “mirroring”, and I have implemented it in software (which means that I can put the drives in any machine, unlike hardware RAID, where the chipset must be the same). All RAID1 does is protect against drive failure – nothing else. If the machine is stolen, I lose my data. If I delete my data, I lose my data. Don’t fall into the trap that many do and call RAID1 a backup. I have done it for convenience and because these large drives are currently unproven in terms of reliability.

Although this is the primary store of data, I need to be able to work with this data quickly and when away from the house. Therefore everything is synced between the server and mobile devices periodically.

For Windows machines, I use SyncBack Pro to do this in near-realtime. It’s very effective and bi-directional.

Central storage backup

I run two of my own backups on the central storage.

Firstly, on a daily basis, an incremental backup is performed from the 2x3TB RAID1 array to an external 4TB USB drive. The incremental backup means I have 90 days of history on all of my files available immediately. The external USB drive means that there is a degree of isolation between the server and the drive, and I can quickly remove it from the house if need be.
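To make the idea of an incremental backup with history concrete, here is a minimal Python sketch. It is not how SyncBack Pro works internally; the function names, the per-day versions folder and the size/timestamp comparison are all assumptions made for illustration.

```python
import shutil
from datetime import date, timedelta
from pathlib import Path


def incremental_backup(source: Path, dest: Path, versions: Path, stamp: str) -> list[str]:
    """Copy new or changed files from source to dest (hypothetical helper).

    Before a changed file is overwritten, the previous copy is moved to
    versions/<stamp>/ so that older versions stay available.
    """
    copied = []
    for src in sorted(source.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source)
        dst = dest / rel
        if dst.exists():
            s, d = src.stat(), dst.stat()
            # Assumption: same size and no newer timestamp means "unchanged".
            # A real tool may compare hashes instead.
            if s.st_size == d.st_size and int(s.st_mtime) <= int(d.st_mtime):
                continue
            old = versions / stamp / rel
            old.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(dst), str(old))
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 keeps the modification time
        copied.append(rel.as_posix())
    return copied


def expired_stamps(stamps: list[str], today: date, keep_days: int = 90) -> list[str]:
    """Return the ISO-dated version folders that are older than keep_days."""
    cutoff = today - timedelta(days=keep_days)
    return [s for s in stamps if date.fromisoformat(s) < cutoff]
```

Running `incremental_backup` once a day with that day's date as `stamp`, then deleting whatever `expired_stamps` returns, would give roughly the 90 days of history described above.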

Secondly, at the beginning of each month, I plug in a second external 4TB USB drive. Again, this is an incremental backup, but less frequent. I then remove the drive and store it in my substantial safe. This protects me against hardware failure – even if the server decides to send 240V into all connected devices, this drive is not connected to the machine all of the time. It also protects me from theft and fire to a degree – only a determined burglar could open the safe.

Both of these use SyncBack Pro as well.

Offsite central storage backup

The entire central server is then backed up to the cloud using CrashPlan. The most important feature of CrashPlan is that it is offsite. Whatever happens to the hardware in the house, CrashPlan will have the data.

CrashPlan also allows friends and family to back up to my server and take advantage of all the other backups I perform.

Once a year I back up my photos to a portable USB hard drive and give this to a trusted third party (my parents) to look after.

Offsite laptop backup

Not content with that, I run Backblaze on my personal laptop. Backblaze is a competitor to CrashPlan. This backs up everything on the laptop to the cloud.

(I’m not actually quite this paranoid – I used to use Backblaze on our old “server” running Windows 7. When I upgraded to the HP N40L, I found Backblaze doesn’t run on Windows Server operating systems, so I had to switch to CrashPlan. I have another 18 months of Backblaze subscription left to use.)

Dropbox and GitHub

The final aspect of backup is for all of my project work. All of it is on Dropbox. This isn’t primarily for backup – it is so I can access it from wherever I want. All of my code goes onto GitHub.

Encryption

A number of the devices mentioned above are encrypted using TrueCrypt. Some of the more sensitive documents are also encrypted before being sent to the cloud.

Testing

I regularly check that all of the above is working. When I recently had an SSD failure, I noticed that one of the above mechanisms wasn’t working. It was quickly fixed.
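One way to check a restore is to restore into a scratch folder and compare checksums against the originals. The sketch below is only an illustration of that idea using Python's standard library; the function names and the shape of the report are my own assumptions, not a feature of any of the tools mentioned above.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files don't need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_trees(original: Path, restored: Path) -> dict[str, list[str]]:
    """Report files that are missing, unexpected or different after a restore."""
    a, b = manifest(original), manifest(restored)
    return {
        "missing": sorted(set(a) - set(b)),      # in original, not restored
        "extra": sorted(set(b) - set(a)),        # restored, not in original
        "different": sorted(k for k in set(a) & set(b) if a[k] != b[k]),
    }
```

If all three lists come back empty, the restored copy matches the original byte for byte. This only proves the backup matches the source as it is now; it cannot tell you whether the source itself was already damaged.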

Conclusion

This might be paranoid, but all this data is vital to me.

At the moment, my photos are stored:

  1. On my laptop
  2. On the RAID array in the server
  3. On the permanently connected USB drive
  4. On the once-a-month USB drive
  5. On the offsite portable USB drive
  6. On CrashPlan
  7. On Backblaze

The chance of all of this going wrong at the same time is virtually zero.
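To put a rough shape on that claim, here is a small back-of-the-envelope calculation in Python. The probabilities are entirely made up for illustration (I have no measured failure rates), and the calculation assumes the groups of copies fail independently, which is only partly true in practice.

```python
import math


def chance_all_lost(group_loss_probabilities: list[float]) -> float:
    """Probability that every independent group of copies is lost together.

    Each entry is the (hypothetical) chance of losing one group in a year.
    Independence between groups is an assumption, not a measured fact.
    """
    return math.prod(group_loss_probabilities)


# Hypothetical: seven fully independent copies, each with a 5% chance of loss.
seven_independent = chance_all_lost([0.05] * 7)   # roughly 8 in 10 billion

# Hypothetical: the four copies kept in the house (laptop, RAID array and both
# USB drives) share one risk such as fire, say 1%, leaving three other copies.
house_as_one_group = chance_all_lost([0.01] + [0.05] * 3)   # roughly 1 in a million

print(seven_independent, house_as_one_group)
```

The point is less the exact numbers and more the pattern: copies that share a location or a failure mode count as one, which is why the offsite copies matter so much.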

2 thoughts on “Don’t lose your data…”

  1. Matt Gorecki

    June 30, 2013 at 10:16pm

    “The chance of all of this going wrong at the same time is virtually zero.”

    In my experience, the biggest problem isn’t the data outright disappearing, but rather the backed up data being corrupt.

    • cybergibbons

      July 1, 2013 at 11:21am

      Most of the methods used above use some form of verification and history, so unless corruption of the source goes unnoticed, it should all be fine.
