/ 18-years-in-8-hours

Backing up 18 years in 8 hours

Computer guts

This winter, while home visiting family, I took the opportunity to gather up all of my old hard disks and archive them. This amounted to the majority of my digital footprint for the first 18 years of my life. I’d been putting the task off for a few years, but the chance to explore the data sitting on these old drives (and the cherished computers they came from!) helped motivate this project.

When I was a teenager, whenever a hard disk needed replacement, I’d pull the old drive and shove it in my closet. There they sat, some for over a decade, until I turned them back on last month. Amazingly, across ~350GB of data, only about 500KB proved unrecoverable. Quite fortunate, considering that some of these drives were 15 years old, and one had originally been pulled because it was failing health checks.

In the process of recovering this data, I resolved to preserve it for the rest of my lifetime. Why go to all this trouble? Well, in my explorations of these old drives, I discovered far more meaningful memories than I expected. Old music and 3D renders, chat logs, emails, screenshots, and tons of old photos. The disks from the early 2000s were a reminder of the days when computer use gravitated around “My Documents” folders. Then I learned about Linux and always-on internet access arrived. I took a peek at my first homedir and found all of the little Python scripts I wrote as I learned to work on the command line.

By today’s standards, the breadth and fidelity of these scraps is rather… quaint. A kid growing up today will have a pixel-perfect record of most of their digital trail. That was another reason this project proved interesting: it was not just a record of how computers changed; it showed how the way I used them, and what they meant to me, changed over time.


Here’s a brief rundown of the tools and backup process I used, both so I can refer back to it decades from now, and because it may be useful to others tackling their own backups:

Archival process

IDE HDD -> USB -> Laptop -> External USB HDD

I used a Sabrent USB 2.0 to IDE/SATA Hard Drive Converter (USB-DSC5) to connect the drives to my laptop. I’ve found this to be a really handy (and cheap!) Swiss Army knife for recovering old hard drives, especially since it works on both 3.5” and 2.5” drives. To store the recovered data, I used a 2TB WD My Passport USB Hard Drive (WDBBKD0020BBK-NESN). I’ve had good experiences with these drives in the past, and they have a great form factor. I ordered both items from Amazon and received them a couple of days into my trip.

Reading data from the drives

To recover data from the drives, I used ddrescue. It’s an imaging tool, similar to dd, that logs read errors as it goes and can exhaustively retry the areas around them. Recovering a drive looked like this:

Copy data from /dev/sdc to disk.img (outputting a log of errors to ./disk-log):

$ ddrescue -d -n /dev/sdc ./disk.img ./disk-log

One of my favorite features of ddrescue is that you can re-run it at any point to resume where it left off or try to recover more data. In the initial run, I passed -n to skip the slow exhaustive scraping process, in hopes of getting as much data off the drives as possible in case they stopped working after running for a while. Thankfully, no issues cropped up. If there were read errors during the initial sweep, I re-ran the process with a retry count:

$ ddrescue -d -r 3 /dev/sdc ./disk.img ./disk-log

In addition, I saved the partition table and S.M.A.R.T. data separately:

$ smartctl --all /dev/sdc > ./smart
$ fdisk -l /dev/sdc > ./fdisk

With the holidays over and all disks archived, I flew back home with the external HDD in my carry-on bag.

Cold storage in the cloud

Thanks to the advent of cheap cloud cold storage options like Amazon Glacier, Google Nearline, and Backblaze B2, it’s now very affordable to dump a bunch of full disk images in the cloud. I chose Google Nearline for this task. Amazon Glacier is a bit cheaper (Glacier: $.007 / GB, Nearline: $.010 / GB), but Glacier retrievals are complicated both to perform and to price. Backblaze B2 is even cheaper, but only uses a single datacenter.
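For a rough sense of what that costs at this scale (back-of-the-envelope, using the prices above and the ~250GB I ended up uploading):

250\,\mathrm{GB} \times \$0.010/\mathrm{GB} \approx \$2.50\ \text{per month (Nearline)}
250\,\mathrm{GB} \times \$0.007/\mathrm{GB} \approx \$1.75\ \text{per month (Glacier)}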

Before uploading my backups, I was able to shave off ~100GB (almost 30%!) by compressing with lrzip, which is specialized for large files with repeated segments. I also experimented with compressing one of the disk images with xz, but (as predicted by lrzip’s benchmarks) xz took 22% longer to produce a file 10% larger.
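If you want to reproduce that kind of comparison, it’s nothing fancier than running both compressors against the same image and comparing sizes and wall-clock time (illustrative commands; disk.img is a placeholder name):

time lrzip -vv disk.img   # produces disk.img.lrz
time xz -k -9 disk.img    # produces disk.img.xz; -k keeps the original around
ls -lh disk.img disk.img.lrz disk.img.xz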

After compressing the images, I encrypted them with AES256 using gpg. While I’ve typically used the default CAST5 cipher in the past, for this project I chose AES256 based on this guide. I considered generating a keypair for these backups: my plan was to create copies of the private key encrypted with a couple different passwords given to family members, etc. I decided to defer this because I didn’t fully understand the crypto details and wanted to get uploading, so I ended up symmetrically encrypting the files. I may revisit this later and re-upload with a more granular key system.
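For reference, the keypair scheme I have in mind would look roughly like this (just a sketch of the idea, not something I’ve run; the key’s email address and filenames are placeholders):

# Generate a keypair dedicated to the backups (interactive prompts follow).
gpg --gen-key

# Export the private key, then make a few copies of it, each symmetrically
# encrypted with a different passphrase to hand out to family members.
gpg --export-secret-keys --armor backups@example.com > backup-key.asc
gpg --symmetric --cipher-algo AES256 --output backup-key.copy1.gpg backup-key.asc
gpg --symmetric --cipher-algo AES256 --output backup-key.copy2.gpg backup-key.asc

# The disk archives themselves would then be encrypted to the public key:
gpg --encrypt --recipient backups@example.com disk.tar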

Putting it all together, I assembled everything into a pipeline and ran it overnight:

# Each disk lives in its own directory "$n", containing $n.img, $n-log, fdisk, and smart.
for n in */; do
  n=${n%/}
  pushd "$n"
  lrzip -vv "$n.img"   # compress the raw image to $n.img.lrz
  # Bundle the compressed image with its ddrescue log and drive metadata,
  # encrypt it symmetrically, and stream it straight into the Nearline bucket.
  tar cvf - "$n.img.lrz" "$n-log" fdisk smart | pv | gpg --passphrase="$PASSPHRASE" --no-use-agent --symmetric --cipher-algo AES256 | gsutil cp - "gs://$BUCKET/$n.tar.gpg"
  popd
done

Waking up to ~250GB of memories neatly packed up and filed away was a lovely sight. I’ve been sleeping better since!

At my friend davean’s suggestion, since lrzip is a less common program, I also uploaded a copy of the lrzip git tree to my Nearline bucket.

I also encrypted the files on my local HDD: while I used the out-of-the-box NTFS filesystem on the My Passport drive for the disk images in transit, once I had a copy of the files in Nearline, I reformatted the drive as ext4 on top of dm-crypt.
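That kind of reformat is typically done with the standard dm-crypt/LUKS tooling, roughly like so (a sketch, using LUKS specifically; /dev/sdX stands in for the My Passport device, and everything on it is destroyed):

cryptsetup luksFormat /dev/sdX          # create the LUKS container (destructive!)
cryptsetup luksOpen /dev/sdX passport   # unlock it at /dev/mapper/passport
mkfs.ext4 /dev/mapper/passport          # create the ext4 filesystem
mount /dev/mapper/passport /mnt         # mount it and copy the images back over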

Update: an important final step (thanks to khc on HN for mentioning this) is to test a full restore of your backups before leaving them to rest. In my case, I tested downloading from Nearline, decrypting, un-lrzipping, and reading the result. Similarly, for my local HDD copy, I tested mounting the encrypted filesystem and reading the images.
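Concretely, the Nearline restore test amounted to something like this (filenames are placeholders):

gsutil cp gs://$BUCKET/disk1.tar.gpg .      # pull one archive back down
gpg --decrypt disk1.tar.gpg | tar xvf -     # decrypt and unpack it
lrunzip disk1.img.lrz                       # decompress back to disk1.img
fdisk -l disk1.img                          # sanity-check that the image is readable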


Finally, a word of advice when handling disk drives (and other objects you would not like to fall): objects that are already on the floor cannot fall any further. Treat any object that is elevated from the floor as if it will fall. You can massively increase the odds of this happening by haphazardly arranging your backup drives on swivel chairs and assorted hardware. ;)

/ react-utf8

A character encoding gotcha with React

Tonight I noticed that in a React 0.12 codebase of mine, &nbsp; entities were rendering as "Â " in Mobile Safari. After a quick search I came across this StackOverflow answer, which identifies the "Â " output as a UTF-8 encoded non-breaking space being interpreted as ISO-8859-1.

To resolve this problem, putting…

<meta charset="utf-8">

…immediately after my opening <head> tag did the trick. While explicitly declaring your webpages as UTF-8 encoded has been a best practice for a while now, I learned the hard way today that it’s a requirement when working with React.

Interestingly, this problem showed up in Mobile Safari on OSX but not in Chrome on Linux, which meant it didn’t surface until late in QA. Another good reason not to leave the encoding choice up to the browser!

/ myo-experiments

Fun with the Myo gesture controller

Myo is a wireless armband that uses electromyography to detect and recognize hand gestures. When paired with a computer or smartphone, gestures can be used to trigger various application-specific functions.

When their marketing video made the rounds in 2013, I remember one specific demo made my jaw drop: touch-free video control. The video shows a man watching a cooking instructional video while cutting some raw meat. Being able to pause and rewind the video simply by raising his hand was a solution to an interaction problem I’ve hit countless times: listening to podcasts while doing chores, or watching videos while eating a sandwich.

I ordered a Myo back in March 2013 and deferred shipment until their consumer design was ready. It was a nice surprise to return home from holiday travels to find a Myo waiting for me. :)

Unfortunately there is no official Linux support yet (though there’s a proof of concept from a hackathon). On Windows and OSX, the SDK ships a pretty elegant Lua scripting environment that’s used to write “connector” integrations: Lua scripts are selected based on the currently active app and translate gestures into mouse/keyboard actions. This is a neat approach. It lets developers and tinkerers do much of the legwork of designing and writing integrations, while the SDK handles the complex parts (gesture recognition, mouse control, keyboard automation) in a cross-platform way.

I was happy to see a web browser integration already built, but on closer inspection there were a few behaviors I wanted to work differently. I was delighted to discover that I could simply open up the web browser connector and hack the high-level Lua code into doing what I wanted. I added a gesture to take control of the mouse, as well as some special cases for controlling video playback.

While the gesture recognition doesn’t always work perfectly (probably a matter of training both myself and the armband better), when everything clicks, the results are pretty sublime:

I’ll be posting my scripts and future tinkerings in a myo-scripts repo on GitHub.