Tuesday, July 25, 2006

A Couple of Random Things

Just a couple of random, unrelated items.

You can now get Google maps with traffic information on your cellphone. Their service currently covers 30 cities (that's for the traffic info; I imagine the mapping features work most anywhere). I tried it out on my Nokia 6682, and it's pretty cool. A little slow, but cool. You can view a zoomable map, current traffic conditions and directions from point A to point B. Really nice if you don't have a GPS in your car.

Also, as a follow-up to my previous post, I'd like to pre-announce some vaporware I'm calling s3sync. It's actually coming along pretty well, and I've been backing up files like crazy. Essentially, it provides some of the capabilities of rsync, along with some additional features, like automatic file compression (when possible) and file encryption. It definitely needs some more tweaking, and it's not yet feature-complete, but the actual bare-bones file backup operations are functioning. I hope to have an initial release available around the end of July or beginning of August. If I were a betting man, I'd go with sometime in August. :)

I'm still really psyched about the Amazon S3 solution, but I've had mixed results with Jungle Disk. My chief complaints with Jungle Disk are:

  • It requires you to run a GUI, whereas I'm really looking for something command-line oriented, and thus easily scriptable via a cron job (see the sketch after this list).

  • It hogs the CPU in Linux. According to one of the forum posts, the author has identified the bug, and this may actually be fixed in the latest version -- I haven't tried it.

  • Attempting to do something like rsync is very kludgy. And rsync is basically what I want.
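
Just to illustrate what I mean by "scriptable via cron": a minimal sketch, assuming the rsync command from my earlier backup post is wrapped in a hypothetical ~/bin/backup.sh script.

# Hypothetical ~/bin/backup.sh:
#   #!/bin/sh
#   rsync -avz -e ssh ~/e/work eric@myworkbox:~/backups/laptop/e/work

# Crontab entry (crontab -e) to run it unattended every night at 2:00 AM:
0 2 * * * /home/eric/bin/backup.sh >> /home/eric/backup.log 2>&1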


So, I decided to roll my own. I'll post updates as they become available. :)

Saturday, July 15, 2006

Stupid Blogger Tricks

My posts vary in length quite a bit. Sometimes they are really long-winded tirades, and other times they are just short meaningless outbursts. One thing that always bugged me about Blogger was that there didn't seem to be any easy way to create a post that showed only the intro, with a link you could click to see the full post if you were interested. I found some solutions in the Blogger FAQs but didn't find them satisfactory. If you're looking for the same kind of thing I was, read on.

Searching Blogger's help I could only really find two ready-made solutions: this one and this one. The second solution was closest to what I wanted, and I used it for a while, but there were two things I didn't like about it. First, the "Read Full Post" link would appear on every post, no matter how long or short it was. As you can see near the end of that article, the ability to selectively apply it to individual posts was left as an exercise for the reader. Nice. Second, it required you to load a completely new page. I really wanted something that would just hide the text and expand it inline.

Well, one night this started really irritating me, and I finally came up with the following solution. It may not be optimal, but it works and does what I want it to. Change your Blogger posting template to something like the text below. Then, when you create a new post, edit what's there by putting your introductory paragraph, summary, or whatever where it says "Intro goes here ...". And, as you might guess, put the rest of your post in the section that says "Extended post goes here ...".

The only other thing, which admittedly is a bit of a nuisance, is that you should change the two instances of "postname" to some unique identifier for that particular post, so that you don't end up with duplicate IDs when more than one post appears on a page.

I know this isn't exactly genius, especially for anyone who's done a lot of hand-coding of HTML, JavaScript and CSS (that does not, unfortunately, describe me!). Sometime when I run out of other things to do, I'd like to tweak it a little; for example, I'd like to make the link a toggle instead of having it disappear entirely once you click it.

I thought I'd post it anyway in case some other frustrated blogger out there might find it useful.

Example posting template:

First, add this somewhere in the style section of your template (the main template, not the posting template):

<style>span.hiddenspan {display:none;}</style>


Then edit your posting template to look something like this:

Intro goes here ...
<span id="postname" class="hiddenspan">
Extended post goes here ...
</span>
<a href="#" onclick="document.getElementById('postname').style.display='inline'; this.style.display='none'; return false;">Read Full Post</a>



Sunday, July 09, 2006

Backup Strategies

Since my recent conversion to running Linux (Ubuntu, to be specific) as my desktop OS on my notebook full time, I've been playing with various options for backing up data.

Since I have separate Linux desktops both at home and work, rsync is a natural first choice. It comes as a standard part of the OS and lets you easily mirror a copy of your data. One of the best features is that when you run it to mirror your latest changes to the backup copy, it doesn't have to actually copy your entire dataset - only the differences. This makes the process much faster, and also makes it more likely that you'll make backups frequently.

All this requires only a single command (assuming that you have previously set up SSH to use public key authentication):

rsync -avz -e ssh ~/e/work eric@myworkbox:~/backups/laptop/e/work

In my case, I have a directory under my home directory called e which contains all the data I am concerned with backing up. Under e is another level in the hierarchy, a directory called work, which contains all of the work-related data I wish to back up (source code, documents, etc.). This makes it very easy to make a backup mirror at work that contains only work-related stuff, and none of my personal data. When I get home, I can run a similar command on the e directory, which will back up both my work and personal data to my other personal Linux box.
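To spell that out, the command at home would presumably be something along these lines (the hostname "myhomebox" is a placeholder for whatever your home machine is called):

rsync -avz -e ssh ~/e eric@myhomebox:~/backups/laptop/e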

OK, now we're cookin'. But one shortcoming of this solution is that the only thing you can restore is whatever is in the last backup copy you've made. For example, let's say I have a source file foo.cpp that I'm working on. I haven't checked it in to the revision control system because it's still a work in progress. Yesterday I had a bad day and I deleted the file, but I didn't realize it until today. Unfortunately, I ran my backup script at the end of the previous day, and it dutifully deleted my backup copy of the file to keep the mirror in sync. Bummer.

I stumbled across this nifty little script which adds a level of versioning to your backups. It's just a bit more complicated than the one-liner above, but it's worth it. It still keeps an exact copy of your data on the target, but it also keeps a version of your files which changed for each day of the week. So in my scenario above, when I ran my backups the previous day, it would have removed the file I deleted from the full mirror copy, but also saved a copy of the file in the special daily backup subdirectory.
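I won't reproduce the script here, but the heart of it is rsync's --backup and --backup-dir options: before the mirror is updated, any files that changed or were deleted get stashed in a directory named for the day of the week. A rough sketch of the idea (the paths are placeholders, and the real script also empties out the day's directory first, which matters in a moment):

DAY=`date +%A`    # e.g. "Tuesday"
rsync -avz --delete \
    --backup --backup-dir=/home/eric/backups/laptop/daily/$DAY \
    -e ssh ~/e/work eric@myworkbox:~/backups/laptop/e/work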

This solution is an improvement, but still not perfect. For one thing, the assumption is that you are only going to run your backups once a day. If you run it on a Tuesday, and then run it a second time the same day, it will see the "Tuesday" directory, assume it's left over from last Tuesday, wipe its contents, and archive off only the differences between the first and second backups of the same day. Also, seven days is the farthest back in history you can ever go.

So the solution I'm currently using is a utility called rdiff-backup, which I read about in Sys Admin magazine. This solution required a bit more legwork, as I had to satisfy some dependencies (Python, librsync and zlib), but it was worth it. It still uses rsync at its core (well, technically, the rsync libraries, not the standalone utility), so you aren't copying all the data each time. But the additional, wonderful benefit it offers is that it keeps diffs of all your files, including additions and removals of those files. So now, I have a small script that runs the following commands:

cd /home/eric/e
rdiff-backup -v5 work workbox.ericasberry.net::backups/laptop.work
rdiff-backup -v5 --remove-older-than 60D workbox.ericasberry.net::backups/laptop.work


The first rdiff-backup command is what actually backs up the data. The second command tells rdiff-backup to remove from its repository any file revisions older than 60 days. It's a design limitation of rdiff-backup that the removal of older versions has to be run as a separate step. Fortunately, it executes very quickly.

I have to admit that while the backup process is super-simple once you create the script, the restore process can be a little cumbersome. This is especially true if you're not exactly sure of the time and date of the specific revision you want to retrieve. It's also completely a command-line proposition, so if working at the shell level intimidates you, it's probably not the solution for you. I think at some point I'm going to try cobbling together some kind of GUI front-end to make this process a little easier, unless I find someone else has already beaten me to it.
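For the record, the restore commands themselves aren't bad once you know the syntax; it's figuring out which revision you want that takes the digging. Roughly (foo.cpp here is just an example file):

# List the increments (backup times) available for a file
rdiff-backup --list-increments workbox.ericasberry.net::backups/laptop.work/foo.cpp

# Restore the version from 3 days ago into the current directory
rdiff-backup -r 3D workbox.ericasberry.net::backups/laptop.work/foo.cpp foo.cpp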

So, overall, I'm pretty happy with this solution. The only thing that worries me is the case of a disaster - say a fire or robbery where I end up losing all my computers. Replacing the hardware would be enough of a headache, but my backup efforts (at least for my personal, non-work related data) would be in vain, because I wouldn't have any copies of those backups in the disaster scenario. Obviously, the ultimate, secure solution would involve off-site backups.

There are a few possible alternatives here. First, I could just mirror my personal data, in addition to my work data, to my machine at work. There's plenty of disk space available, but I just don't feel comfortable doing that. There's probably a bunch of smack talk in the employee handbook about using work resources for personal use anyway. ;) So, scratch that.

Another solution would be to archive my data to a CD-R or DVD-R and keep that disc in a fireproof safe or a safe-deposit box. Well, there are a couple of problems here. First of all, I have so much data (when you include things like photos, digital music, etc.) that it wouldn't all fit on removable media. So I'd have to back up only a small subset - my most critical data. Second, it would simply be too much of a hassle. I know I'd probably do it once in a while, but I suspect I would not be disciplined enough to do it weekly, much less daily.

The final solution is to use one of the many online backup solutions that are available. I've spent some time researching this but a few things have made me hesitate.

First, most of them require that you use some kind of proprietary backup software which they provide, nearly all of which run only on Windows (though I've found a few that will also run under OSX). I want something that will work in Linux, preferably with rdiff-backup. I could probably work around this by setting up Samba shares, etc, but I'd really like to keep the process all contained on my server box.

Second, in general you either have to trust that nobody is going to snoop on your data on the third party's server, or you have to encrypt the data yourself before you make the backup copies. (Maybe I'm just paranoid). I could set up something with gpg to automatically encrypt the data, but I believe this would wreak havoc with rsync/librsync, and would most likely add a lot of time to the process.
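If I did go the do-it-yourself encryption route, it would probably look something like this minimal sketch (the "personal" subdirectory and the key ID "eric" are placeholders). It also illustrates the rsync problem: the whole encrypted archive changes every time, so there are no useful deltas to send.

# Bundle up the personal data and encrypt it to my own GPG public key
tar czf - ~/e/personal | gpg --encrypt --recipient eric -o personal-backup.tar.gz.gpg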

Finally, and probably most importantly, most of these services are just way too expensive to use with large amounts of data. (Not that they are cheap even with relatively small amounts of data!) I believe, however, that I've found a pretty good solution that I'm going to begin experimenting with over the next few days: Amazon's S3 in conjunction with a backup utility called JungleDisk.

This looks like a great solution because Amazon's pricing for data storage/transfer is just pennies per gigabyte. JungleDisk is available for Windows, Mac and most importantly, Linux. By installing another package called DAVfs, you can actually treat your remote backups as part of your regular filesystem. (This feature apparently comes for free if you're running under Windows or OSX). Not only that, all of your data is automatically encrypted. Nice!
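As I understand it, JungleDisk on Linux exposes your S3 storage over WebDAV, and DAVfs is what lets you mount that as an ordinary directory. Something along these lines, though the localhost address and port are my guesses at the defaults, so check the JungleDisk docs rather than trusting me:

# Mount JungleDisk's local WebDAV interface (address/port are assumptions)
sudo mkdir -p /mnt/jungledisk
sudo mount -t davfs http://localhost:2667/ /mnt/jungledisk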

The only drawback I've found so far is that JungleDisk can't be run as a daemon; there's a GUI you have to launch to start it up. That's not really much of a problem, assuming the software is stable. I can just start an X session with VNC server, connect with the VNC client, start up the GUI, then disconnect the client and leave JungleDisk running merrily away in that headless Xvnc session. I can always reconnect to that Xvnc instance later if I need to restart or tweak JungleDisk.
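The VNC dance would go roughly like this (the display number and the "myserverbox" hostname are arbitrary placeholders):

# On the server: start a headless X session on display :1
vncserver :1

# From the desktop: connect, launch the JungleDisk GUI inside that session,
# then close the viewer -- the Xvnc session (and JungleDisk) keeps running
vncviewer myserverbox:1

# Later, to shut the whole session down:
vncserver -kill :1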

Well, I hope this post helps someone out there looking to devise their own backup strategies. If nothing else, maybe it will provide some ideas to further explore.
