28 December 2011

Site updates: new feeds, faster page loads

I’ve had time over the past month to make some big behind-the-scenes improvements to this site. If you subscribe to this blog, the change you will have noticed today is the new feeds, which will have refreshed all the recent items in your feed reader. Below is a short summary of all the changes.

Simplified deployment with Git

I implemented a new and simpler deployment model for this blog. I moved it from Mercurial to Git, and followed this guide to set up automatic deployments whenever I push updates to the repository on mattryall.net: Using Git to manage a website.

In short, you just add a post-update script in the hooks/ folder inside your repo. Mine looks like this:

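#!/bin/bash
# post-update hook: refresh the live working copy, then rebuild and test the site.

BLOG="/srv/www/mattryall.net/www"
# Git sets GIT_DIR to the pushed-to repository when it runs hooks,
# so point it at the working copy's repository instead.
export GIT_DIR="$BLOG/.git"

# Stop if the working copy is missing, rather than pulling in the wrong directory.
pushd "$BLOG" >/dev/null || exit 1
echo "Updating working copy in $BLOG"
git pull
echo "Rebuilding site"
./mr rebuild
./mr test
popd >/dev/null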

Under the hood, this uses my mr script to rebuild and test the site after an update.

Serving pre-built HTML pages rather than CGI

A few months ago, one of my articles reached the front page of Hacker News. Some 12,000 visitors arrived at my site to read about why wireless networks are slow, and serving them all took more resources than necessary and delivered the content more slowly than I would have liked.

My site changes relatively rarely, so I’ve switched from generating all the content dynamically to storing static HTML pages on disk and only regenerating them whenever the site content changes. This is one commonly cited difference between the two popular blog software options, WordPress and Movable Type: MT generates static pages by default while WordPress generates dynamic pages.

This required a little bit of work to remove dynamic content from the pages. Everything that is dynamic — like the Twitter and Flickr content on the right-hand side — is now generated via JavaScript from content served statically from the server.

The performance improvement is significant. It can be best seen with the Pingdom results which, once you exclude latency, have dropped from 450ms down to a handful of milliseconds.

You can also see the improved performance in the X-Response-Time header which is served with every response. It shows the time taken to serve the response on the server in microseconds. Here’s a response for the front page which took 562µs to serve from disk:

X-Response-Time: D=562

Setting this up to measure your site’s server-side speed is extremely easy. Just add the following to your .htaccess, as described in the Apache documentation:

Header add X-Response-Time "%D"
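To read that header from the command line, something like the following rough sketch could work. It assumes the header has the D=<microseconds> form shown above. The function names response_time_us and us_to_ms are made up for this example and are not part of Apache or of this site's tooling.

```shell
#!/bin/bash

# Read HTTP response headers on stdin and print the X-Response-Time
# value in microseconds (the number after "D=").
response_time_us() {
    tr -d '\r' |
        grep -i '^x-response-time:' |
        head -n 1 |
        sed 's/.*D=\([0-9][0-9]*\).*/\1/'
}

# Convert whole microseconds to milliseconds with three decimal places,
# using integer arithmetic only.
us_to_ms() {
    printf '%d.%03d\n' "$(( $1 / 1000 ))" "$(( $1 % 1000 ))"
}

# Demo with a canned response instead of a live request.
us=$(printf 'HTTP/1.1 200 OK\r\nX-Response-Time: D=562\r\n\r\n' | response_time_us)
echo "${us}us = $(us_to_ms "$us")ms"   # prints "562us = 0.562ms"
```

Against a live server, the headers could be piped in from curl, for example: curl -sI http://example.org/ | response_time_us (the URL is a placeholder).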

There is more work remaining here. I want to use the now-common practice of serving CSS and JS with a far-future Expires header on a single-use URL. I also need to fix a few things around the commenting: the “your comment is being moderated” message is broken, and I want to hook it up so I don’t have to rebuild the site manually after approving a comment.
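One way the single-use URL idea could be approached is to put a checksum of the file's content into its name, so the URL changes whenever the content does. This is only a sketch of that general technique, not what this site does today. The function name fingerprint_name is hypothetical, and cksum is used simply because it is available on any POSIX system.

```shell
#!/bin/bash

# Print a content-addressed name for a file, e.g.
#   style.css -> style.<checksum>.css
# The checksum changes when the content changes, so the old URL is never reused.
fingerprint_name() {
    local file="$1"
    local sum base ext
    # Reading from stdin makes cksum print "<checksum> <size>" with no file name.
    sum=$(cksum < "$file" | cut -d ' ' -f 1)
    base="${file%.*}"   # path without the final extension
    ext="${file##*.}"   # final extension only
    echo "${base}.${sum}.${ext}"
}
```

A build step could copy each CSS or JS file to its fingerprinted name and reference that name in the generated HTML. The far-future Expires header itself would still need to be configured separately in the web server.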

New Atom feeds

I’ve never really liked the feeds that were generated by this site. They tried to conform to various versions of the RSS specification, but at different times that required a number of weird things.

First, I didn’t want to include author email addresses in my feeds, particularly for comment authors. However, RSS defines the author field as simply being an email address, with an optional name. I’m not sure how exactly, but I ended up with the form ‘noreply@example.org (Matt Ryall)’ for post and comment authors. This works well in some contexts, but not so well in others.

Second, it uses a funny date format. In the words of the specification:

All date-times in RSS conform to the Date and Time Specification of RFC 822, with the exception that the year may be expressed with two characters or four characters (four preferred).

RFC 822 doesn’t support four-digit years, so in the words of the RSS spec: “All date-times in RSS conform to … RFC 822” — except if you use the “preferred” format, in which case none of them will. Great.

RFC 822 also isn’t a specification that is known for providing a model date-time format for use in other specifications. It’s an early RFC for “Internet Mail” — email. The RFC has almost nothing to do with date formats, and only includes a very short section on it.

The date format itself is also not particularly good. It is ordered illogically, has unnecessary punctuation, is locale-specific because it uses English day and month names, and supports “GMT” as a time zone name but not “UTC”. Here’s an example: Wed, 07 Sep 11 18:24 GMT. This format is hard to parse and harder than necessary to produce.

All this is frustrating because there has always been a significantly better option available. ISO 8601, the international standard for date-time formatting, was standardised in 1988, more than 10 years before the first version of RSS was published.
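To make the comparison concrete, here is a minimal sketch that prints the same instant in the RSS style (using the four-digit year that RSS prefers) and in the ISO 8601 style. It assumes GNU date, which accepts -d @SECONDS. The function names rss_date and iso_date are made up for this example.

```shell
#!/bin/bash

# RSS-style date. LC_ALL=C forces the English day and month names
# the format requires, whatever the machine's locale is.
rss_date() {
    LC_ALL=C date -u -d "@$1" '+%a, %d %b %Y %H:%M:%S GMT'
}

# ISO 8601 date-time in UTC: purely numeric, ordered from largest
# unit to smallest, with no locale-dependent names.
iso_date() {
    date -u -d "@$1" '+%Y-%m-%dT%H:%M:%SZ'
}

# 1315419840 seconds after the Unix epoch is 7 September 2011, 18:24 UTC.
rss_date 1315419840   # prints "Wed, 07 Sep 2011 18:24:00 GMT"
iso_date 1315419840   # prints "2011-09-07T18:24:00Z"
```

The ISO 8601 form also sorts correctly as plain text, which the RSS form does not.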

Lastly, some versions of RSS don’t support important metadata like author information and publishing date. To maximise compatibility, I ended up embedding certain kinds of information using alternative XML namespaces like Dublin Core and Atom.

None of this felt right, and none of it felt like I was publishing my content in a form that was easy for clients to consume.

With the move to static content generation, I rewrote the feed-building code to generate static Atom files. Atom solves almost all my gripes with feed generation, although it still has some warts around unique ID generation.

The new feeds are: article feed and comments feed. Clients that respect permanent redirects should update themselves to the new URLs automatically.