12 August 2011

Self-hosting outages, monitoring with Pingdom

The experiment of hosting my blog on Linode has had a few hiccups in the past week. There have been two significant outages of my site.

First, on Sunday 7 August at about 23:00, Linode’s hosting facility had a power outage that took out all the servers in their Fremont data centre. The site was down for approximately 100 minutes: restoring power took about 45 minutes according to their event timeline, and it was another 55 minutes before my Linode came back online and started serving pages again.

Second, yesterday afternoon, Thursday 11 August at about 17:30, there was another outage lasting about 90 minutes. This time it was a hardware failure on the server my Linode runs on (shared with perhaps 30 other nodes): a disk needed to be replaced. My Linode was migrated to another machine, and the support team notified me of this within a few minutes.
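To put those two outages in perspective, here is a rough back-of-the-envelope calculation of availability over the week. It is only a sketch: it assumes the approximate durations quoted above (100 and 90 minutes) and a seven-day window, and the function name is made up for illustration rather than taken from any monitoring tool.

```python
MINUTES_PER_WEEK = 7 * 24 * 60  # 10080 minutes in a seven-day window


def availability_percent(outage_minutes, window_minutes=MINUTES_PER_WEEK):
    """Return the percentage of the window during which the site was up.

    outage_minutes is a list of outage durations in minutes (assumed not
    to overlap); window_minutes is the length of the period considered.
    """
    downtime = sum(outage_minutes)
    return 100.0 * (window_minutes - downtime) / window_minutes


# Approximate outage durations from the two incidents described above.
outages = [100, 90]
print(f"Downtime: {sum(outages)} minutes")
print(f"Availability: {availability_percent(outages):.2f}%")
```

With roughly 190 minutes of downtime in a week, that works out to a little over 98% availability.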

In both cases, Pingdom alerted me to the outage faster than anything I received from my hosting provider. I use their free service, which allows you to monitor just one site (this one), and I have configured it to check my home page every five minutes. Here’s their summary of my site:
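For anyone curious about what a check like this involves, here is a minimal sketch of a periodic uptime check in Python. This is not how Pingdom works internally (I don't know their implementation); it just assumes that "up" means the home page answers with a 2xx or 3xx status within a timeout. The function names and the example URL are hypothetical.

```python
import time
import urllib.error
import urllib.request

CHECK_INTERVAL_SECONDS = 5 * 60  # five minutes, matching the interval above


def fetch_status(url, timeout=10):
    """Return the HTTP status code for url, or None if nothing answered."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 500.
        return err.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout and similar.
        return None


def check_once(url, fetch=fetch_status, clock=time.monotonic):
    """Run one check and return (is_up, elapsed_seconds)."""
    start = clock()
    status = fetch(url)
    elapsed = clock() - start
    is_up = status is not None and 200 <= status < 400
    return is_up, elapsed


def monitor(url, checks, interval=CHECK_INTERVAL_SECONDS,
            fetch=fetch_status, sleep=time.sleep):
    """Check url a fixed number of times, pausing between checks."""
    results = []
    for i in range(checks):
        is_up, elapsed = check_once(url, fetch)
        results.append(is_up)
        print(f"{url}: {'UP' if is_up else 'DOWN'} ({elapsed:.2f}s)")
        if i < checks - 1:
            sleep(interval)
    return results


if __name__ == "__main__":
    # Hypothetical URL; substitute the page you want to watch.
    monitor("http://example.com/", checks=3)
```

A real service also checks from several locations and sends alerts, which is the part that makes it worth using an external monitor rather than a script on the same server that might be down.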

Pingdom also measures the responsiveness of the site from various places around the world. You can see a chart of this below.

This information will prove really helpful once I get some time to optimise the performance of my CGI scripts, perhaps by replacing them with static files.

As for the self-hosting experiment, I think both outages were unlikely accidents, and I don’t expect the rate of outages to remain this high over the next few months. I’ll keep monitoring it, and keep reporting on how it’s going.