14 August 2008

10 things every web developer should know

Now that web development is well into its second decade as a profession, the field has changed enormously. Savvy web developers are keener on standards compliance, multiple browser support, use of libraries, and other techniques that make your website shine in more situations than ever before.

Here is my list of the top ten things every developer who is writing a web page absolutely must know. They apply to intranet work as well as public web development, and whether you’re writing a static site or a dynamic web application. They make up part of our recruitment criteria at Atlassian, and for a very good reason: if you don’t know this stuff, you’re going to write crappy web applications.

1. You need to test in multiple browsers

The browser wars are back again. While Internet Explorer still sends the majority of requests on the public web each day, Firefox and Safari usage is growing. The number of Macs is increasing, and technical users are recommending Firefox to their relatives and friends as a security measure.

If you want your site to work properly in different browsers, you need to test it in different browsers. Web standards aren’t yet implemented thoroughly enough that you can write a web page once and expect it to work well everywhere.

In my experience there are two kinds of laziness that lead web developers to avoid testing their site in multiple browsers:

  • Web developers working on an internal site where they assume a standard browser will always be used
  • Web developers who write a site, test it with their favourite browser, and expect everyone else to use that browser if they want to see the site.

For the former group, you need to think carefully about this decision. Are you sure no one will use your site with a different browser? What about when you or your boss gets an iPhone or Blackberry and wants to use that to browse the site? What about those people in the design team that all use Macs? What about when your company upgrades the browser to a new version?

For the latter folks, you just need to wise up. No one is going to change from Internet Explorer to Firefox just to look at your site. It’s not that important. Get over yourself, already.

2. There is an important difference between GET and POST

Many people assume the only difference between GET and POST is that parameters are passed in the query string with a GET and in the body of a request with a POST. Because of this assumption, decisions are made about which to use for a particular form based on whether the user should be able to see or “fiddle with” the request parameters.

The real difference between GET and POST comes from the HTTP specification, where it says this:

In particular, the convention has been established that the GET and HEAD methods SHOULD NOT have the significance of taking an action other than retrieval. These methods ought to be considered “safe”. This allows user agents to represent other methods, such as POST, PUT and DELETE, in a special way, so that the user is made aware of the fact that a possibly unsafe action is being requested.

In English, this means a POST is allowed to change something on the server but a GET should not. That is, you should never use a GET — a request with query string parameters — to make changes to data on the server.
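As a rough sketch of how a server might enforce this, the JavaScript below refuses to run a data-changing action when it arrives as a GET or HEAD. The names isSafeMethod and checkRequest and the changesData flag are made up for illustration; they are not part of any framework, and a real application would hook a check like this into whatever request dispatching it already has.

```javascript
// Methods the HTTP specification asks us to treat as "safe" (retrieval only).
const SAFE_METHODS = ["GET", "HEAD"];

function isSafeMethod(method) {
  return SAFE_METHODS.indexOf(String(method).toUpperCase()) !== -1;
}

// Hypothetical check run before a request is dispatched to an action.
// `action.changesData` is an assumed flag set by whoever wrote the action.
function checkRequest(method, action) {
  if (action.changesData && isSafeMethod(method)) {
    // 405 Method Not Allowed; the Allow header names what is accepted.
    return { status: 405, headers: { Allow: "POST" } };
  }
  return { status: 200, headers: {} };
}

const deleteComment = { changesData: true };  // hypothetical action
const viewComment = { changesData: false };   // hypothetical action

console.log(checkRequest("GET", deleteComment).status);  // 405
console.log(checkRequest("POST", deleteComment).status); // 200
console.log(checkRequest("GET", viewComment).status);    // 200
```

The point of the sketch is only that the decision hangs on what the action does, not on whether you want the parameters visible in the URL.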

Additionally, you should know that using POST is not any kind of security measure. Using POST makes modifying the values of a form only a very tiny bit harder. Browser extensions make it trivial to change all POSTs in a page to GETs, or modify the form in any way before it gets submitted. Validate your requests on the server before processing them.

This guideline could also be rephrased as “know and understand HTTP”. HTTP is the protocol on which the web is built, and you should understand it in intimate detail if you are developing websites.

3. IDs must be unique in a document

In addition to tag names, the two attributes you can put on almost every HTML element are ‘id’ and ‘class’. They’re the most common way to select one or more tags for styling purposes and for attaching dynamic functionality with JavaScript.

What isn’t obvious to many web developers though, is that the ‘id’ attribute is like a primary key in a database. You’re only allowed to have one element with a given ID per page.

The foundation of dynamic functionality on the web, the DOM API, makes this plainly obvious with the most important method for accessing elements on a page (emphasis added):

getElementById

Returns the Element whose id is given by elementId. If no such element exists, returns null. Behavior is not defined if more than one element has this id.

Running a validator over your page is the best way to pick up this problem. It will quickly tell you if you are using the same ID multiple times in the same page.
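If you want a quick check while you are developing, something like the following sketch can complement a validator. The function name findDuplicateIds is my own invention; it takes a plain list of id values so that it can run anywhere, and the comment shows one way you might collect those values in a browser.

```javascript
// Returns the id values that occur more than once, each listed once,
// in the order their first repeat was found.
function findDuplicateIds(ids) {
  const seen = new Set();
  const duplicates = new Set();
  for (const id of ids) {
    if (seen.has(id)) {
      duplicates.add(id);
    }
    seen.add(id);
  }
  return Array.from(duplicates);
}

// In a browser you could collect the ids from the live page like this:
//   const ids = Array.from(document.querySelectorAll("[id]"), (el) => el.id);
// Here a plain array stands in for the page.
const ids = ["header", "nav", "item", "item", "footer", "nav"];
console.log(findDuplicateIds(ids)); // [ 'item', 'nav' ]
```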

4. Your CSS and JavaScript should be outside your HTML

These days, almost everyone knows about separating content and presentation. Do you put formatting in CSS files? I’m sure you do. Well, you try.

The next step in flexible and maintainable web development goes beyond style and content by separating out the dynamic functionality as well. The recommendation is to aim for unobtrusive JavaScript: script that is separate from the content and enhances rather than replaces it.

Rather than scattering JavaScript throughout your HTML code like this:

<a href="#" onclick="window.open('page.html')">my page</a> <!-- ick! -->

The modern JavaScript technique is to add this dynamic functionality to your page in a way that allows the script to be separate and optional. Your HTML is clean:

<a href="page.html" class="popup">my page</a> <!-- clean --> 

And your external JavaScript file does all the heavy lifting:

jQuery(function ($) {
  $("a.popup").click(function () {
    window.open(this.href);
    return false;
  });
});

There are quite a few reasons why you would do this. There are several good documents which go into more detail on the benefits.

5. Use heading tags for headings

It seems to be very common practice for websites to use a <div> or <span> for a heading and style it the way they want using CSS. Frankly, I’m baffled by this approach. HTML contains no fewer than six heading tags, <h1> through <h6>, and you can customise them with CSS to look however you like.

Use the heading tags in your HTML. Change their styles with your stylesheets. Don’t be afraid to remove their default formatting completely and replace it with something that matches your styles.

Not only are there benefits in terms of meaningful markup, accessibility and tool support, but use of heading tags is also highly recommended for search engine optimisation, so it can be important for your business too.

6. Table tags are for tables, not for layout

Back in 1996, HTML 3 gave the web tables. For the next five years or so, tables were the preferred way of laying out web pages. Your entire page would be structured as nested tables: the navigation bar on the left was a table cell, so was your content area on the right. With the improved support for cascading stylesheets in HTML 4 at the end of 1999, tables for layout were superseded by this much more powerful tool.

Now we’re in 2008, and there’s absolutely no reason why web pages must be designed using tables for layout. CSS techniques are available for the most complex grid designs, so your markup can be almost completely independent of its on-screen presentation.

Tables still have their uses, of course, and displaying tabular data is the primary one. All web developers should understand how to mark up tables properly, including using <thead>, <tbody> and <tfoot> for grouping rows, and the attributes available for use in complex tables (axis, headers, scope, and so on).

7. With meaningful markup, almost everything is a list

As discussed above in the headings guideline, the current best practice in web development is to use meaningful markup for your pages. That means putting paragraphs inside <p> tags, headings inside heading tags, and so on. Pretty obvious, no?

One of the less obvious types of meaningful markup is putting elements that are grouped together logically in a list into a list tag—either an <ol> for items with a natural order, a <ul> for unordered items, or a <dl> if each item consists of a name and a value. This means a number of very common structures on web pages should be placed in a list if you’re marking up your content with meaningful tags.

Here are a few examples of structures on web pages that should probably be marked up as a list:

  • navigation links in a menu or horizontal bar (probably an unordered list, <ul>)
  • the articles on a blog (probably <ol>)
  • the comments on an article (<ol>)
  • breadcrumb links (<ol>)
  • random photos on the side of a blog (<ul>)
  • search results (<ol>)
  • links or buttons to perform operations (<ul>)
  • dialogue (<dl>).

In my experience, almost every block of content in a web page that isn’t a heading, a paragraph or a table is actually a list. Some other small bits of content fall into the several categories of Microformats—dates, authors, other metadata. There’s not very much left in a typical HTML document once you take all that away.

In order to mark content up as lists, you need to know how to override the browser default styles which assign bullet points to items in a <ul> and numbers to items in an <ol>. The relevant CSS property is list-style-type and is supported in all browsers.

Another common requirement is to display list items all on one line instead of vertically down the page. This can be achieved by styling the list items as float: left and setting appropriate margins. (Floated elements have slightly different positioning in Internet Explorer, so see guideline #1.)

8. Use a library to work around JavaScript incompatibilities in browsers

JavaScript libraries providing a variety of functionality have been around for a few years now. There’s the venerable Prototype, widget libraries like YUI and ExtJS, and more lightweight libraries like jQuery, Dojo and MooTools.

Regardless of which library you use, the most important benefit of writing JavaScript with a library today is that you don’t need to worry about browser compatibility.

Want to add an on-click event handler to a particular button? Do it with the relevant function in your library and you’re guaranteed it will work in Internet Explorer, Firefox, Opera and Safari. The same goes for hiding and showing elements on the page, AJAX requests, and most of the other common tasks in dynamic web pages.
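To give a feel for what a library is hiding from you, here is a minimal sketch of the branching needed just to attach an event handler. addEvent is a hypothetical helper, not a function from any of the libraries above. It relies on the standard addEventListener method and on the attachEvent method of older Internet Explorer versions, and it returns the name of the branch it took purely so the sketch is easy to follow. Real libraries also smooth over differences in the event object and in the value of this, which the sketch ignores.

```javascript
// Hypothetical helper: attach `handler` for `type` events on `element`.
function addEvent(element, type, handler) {
  if (element.addEventListener) {
    // W3C DOM standard: Firefox, Safari, Opera.
    element.addEventListener(type, handler, false);
    return "addEventListener";
  }
  if (element.attachEvent) {
    // Older Internet Explorer: event names carry an "on" prefix.
    element.attachEvent("on" + type, handler);
    return "attachEvent";
  }
  // Last resort: overwrite the single on<type> property.
  element["on" + type] = handler;
  return "property";
}

// Browser usage (the id "save" is made up):
//   addEvent(document.getElementById("save"), "click", function () {
//     alert("Saved!");
//   });
```

Multiply this by every DOM and AJAX call in your application and the case for letting a library do it becomes obvious.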

All the JS libraries these days have comprehensive documentation, cover core dynamic functionality like DOM traversal and manipulation, and work with all browsers from IE6 upwards. Pick one and use it always.

9. Use conditional comments to work around CSS incompatibilities in browsers

Despite there being several workable specifications for cascading stylesheets published by the W3C, CSS compatibility across browsers is held back significantly by Internet Explorer 6 and 7, neither of which implements the standards fully.

Going back to guideline #1, the only way to be sure your layout works in all the browsers is to test it. But once you’ve tested it and found problems in IE, how do you fix them without breaking the layout in the browsers that do respect the standards?

Here, in fact, Internet Explorer comes to the rescue with a proprietary extension called conditional comments. An example of a conditional comment looks like this:

<!--[if IE 6]>
Special instructions for IE 6 here
<![endif]-->

To every browser other than Internet Explorer 6, this looks like a normal HTML comment. Internet Explorer 6, however, disregards the first and third lines and processes the second line as if it were part of the page.

This helps immensely with providing a custom stylesheet for Internet Explorer, because you can simply put this in your header:

<link type="text/css" rel="stylesheet" href="/styles/site.css"/>
<!--[if IE]>
<link type="text/css" rel="stylesheet" href="/styles/site-ie.css"/>
<![endif]-->

The site.css file is loaded by all browsers and the site-ie.css file is only loaded by Internet Explorer. By using the version numbers in the conditional comment, as shown in the first example, you can provide custom stylesheets for specific versions or a range of versions of IE.

Inside your IE-specific stylesheet, you now have the power to make changes which override styles in your default stylesheet. This lets you make the necessary tweaks to provide a consistent look and feel across all the browsers with CSS layouts.

10. Unicode is the only encoding you should ever use

Since its release in the early 1990s, Unicode has become the dominant standard in localisation and internationalisation. Not only does it cover all major writing systems used around the world, but it does so in a way that allows multilingual documents with as many different scripts as you like.

At its heart, Unicode is a really simple project. They decided to go through each character (or part of a character, if it can be broken down) in every writing system, and give it a name and a number. A unique name and a number between one and a million. Or one and a billion. It didn’t really matter, as long as every character had a number. This part of the project is technically called the Universal Character Set, and the number for a particular character is called its code point.

The other key part of Unicode is the definition of two widely-used ways to encode these numbers, or code points, into a series of bytes so you can store them in a file or send them over the network. They’re called UTF-8 and UTF-16. UTF-8 will save you space if you’re mostly dealing with Latin text, while UTF-16 is the more compact choice if you’re mostly dealing with non-Latin text.
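Here is a small sketch of that trade-off in JavaScript. It assumes a runtime that provides the global TextEncoder (current browsers and Node.js do), and the helper names utf8ByteLength and utf16ByteLength are made up for this example. The UTF-16 figure ignores any byte order mark.

```javascript
// Bytes needed to store a string in UTF-8.
function utf8ByteLength(text) {
  return new TextEncoder().encode(text).length;
}

// Bytes needed in UTF-16: JavaScript strings are sequences of 16-bit
// code units, so every unit counted by `length` costs two bytes.
function utf16ByteLength(text) {
  return text.length * 2;
}

const english = "hello";
const japanese = "\u65e5\u672c\u8a9e"; // the three characters of "Japanese language"

console.log("A".codePointAt(0));                                  // 65, code point U+0041
console.log(utf8ByteLength(english), utf16ByteLength(english));   // 5 10
console.log(utf8ByteLength(japanese), utf16ByteLength(japanese)); // 9 6
```

The Latin text takes half the space in UTF-8, while the Japanese text is a third smaller in UTF-16.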

What about all those other encodings? ISO-8859-something? Shift-JI-something? Forget them. They don’t support multi-lingual documents, they don’t have thorough encoding of all the characters you might want to use, and they’re just a pain to work with. Unicode also defines standard rules for sorting characters and normalising them for comparison, which these other encodings don’t do.

My recommendation for English-speaking developers is to default everything to UTF-8. Some places you need to remember to do this:

  • For all documents served over the web, use a Content-Type header with a charset parameter:

    Content-Type: text/html; charset=utf-8

  • For files in your source code, ensure your editor and operating system create new files as UTF-8 by default and save them with this encoding

  • Set your database default encoding to UTF-8 and ensure all your existing databases, tables and columns are set to UTF-8

  • For HTML documents which might be saved to disk, include the Content-Type header in a meta tag in the document:

    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">

  • For CSS or JavaScript that might be saved to disk, include the charset attribute in your <script> and <link> tags

  • In Java application servers, configure the URL parameter decoding to be UTF-8. This fits with the default behaviour of most browsers being to URL encode parameters using the character set of the current page, which you’ve already set to UTF-8.

  • If your web site saves files to disk on the server, ensure your operating system is configured with Unicode filenames. Which encoding it uses internally isn’t important.

  • When you’re dealing with text in your application, remember to use methods that correctly handle scripts other than ASCII. For example, don’t write regular expressions like /[a-z]/ to match a lower-case letter. Use /\p{LowercaseLetter}/ or /\p{Ll}/ instead.
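The last point in the list can be sketched in JavaScript as follows. Note that support and spelling vary between regular expression engines: in JavaScript, Unicode property escapes need the u flag, and the long form of the property name is written \p{Lowercase_Letter}. The sample words are written with escapes so the example does not depend on the encoding of the file it is pasted into.

```javascript
const asciiOnly = /^[a-z]+$/;       // matches only the 26 unaccented letters
const anyLowercase = /^\p{Ll}+$/u;  // matches lower-case letters in any script

// "cafe", "caf" + e-acute, the Russian word "mir", and "Cafe"
const words = ["cafe", "caf\u00e9", "\u043c\u0438\u0440", "Cafe"];

for (const word of words) {
  console.log(word, asciiOnly.test(word), anyLowercase.test(word));
}
// cafe:  true  true
// café:  false true
// мир:   false true
// Cafe:  false false
```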

Finally, remember to test your site with characters from different scripts. It’s not difficult at all to grab some sample text from one of the foreign language Wikipedia sites and drop it onto your pages. If you’ve followed my advice and stuck with UTF-8 throughout your stack, it won’t be hard to track down any inconsistencies.

Conclusion

My main inspiration for the ten guidelines was a set of patterns I noticed where many otherwise savvy developers had unexpected gaps in their knowledge. It is also hard to keep up with a changing field like web development, and I wanted to share some of the knowledge I’ve acquired recently. I’d recommend reading further on all the topics listed above if you want to be developing high quality web applications.