8 October 2008

HTML 5, headings and sections

Tonight there was a presentation at the Web Standards Group by Lachlan Hunt about some of the new facilities provided by new and upcoming web standards: HTML 5 and CSS 3. One point that proved interesting was his coverage of the sectioning feature of HTML 5.

Whereas HTML 4 had just six levels of headings for the entire document, the working draft for HTML 5 stipulates that each section has its own heading hierarchy. An h1 element that appears at the top level in a document is considered to “rank higher than” an h1 element found in a section or article within the document.

For example, rather than using <h1>, <h2> and <h3> elements for the headings in the sample shown below, you can use three nested <section> tags, each with its own <h1>.

Figure: HTML 5 section and heading example (diagram showing HTML 5 markup for sections and headings)

This might not seem much simpler in this basic example. In fact, to me it seems decidedly less simple. In the case where each section has its own distinct hierarchy of headings, the situation becomes even more confusing. However, I think the change makes a bit more sense if you consider it in light of two things.

First, the spec recommends keeping the heading hierarchy sane by either using <h1> tags throughout the document or keeping the heading levels in sync with the levels of sectioning. The latter case is similar to current practice; the only difference is that current markup has no <section> tags around each part.

“Sections may contain headers of any rank, but authors are strongly encouraged to either use only h1 elements, or to use elements of the appropriate rank for the section’s nesting level.”

Second, one of the main reasons why sections and other elements are allowed to contain their own heading hierarchy is to handle parts of a document included from elsewhere. There are many examples of this on the web: blogs, where several articles appear, each with its own heading structure; news sites, made up of sections, each of which comes from its own page with its own headings; search engines, which display excerpts from other sites.

The point of the change is that sites which include other content don’t have to do any special processing to embed an external article or section with its own heading structure. The browser adjusts the levels automatically, to account for the fact that these headings are relevant only within one subsection of the page.
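To make the idea of adjusted levels a little more concrete, here is a rough sketch in JavaScript. It is not the outline algorithm from the draft, only a simplified approximation: it assumes well-formed markup, headings that contain plain text, and only <section> and <article> as sectioning elements. The function name outlineLevels and the sample markup are made up for illustration.

```javascript
// Simplified sketch: approximates how nesting changes a heading's effective level.
// Assumption: well-formed markup, plain-text headings, only <section>/<article>.
function outlineLevels(markup) {
  const tagPattern = /<(\/?)(section|article|h[1-6])\b[^>]*>([^<]*)/gi;
  // One frame per open sectioning element; frames[0] stands for the document body.
  // `base` is the depth the frame starts at, `ranks` the heading ranks open inside it.
  const frames = [{ base: 0, ranks: [] }];
  const outline = [];
  let match;
  while ((match = tagPattern.exec(markup)) !== null) {
    const closing = match[1] === '/';
    const tag = match[2].toLowerCase();
    const current = frames[frames.length - 1];
    if (tag === 'section' || tag === 'article') {
      if (closing) {
        if (frames.length > 1) frames.pop();
      } else {
        // A nested section starts one level below its parent's current depth.
        frames.push({ base: current.base + Math.max(current.ranks.length, 1), ranks: [] });
      }
    } else if (!closing) {
      const rank = Number(tag[1]);
      // A heading of equal or higher rank ends the implied subsections before it.
      while (current.ranks.length > 0 && current.ranks[current.ranks.length - 1] >= rank) {
        current.ranks.pop();
      }
      current.ranks.push(rank);
      outline.push({ level: current.base + current.ranks.length, text: match[3].trim() });
    }
  }
  return outline;
}

// Three ways of marking up the same (hypothetical) document.
const allH1 = `
<h1>Guide</h1>
<section>
  <h1>Setup</h1>
  <section>
    <h1>Options</h1>
  </section>
</section>`;

const inSync = `
<h1>Guide</h1>
<section>
  <h2>Setup</h2>
  <section>
    <h3>Options</h3>
  </section>
</section>`;

const flat = '<h1>Guide</h1><h2>Setup</h2><h3>Options</h3>';

for (const sample of [allH1, inSync, flat]) {
  // Each sample prints: 1 Guide | 2 Setup | 3 Options
  console.log(outlineLevels(sample).map((h) => h.level + ' ' + h.text).join(' | '));
}
```

All three samples come out with the same levels, which is the behaviour the draft is aiming for. The real algorithm has more cases to handle than this sketch does.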

So given these considerations, is it still worth the extra complexity of allowing six heading levels in every section within a document? I’m not sure. It does add a lot of complexity. In just a few minutes at the WSG meeting, we came up with a number of significant problems:

  • Search engine optimisation, or SEO, relies on extracting the heading information from the page. Rather than simply matching <h1>...</h1>, search engines now need to follow a fairly complicated process to determine the ranking of headings within the document.

  • Styling headings with CSS, particularly providing default styles, becomes much more verbose. Rather than using h1, h2, h3 { ... }, with HTML 5 you would need to define h1, section h1, section section h1 { ... }. This would probably be in addition to the old rules, if you’re including content with nested headings.

  • Automatically determining a table of contents is a lot more complex. As noted above, you need to follow a fairly tricky algorithm to determine the heading structure of a document.

  • With current DOM APIs, you can easily find all headings at a particular level in the document with document.getElementsByTagName('h2'). You’d need to use a query selector to do this with the new-style use-an-h1-for-everything structure. Without an efficient query selector, this is much trickier. Even with a query selector, it will probably be a fair bit slower, which is a problem if you’re doing it often.
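The styling and DOM points above can be illustrated with a small sketch that generates the selectors needed for each nesting level. The helper names headingSelectors and defaultHeadingRules are hypothetical, as are the font sizes; the sketch assumes every heading is an <h1> and that the list of sectioning elements is passed in by the caller.

```javascript
// Sketch: build the descendant selectors that target an <h1> at a given nesting level.
// Assumption: all headings are <h1>; sectioningTags lists the elements that nest.
function headingSelectors(level, sectioningTags = ['section']) {
  let prefixes = [''];
  for (let depth = 1; depth < level; depth++) {
    const next = [];
    for (const prefix of prefixes) {
      for (const tag of sectioningTags) {
        next.push(prefix + tag + ' ');
      }
    }
    prefixes = next;
  }
  return prefixes.map((prefix) => prefix + 'h1');
}

// Sketch: one font-size rule per level; deeper rules are more specific, so they win.
function defaultHeadingRules(fontSizes, sectioningTags = ['section']) {
  return fontSizes
    .map((size, index) =>
      headingSelectors(index + 1, sectioningTags).join(', ') + ' { font-size: ' + size + '; }')
    .join('\n');
}

console.log(defaultHeadingRules(['2em', '1.5em', '1.17em']));
console.log(headingSelectors(3, ['section', 'article']));
```

With only <section> in play this gives one selector per level, but once <article> is allowed to nest as well, level three already needs four selectors, and the list keeps doubling with each level. In a browser that supports the Selectors API, the same strings could be joined with commas and passed to document.querySelectorAll, with the caveat that a descendant selector such as section h1 also matches more deeply nested headings, so the results would still need filtering.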

Given these issues, I don’t consider it beneficial to make headings relative to the section that contains them. What authors gain by not having to adapt included content on the server side so it uses appropriate heading levels, they end up potentially losing due to the increased complexity in determining the outline of a document and styling headings consistently in different sections.

Perhaps if there is some benefit other than just for included content as I’ve mentioned above, a compromise solution might be to have better HTML APIs which allow access to sections and headings in a more meaningful way than existing DOM methods like getElementsByTagName. I could imagine methods like HTMLElement.getSections() and HTMLElement.getHeadings() proving useful in addressing some of the concerns above.
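Neither method exists, so what follows is only a guess at how they might behave, written as standalone functions instead of methods on HTMLElement. The assumptions are mine: getSections returns the nearest nested <section> or <article> elements, and getHeadings returns the headings that belong to the element itself and not to any section nested inside it. The functions only rely on tagName and children, so they work on plain objects as well as on DOM elements; the el helper exists purely to build test data.

```javascript
// Hypothetical helpers: neither getSections nor getHeadings is a real DOM method.
const SECTIONING_TAGS = new Set(['SECTION', 'ARTICLE']);
const HEADING_TAG = /^H[1-6]$/;

// Nearest nested sections of `element`, without descending into those sections.
function getSections(element) {
  const found = [];
  for (const child of Array.from(element.children)) {
    if (SECTIONING_TAGS.has(child.tagName.toUpperCase())) {
      found.push(child);
    } else {
      found.push(...getSections(child));
    }
  }
  return found;
}

// Headings that belong to `element` itself, skipping any nested sections.
function getHeadings(element) {
  const found = [];
  for (const child of Array.from(element.children)) {
    const tag = child.tagName.toUpperCase();
    if (SECTIONING_TAGS.has(tag)) continue;
    if (HEADING_TAG.test(tag)) {
      found.push(child);
    } else {
      found.push(...getHeadings(child));
    }
  }
  return found;
}

// Tiny stand-in for DOM elements, for illustration only.
const el = (tagName, ...children) => ({ tagName, children });

const page = el('body',
  el('h1'),
  el('div',
    el('section', el('h1'), el('section', el('h1')))),
  el('article', el('h2')));

console.log(getSections(page).map((s) => s.tagName)); // the section and the article
console.log(getHeadings(page).length);                // only the top-level h1
```

With something like this, building a table of contents becomes a simple recursive walk: take the headings of an element, then repeat for each of its sections.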