2 May 2009

Smart Quotes: library for curly quotes in Java

Recently, I’ve become a fan of using proper punctuation in web pages. That means both widening the range of punctuation I use to include a variety of dashes, fractions and symbols, and using correctly curled quotation marks and apostrophes.

Even for the enthusiastic, entering the proper quotation marks manually is a pain. On Mac OS X, you need to hold down the Option key and remember which of the unmarked bracket or brace keys corresponds to your desired quotation mark. On Windows, it’s even more painful: you have to locate the quote in the character map or remember a keyboard shortcut.

To solve this problem, I’ve written a new Java library, Smart Quotes, to automatically correct "straight quotes" to “curly quotes” in Java web applications like Confluence.

Even now, a Google search for “Java smart quotes” or “Java curly quotes” returns lots of results about how to remove the quotes rather than add them! While removing curly quotes is a very simple search-and-replace, inserting correct curly quotes is a hairier problem. You need to curl each double quote in the right direction, and recognise the distinction between apostrophes and single-quoted phrases.
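To make the problem concrete, here is a minimal sketch of one way to decide which way a quote curls, assuming a simple rule based on the characters on either side of it. The class and method names (QuoteCurlSketch, curl, isOpeningContext) are hypothetical, and this is an illustration of the idea rather than the algorithm Smart Quotes or SmartyPants actually uses.

```java
public class QuoteCurlSketch {

    // Hypothetical helper for illustration only; not the Smart Quotes API.
    // Curls straight quotes in plain text by looking at the neighbouring characters.
    static String curl(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c != '"' && c != '\'') {
                out.append(c);
                continue;
            }
            // Treat the start and end of the text as whitespace.
            char prev = i > 0 ? text.charAt(i - 1) : ' ';
            char next = i + 1 < text.length() ? text.charAt(i + 1) : ' ';
            // Assumed rule: a quote opens if it follows whitespace, an opening
            // bracket or another quote, and is not itself followed by whitespace.
            boolean opening = isOpeningContext(prev) && !Character.isWhitespace(next);
            if (c == '"') {
                out.append(opening ? '\u201C' : '\u201D');
            } else if (Character.isLetterOrDigit(prev) && Character.isLetterOrDigit(next)) {
                out.append('\u2019'); // apostrophe inside a word, as in "shouldn't"
            } else {
                out.append(opening ? '\u2018' : '\u2019');
            }
        }
        return out.toString();
    }

    static boolean isOpeningContext(char prev) {
        return Character.isWhitespace(prev)
                || prev == '(' || prev == '[' || prev == '{'
                || prev == '"' || prev == '\'';
    }

    public static void main(String[] args) {
        System.out.println(curl("\"It's a 'simple' test,\" she said."));
    }
}
```

A rule this simple gets leading apostrophes wrong (the one in 'tis would be curled as an opening quote), which is part of what makes the problem hairy.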

Smart Quotes processes HTML documents. It replaces straight double or single quotes only in the text sections, leaving tag attributes and preformatted blocks like <code> and <pre> untouched.
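The general idea of touching only the text sections can be sketched as follows. This is an assumption-laden illustration, not the Smart Quotes implementation: the class HtmlTextSketch and the method transformText are hypothetical names, and the regular expression is a deliberately naive stand-in for real HTML parsing.

```java
import java.util.function.UnaryOperator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HtmlTextSketch {

    // Naive tag matcher: group 1 is "/" for a closing tag, group 2 is the tag name.
    private static final Pattern TAG =
            Pattern.compile("<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>");

    // Hypothetical helper: applies textTransform to text between tags, copying
    // tags (and their attributes) unchanged and skipping text inside <pre>/<code>.
    static String transformText(String html, UnaryOperator<String> textTransform) {
        StringBuilder out = new StringBuilder(html.length());
        Matcher m = TAG.matcher(html);
        int pos = 0;
        int verbatimDepth = 0; // how many <pre>/<code> elements we are inside
        while (m.find()) {
            String text = html.substring(pos, m.start());
            out.append(verbatimDepth > 0 ? text : textTransform.apply(text));
            out.append(m.group()); // the tag itself is never modified
            String name = m.group(2);
            if (name.equalsIgnoreCase("pre") || name.equalsIgnoreCase("code")) {
                if (m.group(1).isEmpty()) {
                    verbatimDepth++;
                } else if (verbatimDepth > 0) {
                    verbatimDepth--;
                }
            }
            pos = m.end();
        }
        String tail = html.substring(pos);
        out.append(verbatimDepth > 0 ? tail : textTransform.apply(tail));
        return out.toString();
    }

    public static void main(String[] args) {
        String html = "<p title=\"a\">say \"hi\"</p><pre>x = \"y\"</pre>";
        // Stand-in transform that just marks the quotes it is allowed to touch.
        System.out.println(transformText(html, s -> s.replace('"', '*')));
    }
}
```

This sketch ignores comments, attribute values containing ">", and the fact that a quote’s neighbours may sit on the other side of a tag, so a real implementation needs to carry context across tag boundaries.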

For example, the following markup:

"Occasionally, quotations may contain 'internal <em>quoted passages</em>', which shouldn't confuse one's quote-curling library."

is correctly converted by Smart Quotes into this HTML:

“Occasionally, quotations may contain ‘internal <em>quoted passages</em>’, which shouldn’t confuse one’s quote-curling library.”

The <em> tags pass through untouched; only the quotes in the text change. The result would be the same if the content were sprinkled with more HTML tags.

The Smart Quotes source code contains a large number of tests that verify the quoting behaviour, interaction with tags, and so on. If you use the library and find bugs, please consider submitting a patch with a test case.

You may remember that when I started using Markdown on my blog, I also added SmartyPants. SmartyPants is a Perl library which does the same job as my new library, and I’m indebted to it for a lot of assistance with the algorithm and the original inspiration for the project. While SmartyPants already solves this problem very nicely in Perl, it isn’t suitable for integration into a cross-platform Java application.

To integrate Smart Quotes usefully in Confluence, I’ve written a Smart Quotes macro plugin. This plugin takes any other Confluence wiki content as its body, and automatically replaces the quotes throughout. This means you can simply wrap the entire contents of a page in a {smart-quotes} macro to get proper curly quotes.

I’m hoping the Smart Quotes library proves useful to the many Java web application developers who would like proper punctuation in their web pages. It’s available under an Apache open source license, so you can reuse and redistribute it under very flexible terms.