18 May 2006


The never-ending battle between spammers and everyone else continues, and I've just discovered a new development.

In years past, we've tried to fight comment spam with CAPTCHAs. The spammers fought back by developing OCR bots that read many CAPTCHAs better than most people can.

Email spam, on the other hand, looked like a battle won: Bayesian filters worked wonders against the text of emails by learning which words show up in known spam and which show up in legitimate mail, then scoring new messages accordingly. It was a simple strategy, and it worked incredibly well. Until now.
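To make the idea concrete, here is a minimal naive Bayes token scorer. It is a sketch under my own assumptions (a crude word tokenizer, add-one smoothing, and the hypothetical class name `NaiveBayesFilter`), not the algorithm of any particular spam filter: it counts how many spam and non-spam training messages contain each word, then combines the per-word likelihood ratios into a spam probability.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Crude assumption: lowercase word tokens only; real filters also look at headers, URLs, etc.
    return re.findall(r"[a-z0-9$']+", text.lower())


def _sigmoid(x):
    # Numerically stable logistic function, turning log-odds into a probability.
    if x >= 0:
        return 1 / (1 + math.exp(-x))
    z = math.exp(x)
    return z / (1 + z)


class NaiveBayesFilter:
    """Hypothetical minimal filter: per-token document counts for spam and ham."""

    def __init__(self):
        self.spam_counts = Counter()
        self.ham_counts = Counter()
        self.spam_docs = 0
        self.ham_docs = 0

    def train(self, text, is_spam):
        tokens = set(tokenize(text))  # count each token at most once per message
        if is_spam:
            self.spam_counts.update(tokens)
            self.spam_docs += 1
        else:
            self.ham_counts.update(tokens)
            self.ham_docs += 1

    def spam_probability(self, text):
        # Start from the (smoothed) prior odds, then add each token's log-likelihood ratio.
        log_odds = math.log((self.spam_docs + 1) / (self.ham_docs + 1))
        for tok in set(tokenize(text)):
            p_spam = (self.spam_counts[tok] + 1) / (self.spam_docs + 2)
            p_ham = (self.ham_counts[tok] + 1) / (self.ham_docs + 2)
            log_odds += math.log(p_spam / p_ham)
        return _sigmoid(log_odds)


f = NaiveBayesFilter()
f.train("cheap pills buy now limited offer", is_spam=True)
f.train("buy cheap watches now", is_spam=True)
f.train("meeting notes attached for tomorrow", is_spam=False)
f.train("lunch tomorrow with the team", is_spam=False)

print(round(f.spam_probability("buy cheap pills now"), 3))         # 0.982
print(round(f.spam_probability("notes for the team meeting"), 3))  # 0.03
```

The whole approach depends on the spammy words actually appearing in the text the filter reads, which is exactly the assumption the trick below breaks.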

Taking advantage of the fact that most graphical email clients display embedded image attachments (unlike remote images, they don't leak anything back to the sender, so clients show them freely), spammers now put the spam itself in an image and pair it with an innocent, unrelated body of text.
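Here is a rough sketch of why this works against a text-only filter, using Python's standard `email` package. The helper name `text_seen_by_filter`, the message contents, and the placeholder image bytes are all hypothetical: the helper collects only the `text/plain` parts, as a purely text-based filter would, so the actual pitch inside the image never reaches the classifier.

```python
from email.message import EmailMessage


def text_seen_by_filter(msg):
    # Hypothetical text-only filter: gather every text/plain part and skip
    # everything else, including image/* parts where the real pitch now lives.
    parts = []
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            parts.append(part.get_content())
    return "\n".join(parts)


msg = EmailMessage()
msg["Subject"] = "re: your order"
msg.set_content("The weather was lovely at the lake this weekend.")
# Placeholder bytes standing in for a PNG whose pixels contain the advertisement.
msg.add_attachment(
    b"\x89PNG\r\n\x1a\n...",
    maintype="image",
    subtype="png",
    disposition="inline",
    filename="offer.png",
)

# Prints only the innocent sentence; the image part is never inspected.
print(text_seen_by_filter(msg))
```

Feeding that output to a Bayesian filter yields a message that looks entirely legitimate.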

Of course, a filter could run OCR software over the images and feed the extracted text to the Bayesian classifier, so the spammers have adopted the very technique we developed to stop them: they render their spam with swirls and background images designed to defeat OCR. It's spam via CAPTCHA: spamtcha.

I don't know whether a Bayesian filter with OCR actually exists, but the spammers are taking no chances. What can spam-blocking software do to combat this?