Wednesday, April 21, 2010

CAPTCHAs are less secure than you think

Quick! What does CAPTCHA stand for (without consulting Wikipedia)? Completely Automated Public Turing…and then things get foggy. Try again. Completely Automated Public Turing test to tell Computers and Humans Apart. Yes, they cheated a little with the acronym (it should be CAPTTTCHA), but regardless of what it stands for, we all know what a CAPTCHA looks like: those little, often hard-to-decipher images on a form, designed to be readable only by humans so that bots cannot submit the form.

But are those images really unreadable by computers?

I was prompted to consider this after reviewing a blog post by one of the Telerik ASP.NET AJAX Team Leads about RadCaptcha and OCR software. The blog post was not meant to be a definitive test of RadCaptcha’s security, but the comments made it clear that people are interested in that information. The post demonstrates that off-the-shelf OCR software cannot read RadCaptcha images, which is a fun test but not a definitive measure of security. In the “real world,” attackers use algorithms custom-tailored for breaking CAPTCHAs.

Weak CAPTCHAs

As it turns out, for years now, computers have been able to automatically parse many of the CAPTCHA images you find on the web. Research done by various universities, such as Simon Fraser University and UC Berkeley, has produced algorithms capable of “seeing” almost all CAPTCHAs that rely on simple text transformation and “busy backgrounds” to stump computer character recognition. Take the following CAPTCHA image examples:

[Images: captcha-18, captcha-71]

Each of these CAPTCHAs can be successfully read by algorithms developed by university researchers with a 92% success rate! I’m sure you’ve seen a few forums or web forms that have CAPTCHAs that look a lot like this. (You may be the owner of one of those sites today.)

In short: CAPTCHAs that rely on background noise, color, and text distortion are generally ineffective at stopping modern CAPTCHA bots.

Stronger CAPTCHAs

If you must use a visual CAPTCHA today, the much more effective technique for foiling bots is to make visual segmentation hard. When a CAPTCHA bot tries to understand an image, it will first try to remove all background noise and “obviously” unnecessary image data (think about what you’d do in Photoshop to clean up an old damaged photo; that’s what the bots do). Then, bots will try to “segment” the remaining data into areas that can eventually be parsed into characters. With sufficient overlapping and thick cross-cutting lines, bots are unable to accurately translate segments into characters. For example, the following image is much more effective at blocking bots than the previous images even though there is no background noise or color:
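To make the clean-then-segment idea concrete, here is a minimal Python sketch. It is a toy under stated assumptions, not how any particular bot or research attack works: the "image" is a tiny hand-made binary grid, the splitter simply cuts at empty columns, and the helper names (parse, remove_speckles, segment_columns) are invented for this illustration.

```python
from typing import List, Tuple

Image = List[List[int]]  # 1 = ink, 0 = background


def parse(rows: List[str]) -> Image:
    """Turn rows of '#' and '.' into a binary grid."""
    return [[1 if ch == "#" else 0 for ch in row] for row in rows]


def remove_speckles(img: Image) -> Image:
    """Clear ink pixels that have no 4-connected ink neighbour (crude noise removal)."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(h):
        for x in range(w):
            if not img[y][x]:
                continue
            neighbours = [(y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)]
            if not any(0 <= ny < h and 0 <= nx < w and img[ny][nx]
                       for ny, nx in neighbours):
                out[y][x] = 0
    return out


def segment_columns(img: Image) -> List[Tuple[int, int]]:
    """Return (start, end_exclusive) column ranges separated by empty columns."""
    w = len(img[0])
    has_ink = [any(row[x] for row in img) for x in range(w)]
    segments: List[Tuple[int, int]] = []
    start = None
    for x, ink in enumerate(has_ink):
        if ink and start is None:
            start = x                      # a glyph candidate begins
        elif not ink and start is not None:
            segments.append((start, x))    # an empty column ends it
            start = None
    if start is not None:
        segments.append((start, w))
    return segments


# Three separated glyphs plus one stray noise pixel in the top-right corner.
noisy = parse([
    "##..##..##.#",
    "##..##..##..",
    "##..##..##..",
    "............",
])

# The same glyphs with one line drawn straight through all of them.
crossed = parse([
    "##..##..##..",
    "############",
    "##..##..##..",
    "............",
])

noisy_segments = segment_columns(remove_speckles(noisy))
crossed_segments = segment_columns(remove_speckles(crossed))
print(noisy_segments)    # [(0, 2), (4, 6), (8, 10)]
print(crossed_segments)  # [(0, 12)]
```

In the first grid the stray pixel is discarded and the splitter recovers three glyph candidates. In the second, the crossing line leaves no empty column, so this simple splitter returns one undivided blob and has nothing to hand to a character recognizer.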

[Image: captcha-good]

Of course, even strong CAPTCHA images like this are being broken by advanced research, but in the current cat-and-mouse game of CAPTCHA security, this is among the best options for balancing human readability against bot blocking.

Optimizing RadCaptcha

Given this new understanding about the current state of CAPTCHA bots, there are some easy takeaways for configuring RadCaptcha for maximum bot blocking:

  1. Don’t rely on visual CAPTCHA protection only
    Bots often give away their identity by trying to submit forms too quickly or by trying to submit a form too many times. Take advantage of RadCaptcha’s non-visual protections to maximize bot prevention.
  2. Maximize Line Noise Level, Eliminate Background Noise Level
    Research says background noise is the first thing a CAPTCHA bot throws out, so it offers little value to your image. Instead, maximize your CAPTCHA image line noise and font warp factor to make segmentation hard for bots. Set properly, RadCaptcha can produce very secure CAPTCHA images like this:

    [Image: captcha-rad]
  3. Use a Custom Character Set
    Many bots rely on encountering a predictable set of characters or words to accurately parse a website’s CAPTCHA image. By using a custom character set with RadCaptcha that includes non-alphanumeric characters (like @, !, #, $), you can increase your odds of beating the bots.
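
The non-visual checks from item 1 (submitting too quickly, submitting too many times) can be sketched in a few lines of Python. This only illustrates the general idea; it is not RadCaptcha’s API, and the class name SubmissionGuard, its default thresholds, and the use of a session id as the key are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class SubmissionGuard:
    """Hypothetical helper: rejects submissions that are too fast or too frequent."""
    min_seconds: float = 3.0   # assumed minimum time a human needs to fill in the form
    max_attempts: int = 5      # assumed cap on submissions per session
    _rendered_at: Dict[str, float] = field(default_factory=dict)
    _attempts: Dict[str, int] = field(default_factory=dict)

    def form_rendered(self, session_id: str, now: float) -> None:
        """Record when the form was served to this session."""
        self._rendered_at[session_id] = now

    def allow(self, session_id: str, now: float) -> bool:
        """Return True if this submission looks plausible for a human."""
        attempts = self._attempts.get(session_id, 0) + 1
        self._attempts[session_id] = attempts
        if attempts > self.max_attempts:
            return False                   # too many tries
        rendered = self._rendered_at.get(session_id)
        if rendered is None:
            return False                   # posted without ever loading the form
        return now - rendered >= self.min_seconds


guard = SubmissionGuard(min_seconds=3.0, max_attempts=3)
guard.form_rendered("session-1", now=0.0)
results = [
    guard.allow("session-1", now=1.0),  # submitted after 1 second: too fast
    guard.allow("session-1", now=5.0),  # plausible
    guard.allow("session-1", now=6.0),  # plausible, third attempt
    guard.allow("session-1", now=7.0),  # fourth attempt exceeds the cap
]
print(results)  # [False, True, True, False]
```

Timestamps are passed in explicitly so the logic is easy to test; a real handler would read the clock itself and keep this state wherever the site stores session data.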

No visual CAPTCHA image is perfect, and with the modern trend of employing humans to beat CAPTCHAs, a CAPTCHA is a road bump at best. Still, they prevent the casual spam bot from infiltrating your site and protect your forms from the script kiddies.

Telerik will continue to add improved security features to RadCaptcha in future releases, but by following these simple guidelines, you can get as much protection as a CAPTCHA can provide today.
