The hidden truth behind CAPTCHA

May 8, 2017

We have all come across those boxes of distorted letters that we are asked to retype into a text field. These tests are called CAPTCHAs, which stands for Completely Automated Public Turing test to tell Computers and Humans Apart. In artificial intelligence, the Turing test is a test of intelligence in a computer: it checks whether a human can distinguish a computer from a human using only the replies to questions put to both.

How it all started

CAPTCHAs were created in the early days of the internet to keep automated programs from accessing web content. Such programs could damage the integrity of a website. The main point of these tests is to prevent spam on the internet. They also stop automated programs from making bulk online purchases, such as buying up all the tickets for a much-awaited concert or event in order to resell them on the black market.

CAPTCHA has become a universal tool and an accepted part of the internet user experience. Millions of people solve CAPTCHAs every day. Some companies in developing countries have turned this into a business opportunity: they pay low-wage workers to solve CAPTCHAs on behalf of others. Their customers may be people such as the visually impaired, or companies interested in advertising.

Evolution to reCAPTCHA

Millions of people were voluntarily translating nonsensical images into text. To the creators of CAPTCHA, this looked like a waste of free labor. That led to the birth of reCAPTCHA, which put this manpower to work on real words rather than random letters. The creators set out to digitize books by scanning their pages and running optical character recognition (OCR) software to convert the scanned words into digital text. Any word that the software could not recognize correctly was served as a reCAPTCHA.

When the same word was typed for a given text image multiple times, that word was confirmed as the correct text and uploaded to the eBook database. Thanks to this effort, a hundred million reCAPTCHAs were solved per day, a rate that led to the digitization of two and a half million books every year. Google acquired reCAPTCHA in 2009 to digitize books for Google Books. It also used reCAPTCHA to transcribe street signs from Google Street View in order to label Google Maps.
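
The confirmation step described above can be pictured as a simple vote among the people who typed the same image. The following is only a minimal sketch of that idea, not reCAPTCHA's actual implementation: the function name confirm_word and the threshold of three matching answers are assumptions made for illustration.

```python
from collections import Counter


def confirm_word(transcriptions, min_votes=3):
    """Return the agreed word, or None if no answer has enough votes yet.

    `min_votes` is a hypothetical threshold chosen for this sketch; the
    real service's confirmation rule is not described in this article.
    """
    # Normalize so that "Morning " and "morning" count as the same answer.
    cleaned = [t.strip().lower() for t in transcriptions if t.strip()]
    if not cleaned:
        return None
    # Pick the most frequently typed answer and check its vote count.
    word, votes = Counter(cleaned).most_common(1)[0]
    return word if votes >= min_votes else None


# Three of four users agree, so the word is confirmed.
print(confirm_word(["morning", "Morning ", "moming", "morning"]))
# Two users disagree, so the image would keep being shown to more people.
print(confirm_word(["upon", "uporn"]))
```

A word that returns None would simply stay in rotation until enough people agree on it.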

Birth of noCAPTCHA reCAPTCHA

As image recognition advanced, CAPTCHAs stopped being a barrier for bots. Even as CAPTCHAs were made more and more difficult, a 2014 Google analysis found that artificial intelligence could crack even the most complex CAPTCHA and reCAPTCHA images with 99.8 percent accuracy, whereas humans could not even solve a CAPTCHA with 50 percent accuracy. Google then launched the now familiar “I’m not a robot” checkbox, known as the noCAPTCHA reCAPTCHA. While it may appear to the user as a simple task, the program in the background analyzes things like the user’s IP address, cookies from the user’s web activity and even the user’s mouse movement as it hovers over and approaches the checkbox.

In most cases this method can distinguish a human from a bot. If the program is still not sure, the user is presented with a set of challenges based on image recognition problems. Google then remembers the result the next time the user checks a noCAPTCHA reCAPTCHA. ‘noCAPTCHA reCAPTCHA’ means no frustration: it lets people get where they are going faster and keeps the bots at bay.
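
To make the flow concrete, here is a rough sketch of how background signals could be combined into a single score, with an image challenge as the fallback when the score is too low. Everything in it is an assumption for illustration: the Signals fields, the weights, the thresholds and the function names are hypothetical, and Google's real signals and scoring are not public.

```python
from dataclasses import dataclass


@dataclass
class Signals:
    """Hypothetical inputs, loosely based on the signals named above."""
    ip_reputation: float            # 0.0 (known bad) to 1.0 (clean)
    has_prior_cookies: bool         # browser carries earlier web activity
    mouse_path_points: int          # pointer positions recorded before click
    mouse_path_straightness: float  # 0.0 to 1.0, where 1.0 is a straight line


def human_score(s: Signals) -> float:
    """Combine the signals into a rough 'looks human' score.

    The weights below are invented for this sketch.
    """
    score = 0.4 * s.ip_reputation
    if s.has_prior_cookies:
        score += 0.2
    # No pointer movement at all, or a perfectly straight path to the
    # checkbox, is treated here as a hint of scripted input.
    if s.mouse_path_points >= 5 and s.mouse_path_straightness < 0.98:
        score += 0.4
    return score


def decide(s: Signals, pass_threshold: float = 0.7) -> str:
    """Let the user through, or fall back to an image challenge."""
    return "pass" if human_score(s) >= pass_threshold else "image_challenge"


likely_human = Signals(0.9, True, 42, 0.71)
likely_bot = Signals(0.5, False, 0, 1.0)
print(decide(likely_human))
print(decide(likely_bot))
```

The point of the sketch is the shape of the decision: a confident score lets the user through with a single click, and only uncertain cases pay the cost of an extra challenge.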